FastaRecordReader for huge fasta files

Hi,

I have a question about the FastaRecordReader class data-algorithms-book/src/main/java/org/dataalgorithms/chap24/mapreduce/FastaRecordReader.java

I have been trying to use it for large genomes (fasta files much larger than a HDFS block, ie: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.fna.gz) but I am getting wrong sequences.

Is it possible that using this classes from Spark with newAPIHadoopFile method does not work for very large files? Or maybe am I missing something?

Regards, and thank you very much for your time.

Jose M. Abuin

mahmoudparsian / data-algorithms-book

FastaRecordReader for huge fasta files #31