Hi,
I have a question about the FastaRecordReader class (data-algorithms-book/src/main/java/org/dataalgorithms/chap24/mapreduce/FastaRecordReader.java).
I have been trying to use it for large genomes (FASTA files much larger than an HDFS block, e.g. ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.fna.gz), but I am getting wrong sequences.
Is it possible that using these classes from Spark with the newAPIHadoopFile method does not work for very large files? Or am I missing something?
Regards, and thank you very much for your time.
Jose M. Abuin