smarco / gem3-mapper

GEM-Mapper v3
GNU General Public License v3.0

Parsing FASTA/FASTQ error. #29

Open Luobiny opened 2 years ago

Luobiny commented 2 years ago

Hi GEM developers, I am running into a FASTA/FASTQ parsing error when running gem-mapper on a pair of FASTQ files. The FASTQ files themselves do not seem to be the problem, because BWA aligns them without any error. Also, when I split the FASTQ files into smaller chunks (32GB or less per chunk), GEM aligns each chunk without any error. The error messages look like the following:

   2021/11/24 10:45:00 -- # 146400000 sequences processed
    GEM::FatalError (input_fasta.c:299,input_fasta_parser_prompt_error)
     Parsing FASTA/FASTQ error(R1.fq:3473248862119260163). Beginning Symbol ('>' or '@') not found. Bad syntax
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    GEM::Unexpected error occurred. Sorry for the inconvenience
         Feedback and bug reporting it's highly appreciated,
         => Please report or email (gem.mapper.dev@gmail.com)
    GEM::Running-Thread (threadID = 14)
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    GEM::Version v3.6.0-bundle-release
    GEM::CMD gem-mapper -I human_g1k_v37_decoy_phiXAdaptr.gem -1 R1.fq -2 R2.fq -t 18 -r @RG        ID:sample       PL:ILLUMINA     LB:sample       SM:sample       CN:sample PU:sample

Another interesting thing is that when I launch a different run using the same FASTQ files, the number of sequences processed in the log messages right before the error is a bit different, even though the counts all seem to fall between 144,000,000 and 150,000,000.
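
For anyone who wants to try the chunking workaround described above, here is a minimal sketch in Python. It is not part of GEM, and the function name `split_fastq`, the chunk file naming, and the `records_per_chunk` parameter are all hypothetical. It assumes uncompressed FASTQ with strict four-line records (no wrapped sequence lines), and it checks the `@` and `+` marker lines while copying, which also gives an independent check that the input is well-formed.

```python
from pathlib import Path


def split_fastq(path, out_dir, records_per_chunk):
    """Split a FASTQ file into chunks of at most `records_per_chunk` records.

    Assumption: strict four-line records (header, sequence, '+', qualities).
    Returns the list of chunk paths that were written, in order.
    """
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    chunks = []
    out_fh = None
    total_lines = 0
    try:
        with src.open("r") as fh:
            for line_no, line in enumerate(fh):
                field = line_no % 4       # position of this line inside its record
                record_no = line_no // 4  # zero-based record index
                if field == 0:
                    if not line.startswith("@"):
                        raise ValueError(
                            f"{src}:{line_no + 1}: record does not start with '@'"
                        )
                    # Start a new chunk file every `records_per_chunk` records.
                    if record_no % records_per_chunk == 0:
                        if out_fh is not None:
                            out_fh.close()
                        chunk_path = out / f"{src.stem}.chunk{len(chunks):04d}.fq"
                        out_fh = chunk_path.open("w")
                        chunks.append(chunk_path)
                elif field == 2 and not line.startswith("+"):
                    raise ValueError(
                        f"{src}:{line_no + 1}: separator line does not start with '+'"
                    )
                out_fh.write(line)
                total_lines = line_no + 1
    finally:
        if out_fh is not None:
            out_fh.close()

    # A line count that is not a multiple of 4 means the last record is cut off.
    if total_lines % 4 != 0:
        raise ValueError(f"{src}: last record is truncated")
    return chunks
```

Calling it with the same `records_per_chunk` on both `R1.fq` and `R2.fq` should keep mates together, since chunk N of each file then covers the same range of records. Each pair of chunks can then be passed to gem-mapper through `-1` and `-2` as in the command above.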

smarco commented 2 years ago

There seems to be a bug in the parsing module with very large FASTQ files. We will investigate it. Thanks for the report.

Luobiny commented 2 years ago

I've used gem-mapper to align many other very large FASTQ files from human whole-genome sequencing, and they didn't cause any errors. I can share these two FASTQ files if that would help you reproduce the problem and debug the code.

smarco commented 2 years ago

Well, sure. If you can make those files available to us, it would be of great help. Thanks.