mazzalab / fastqwiper

An ensemble method to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and settle reads interleaving.
GNU General Public License v3.0

BBmap repair.sh guesses the wrong ASCII quality encoding offset and fails #16

Status: Closed (cdupai-bbi closed this issue 6 months ago)

cdupai-bbi commented 6 months ago

Thanks for putting together a great tool to solve an annoying problem!

I'm trying to repair some corrupted FASTQ files with your Singularity pipeline. The repair.sh script first guesses the correct quality score encoding (ASCII-33), but partway through it switches to ASCII-64, which causes an error (see below). If I run the _fixed_wiped_paired.fastq.gz files through repair.sh manually with the quality offset set explicitly (the qin flag), the reads interleave and sort correctly. Adding an optional qin parameter to the top-level Singularity command and passing it to the repair.sh call would fix this for me.
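The manual workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `build_repair_command` and all file names are hypothetical, and it assumes BBTools' `repair.sh` is on the `PATH` and accepts its usual `key=value` arguments (`in`, `in2`, `out`, `out2`, `outs`, `qin`). It is not how FastqWiper itself invokes the script.

```python
import shlex


def build_repair_command(r1, r2, out1, out2, singletons, qin=33):
    """Build an argv list for BBTools repair.sh with a pinned quality offset.

    Pinning qin stops repair.sh from auto-detecting (and later switching)
    the ASCII quality encoding. All paths are supplied by the caller.
    """
    if qin not in (33, 64):
        raise ValueError("qin must be 33 or 64")
    return [
        "repair.sh",
        f"in={r1}",
        f"in2={r2}",
        f"out={out1}",
        f"out2={out2}",
        f"outs={singletons}",  # reads left without a mate
        f"qin={qin}",          # force the input quality offset
    ]


if __name__ == "__main__":
    # Hypothetical file names, following the *_fixed_wiped_paired.fastq.gz pattern.
    cmd = build_repair_command(
        "sample_R1_fixed_wiped_paired.fastq.gz",
        "sample_R2_fixed_wiped_paired.fastq.gz",
        "sample_R1_repaired.fastq.gz",
        "sample_R2_repaired.fastq.gz",
        "sample_singletons.fastq.gz",
        qin=33,
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the exact command before launching a long job.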

Warning! Changed from ASCII-33 to ASCII-64 on input Z: 90 -> 59
Up to 17635226 prior reads may have been generated with incorrect qualities.
If this is a problem you may wish to re-run with the flag 'qin=33' or 'qin=64'.

The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5.

Please re-run with the flag 'qin=33', 'ignorebadquality', or '-da'.

Offset=64
java.lang.ArrayIndexOutOfBoundsException: Index -6 out of bounds for length 128
        at stream.Read.validateCommonCase_branchless(Read.java:361)
        at stream.Read.validate(Read.java:115)
        at stream.Read.<init>(Read.java:77)
        at stream.Read.<init>(Read.java:50)
        at stream.FASTQ.quadToRead_slow(FASTQ.java:851)
        at stream.FASTQ.toReadList(FASTQ.java:686)
        at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
        at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:93)
        at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:682)
        at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:658)
java.lang.Exception: Aborting.
        at shared.KillSwitch.kill(KillSwitch.java:108)
        at stream.FASTQ.quadToRead_slow(FASTQ.java:794)
        at stream.FASTQ.toReadList(FASTQ.java:686)
        at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
        at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:93)
        at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:682)
        at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:658)

Set cris1Active=false
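
One way to decide which `qin` value to pass is to scan every quality line before running repair.sh, instead of relying on a guess made while streaming. The sketch below is a heuristic, not BBTools' detection logic: the function names are hypothetical, it assumes well-formed four-line records (as expected after wiping), and it relies on the fact that ASCII-64 encodings do not use characters below code 59 (quality -5), which is consistent with the "quality value below -5" message in the log above.

```python
import gzip


def quality_char_range(path):
    """Return (min_code, max_code) over all quality characters, or None if empty.

    Assumes strict 4-line FASTQ records, so every 4th line is a quality line.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lo, hi = None, None
    with opener(path, "rt", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index % 4 != 3:
                continue  # not a quality line
            qualities = line.rstrip("\r\n")
            if not qualities:
                continue
            codes = [ord(ch) for ch in qualities]
            lo = min(codes) if lo is None else min(lo, min(codes))
            hi = max(codes) if hi is None else max(hi, max(codes))
    return None if lo is None else (lo, hi)


def guess_offset(lo, hi):
    """Heuristic offset guess: 33, 64, or None when the range is ambiguous."""
    if lo < 59:
        return 33  # characters below ';' cannot occur under an ASCII-64 encoding
    if hi > 74:
        return 64  # above 'J' (Q41 at offset 33) with no low characters
    return None
```

For example, a file whose quality lines contain both `Z` (code 90) and `:` (code 58) yields the range `(58, 90)`, and `guess_offset(58, 90)` returns 33, since the low character rules out ASCII-64 regardless of the unusually high one.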

mazzalab commented 6 months ago

Hi, we will work on it and add the parameter in the next release.