Closed andrewd789 closed 1 year ago
Are you sure about the number of entries in these files? In general, using grep -c @ filename.fastq
is not a reliable indicator of the number of entries in a FASTQ file because the @
character may also appear in the fourth line as a base quality indicator. I would recommend using wc -l filename.fastq
and divide by four, which is usually correct, but not technically in all cases.
If you still find that vsearch does not work correctly, I could investigate, but then it would be an advantage if I could have a look at the file, or a small example file that triggers the incorrect behaviour.
@andrewd789 could you please try with grep -c "^@" filename.fastq
to anchor the pattern and limit the number of false-positives?
@
is Q = 31 in the Phred+33 range of quality values.
also, vsearch --fastq_stats filename.fastq
can reliably count the number of fastq entries.
Humble apologies, I forgot that @ could also be a fastq quality value. Using grep -c @M
I can now see that file pairs that do merge have equal numbers of r1 and r2 reads, and those that do not merge have unequal numbers of r1 and r2 reads.
This does make me wonder, however, why differing numbers of sequence reads are necessarily a problem? I imagine one could match the sequence headers in an r1 file with those in an r2 file and thus identify the reads occurring in both that can be merged?
It could be done, but would make the merging slower and more complicated. Usearch has a command called fastx_syncpairs
than can be used to prepare read files for merging, in case there are different number of reads in the two files. It will sort reads and discard unpaired reads.
Thank you for the explanation, and for pointing out fastx_syncpairs
. Perhaps it would be good for this function to be added to VSEARCH someday?
Hello,
I have encountered a puzzling error while using
-fastq_mergepairs
invsearch v2.21.1_linux_x86_64
. It fails with the error message "Fatal error: More reverse reads than forward reads", in some cases where there are indeed more reverse reads than forward reads. But not in other cases where there are also more reverse than forward reads. For example, this produces the fatal error:So does this:
But this merges successfully, despite also having more reverse reads than forward reads:
Why does this error occur only in some cases with more reverse reads than forward reads?