Closed timothevanmeter closed 10 months ago
Hi,
Thanks for looking into this. I will be interested to see your results, especially if you find a problem with vsearch.
I think the option --eeout
to the command fastq_filter
or fastx_filter
will give you the information you need, so no need to modify the code. It will include ;ee=number
in the headers of the output file, where number
is the expected total error for the sequence.
Example:
$printf "@example\nACGT\n+\nIIII\n" | vsearch --fastx_filter - --eeout --quiet --fastaout -
Result:
>example;ee=0.0004000
ACGT
In this case there were four nucleotides of quality 40 (ascii I
), indicating an expected error of 0.0001 each. The sum is 0.0004.
I hope this will help you.
Hello,
Exactly what I was looking for! Thanks a lot! I thought I read this part of the documentation, but not thoroughly enough it seems...
If I find what is causing the difference between the two programs I will report it back here.
I compared the expected errors for both programs and even though the precision is different the number of discarded reads at the 0.01 threshold is exactly the same. So it must be a mix-up in the versions of the files I am using ; nothing to do with vsearch.
However, vsearch and some other utilities perform the same task in 35 seconds compared to 9 minutes in python! Thank you for your help!
Thanks, good to know!
Hello,
I am comparing two programs for the same task, namely filtering reads based on their expected error. However, with the same threshold, 0.01, the number of discarded reads is quite different: 2 millions more reads discarded with vsearch out of 36 millions total.
I would like to understand what is causing this difference and to do so obtain directly the expected error as calculated by vsearch - I already have this for the other program. Specifically, I would like to have the output for each read used to compare to the
fastq_maxee
argument.I found this in the code in filter.cc on line 165:
Would it be possible - without having to modify too much the C++ code - to output
res.ee
for each read from vsearch?Thanks in advance.
PS: I already made sure that the quality was converted from the right format using the option
--fastq_ascii 33
, so this is not causing the problem, maybe float precision difference between the two programs?