Closed frederic-mahe closed 1 year ago
In commit 001e25d, the precision has been increased to always show at least 4 significant digits, but adjusting the number of digits shown automatically:
$ printf "@s1\nA\n+\n~\n" | vsearch --fastq_filter - --quiet --eeout --fastqout - --fastq_qmax 93
@s1;ee=0.0000000005012
A
+
~
$ printf "@s1\nA\n+\n\"\n" | bin/vsearch --fastq_filter - --quiet --eeout --fastqout - --fastq_qmax 93
@s1;ee=0.7943
A
+
"
Also added for FASTA files as well, with commit f7d08bd9579d2a2666354bd5ab982d0a2d2c18f1.
Excellent! I've added tests (https://github.com/frederic-mahe/vsearch-tests/commit/7d4e97a13b0d0dd6070e5ae6fecebf4bfe31cee4 and https://github.com/frederic-mahe/vsearch-tests/commit/1a3c6951eca49afff723003216369f7b86660d28) to cover all the branches in 001e25d
Same thing for the --fastaout
case (f7d08bd).
when using the
--eeout
option, vsearch reports expected error values with a precision of 1E-4:https://github.com/torognes/vsearch/blob/3ab323616159ea39598d7ddd1dc9fa38864ee3a1/src/fastq.cc#L572
This is fine when working with quality values ranging from 0 to 40. With such a range, the smallest possible expected error is indeed 10E-(Q/10) = 1E-4.
When extending the possible range of quality values to 41 with Illumina 1.8+, or to 93 (!) with PacBio's HiFi reads, then the current precision is not sufficient anymore:
Maybe the precision should be increased to cover at least Q = 41 (ee = 0.000079433)? and possibly Q = 93 (ee ~ 0.0000000005)?