Closed Statistic-Qin closed 2 years ago
The first sequence's EE is 1.2539.
Looking forward to your answer!
@Statistic-Qin the expected error (EE) is defined as the sum of the error probabilities for all positions in a sequence: $EE = \sum_i p_i = \sum_i 10^{-Q_i/10}$. It is a sum of many values in [0, 1], so it can be greater than one.
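To make the definition above concrete, here is a minimal Python sketch. It is not vsearch code: the function name `expected_error`, the Phred+33 offset, and the example read are assumptions chosen for illustration.

```python
import math

def expected_error(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for one FASTQ quality string."""
    # Phred score Q = ord(char) - offset; error probability p = 10^(-Q/10).
    # The offset of 33 (Phred+33 encoding) is an assumption of this sketch.
    return math.fsum(10 ** (-(ord(c) - offset) / 10) for c in quality)

# Hypothetical read: 150 bases, all at Q20 (ASCII '5' with offset 33)
quality = "5" * 150
ee = expected_error(quality)
print(f"EE = {ee:.2f}")  # EE = 1.50
```

Each base contributes only 0.01, yet the total is 1.5, because EE grows with the number of positions.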
--fastq_maxee_rate 0.01 means that vsearch will discard sequences whose positions are, on average, of low quality (an error rate of 0.01 or more means a quality of Q20 or less).
Please note that when you use the --fastq_maxee_rate option it applies to the average expected error across the sequence, which will be a number between 0 and 1. When you use the --fastq_maxee option, it applies to the total expected error for the sequence, which will vary between 0 and the length of the sequence. It is common to use --fastq_maxee 1.0 or a number of that magnitude. It allows for up to one expected wrong base per sequence. This is equivalent to using --fastq_maxee_rate 0.01 if the sequences are 100 bp long.
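A small Python sketch of how the two thresholds relate may help here. It only mimics the comparisons described in this thread and is not how vsearch is implemented; the helper names `passes_maxee` and `passes_maxee_rate` and the read lengths are hypothetical.

```python
def passes_maxee(ee: float, maxee: float) -> bool:
    """Keep the read unless its total expected error exceeds maxee."""
    return ee <= maxee

def passes_maxee_rate(ee: float, length: int, maxee_rate: float) -> bool:
    """Keep the read unless its per-base average expected error exceeds maxee_rate."""
    return ee / length <= maxee_rate

# (length in bases, total expected error); all values are illustrative
reads = [(50, 0.8), (100, 0.8), (250, 1.2539)]
for length, ee in reads:
    kept_by_total = passes_maxee(ee, 1.0)               # like --fastq_maxee 1.0
    kept_by_rate = passes_maxee_rate(ee, length, 0.01)  # like --fastq_maxee_rate 0.01
    print(length, ee, kept_by_total, kept_by_rate)
```

The two filters agree for the 100 bp read but not for the others: the 50 bp read passes the total threshold and fails the rate threshold, while a 250 bp read with EE = 1.2539 fails the total threshold and passes the rate threshold. That is how a sequence with EE > 1 can survive a filter set to --fastq_maxee_rate 0.01.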
I think I understand the meaning of EE now. The EE in the picture is the sum over one sequence, which corresponds to --fastq_maxee. The value used by --fastq_maxee_rate is EE/N, where N is the sequence length, so it is the mean over all positions of the sequence.
I've tried to improve the manpage entries for maxEE and maxEE_rate (see bae03fca37150b3fa4501446fdfe418f379b5143). Entries now read as such:
--fastq_maxee real
    When using --fastq_filter, --fastq_mergepairs or
    --fastx_filter, discard sequences with an expected error
    greater than the specified number (value ranging from 0.0
    to infinity). For a given sequence, the expected error is
    the sum of error probabilities for all the positions in
    the sequence. In practice, the expected error is greater
    than zero (error probabilities can be small but not null),
    and at most equal to the length of the sequence (when all
    positions have an error probability of 1.0).
--fastq_maxee_rate real
    When using --fastq_filter or --fastx_filter, discard
    sequences with an average expected error greater than the
    specified number (value ranging from 0.0 to 1.0 included).
    For a given sequence, the average expected error is the
    sum of error probabilities for all the positions in the
    sequence, divided by the length of the sequence.
Hi, first of all, thanks to all the authors! In my project I use the --fastx_filter command with --fastq_maxee_rate set to 0.01, and I find that some sequences have an expected error greater than 1. How does that happen? Every position's error probability should be between 0 and 1. Is the expected error also supposed to be between 0 and 1?