Closed mpalmada closed 1 year ago
Hello Mark,
High QV is not necessarily correlated with high completeness.
QV looks at the k-mers present only in the **assembly** plus the k-mers present in both assembly and reads, while completeness looks at the k-mers present only in the **reads** plus the k-mers present in both assembly and reads.
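To make the difference concrete, here is a minimal sketch on toy data. It is not Merqury's code: it uses sets of distinct k-mers rather than k-mer counts, and the function name `merqury_style_metrics` and the example k-mers are made up for illustration. The QV formula follows my reading of the Merqury paper (per-base correctness estimated as the shared fraction raised to 1/k).

```python
import math

def merqury_style_metrics(asm_kmers, read_kmers, k):
    """Toy QV and completeness from sets of distinct k-mers (hypothetical helper)."""
    shared = asm_kmers & read_kmers
    # QV denominator: everything in the assembly (assembly-only + shared).
    p_correct = (len(shared) / len(asm_kmers)) ** (1 / k)
    error = 1 - p_correct
    qv = math.inf if error == 0 else -10 * math.log10(error)
    # Completeness denominator: everything in the reads (read-only + shared).
    completeness = len(shared) / len(read_kmers)
    return qv, completeness

# Every assembly k-mer is supported by reads, but the reads also carry
# four extra k-mers (for example, sequencing errors).
asm = {"AAA", "AAC", "ACG", "CGT"}
reads = asm | {"AAT", "ATG", "TGT", "GTT"}

qv, completeness = merqury_style_metrics(asm, reads, k=3)
print(qv, completeness)
```

Here no assembly k-mer is missing from the reads, so the QV is as high as it can be, yet completeness is only 0.5 because half of the read k-mers are read-only.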
In most cases, ignoring k-mers with frequency=1 is applicable. That is, filter the read set with `meryl greater-than 1 reads.meryl output reads_filt.meryl` and re-run Merqury with `reads_filt.meryl`.
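As a rough illustration of what the `greater-than 1` filter does to completeness, the sketch below mimics it on a toy table of k-mer counts. This is plain Python, not meryl; the helper names and the counts are hypothetical.

```python
from collections import Counter

def filter_greater_than(counts, threshold):
    """Keep only k-mers whose count is strictly greater than threshold."""
    return Counter({km: c for km, c in counts.items() if c > threshold})

def completeness(asm_kmers, read_counts):
    """Fraction of distinct read k-mers that are also found in the assembly."""
    found = sum(1 for km in read_counts if km in asm_kmers)
    return found / len(read_counts)

asm_kmers = {"AAA", "AAC", "ACG", "CGT"}
read_counts = Counter({
    "AAA": 30, "AAC": 28, "ACG": 31, "CGT": 29,  # well-supported k-mers
    "AAT": 1, "ATG": 1, "TGT": 1, "GTT": 1,      # singletons, likely errors
})

before = completeness(asm_kmers, read_counts)
filtered = filter_greater_than(read_counts, 1)
after = completeness(asm_kmers, filtered)
print(before, after)
```

In this toy case the singletons make up half of the distinct read k-mers, so dropping them moves completeness from 0.5 to 1.0 without touching the assembly.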
If the coverage is high enough, the k-mer spectrum (spectra-cn) usually shows a clear separation between the low-frequency erroneous k-mers and the 1-copy peak. You may increase the cutoff for filtering out low-frequency k-mers; however, no single cutoff generalizes, as sequencing depth and error profile vary between sequencing runs. Also keep in mind that the chance of missing a true k-mer increases as the cutoff increases.
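One way to pick a candidate cutoff is to look for the first valley in the read k-mer histogram, i.e. the dip between the error k-mers and the 1-copy peak. The sketch below assumes you already have the histogram as a mapping from k-mer frequency to number of distinct k-mers (for example, parsed from `meryl histogram` output); the function name and the numbers are invented for illustration, and this is not how Merqury itself chooses anything.

```python
def first_valley(hist):
    """Return the frequency at the first local minimum of the histogram.

    hist maps k-mer frequency -> number of distinct k-mers with that frequency.
    Returns None if no local minimum is found.
    """
    freqs = sorted(hist)
    for prev, cur, nxt in zip(freqs, freqs[1:], freqs[2:]):
        if hist[cur] < hist[prev] and hist[cur] <= hist[nxt]:
            return cur
    return None

# Toy histogram: a steep error tail at low frequency, then a rise
# towards the 1-copy peak.
hist = {1: 900000, 2: 120000, 3: 30000, 4: 12000, 5: 15000, 6: 40000, 7: 90000}

valley = first_valley(hist)
print(valley)
```

Treat the result only as a starting point: check it against the spectra-cn plot, since real histograms can be noisy and a higher cutoff also removes more true k-mers.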
Best, Arang
Closing this for now. Feel free to re-open if you need more help!
Hi Arang,
I have a lot of Illumina data, which leads to a large number of read-only k-mers and lowers my completeness even though my QV is high. Is there a way to standardize the completeness by sequencing depth?
Thanks a lot!
Marc