lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

meaning of warning / use of -L in kmer mode #47

Closed handibles closed 2 years ago

handibles commented 3 years ago

Hey, and thanks to the devs for the packages.

I have a persistent warning that my sequences are approaching saturation:

WARNING: The curve reached near-saturation, hence coverage estimations could be unreliable 
To avoid saturation increase the -L parameter, currently set at ...lots%

Is the overlap parameter, -L, also used in kmer comparisons (i.e. with -T kmer)?

And if so, is the unreliability limited to the extremes (near the asymptotic point of saturation), or could all of the generated values be unreliable, especially the kmer diversity (see #38)? I've increased -L to 90% but still receive the same warning. These are particularly thorough sampling efforts (15 GB of unzipped fastq, human gut faecal microbiome), so I'm not surprised.

lmrodriguezr commented 2 years ago

Hello,

I'm glad you found Nonpareil useful!

The warning only concerns the final coverage estimate; diversity (and all other estimates) remain very reliable near saturation. In any case, our more recent experiments show that the effect on coverage estimates is very small, so I've decided to remove this warning in the upcoming v3.4. For now, it is safe to simply ignore it.

Best wishes,
Miguel.