wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Use contig length for calculations, or contig length minus 150? #101

Open jeffkimbrel opened 2 years ago

jeffkimbrel commented 2 years ago

Hi Ben, I wanted to get your thoughts on whether the other three metrics that use contig length should do what the mean calculation does: subtract the --contig-end-exclusion value (2x, once for each end) from the contig length. The rationale is that, since the contig ends are excluded, they shouldn't be part of the calculations either. As it stands, a method like covered_fraction can never reach 100%, because it will always come up 150 nt short of the full length.

This would affect the covered_fraction, reads_per_base and rpkm methods (with tpm also affected via rpkm). Subtracting the --contig-end-exclusion lengths in those calculations would bump up the resulting values, with a bigger increase for smaller contigs. Based on some tests with covered_fraction, this does subtly change the rankings of contigs.
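To make the effect concrete, here is a minimal Python sketch of the arithmetic. It is not CoverM's actual code: the function names are hypothetical, and the exclusion of 75 nt per end is an assumption inferred from the 150 nt figure above. It compares covered_fraction computed against the full contig length with the same value computed against the length minus twice the exclusion.

```python
# Illustrative sketch only, not CoverM's implementation.
# Assumption: 75 nt excluded at each contig end (2 x 75 = 150 nt total).
END_EXCLUSION = 75


def effective_length(contig_length: int, end_exclusion: int = END_EXCLUSION) -> int:
    """Contig length after removing the excluded region at both ends."""
    return max(contig_length - 2 * end_exclusion, 0)


def covered_fraction(covered_bases: int, length: int) -> float:
    """Fraction of `length` covered by at least one read (0.0 if length is 0)."""
    return covered_bases / length if length > 0 else 0.0


def inflation_factor(contig_length: int, end_exclusion: int = END_EXCLUSION) -> float:
    """How much a length-normalised metric grows when the denominator
    switches from the full length to the effective length."""
    return contig_length / effective_length(contig_length, end_exclusion)


if __name__ == "__main__":
    for contig_length in (1_000, 10_000, 100_000):
        # Best case: every base outside the excluded ends is covered.
        covered = effective_length(contig_length)
        full = covered_fraction(covered, contig_length)
        trimmed = covered_fraction(covered, effective_length(contig_length))
        print(contig_length, full, trimmed, inflation_factor(contig_length))
```

Under these assumptions, a fully covered 1,000 nt contig tops out at 0.85 against the full length but reaches 1.0 against the effective length. The same factor would apply to any metric that divides by contig length, and it shrinks toward 1 as contigs get longer, which is why the rankings shift mostly among the short contigs.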

Let me know what you think.