wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Question on min-covered-fraction and total read counts #146

mherold1 opened this issue 1 year ago (status: open)

mherold1 commented 1 year ago

Hello,

Thanks for providing the tool.

While testing, I looked at some samples with a low mapping percentage and noticed that setting --min-covered-fraction 0 changed the percentage of reads mapped from 17% to 45% when using coverm genome (otherwise with default parameters). For metatranscriptomic reads, I assume that setting the covered fraction to 0 would make sense. Are there any other parameters that should be adjusted in this case?

The TPM / RPKM calculations are always based on the total number of reads mapped to the included genomes only, correct?

I also noticed that the total number of reads reported by CoverM differs slightly from the number reported by samtools flagstat (21535073 vs. 21686837 in my example). Why could this be?

wwood commented 1 year ago

Hi,

> While testing, I looked at some samples with a low mapping percentage and noticed that setting --min-covered-fraction 0 changed the percentage of reads mapped from 17% to 45% when using coverm genome (otherwise with default parameters). For metatranscriptomic reads, I assume that setting the covered fraction to 0 would make sense. Are there any other parameters that should be adjusted in this case?

I think you have it right, though be aware that CoverM (at least for now) only reports per-genome read mapping metrics, not per-gene metrics, which are usually what you are after. The default of --min-covered-fraction 10 is there to weed out spurious mappings to genomes that aren't present in the metagenome at all. I'm not sure what the right setting is for metatranscriptomics (metaT), but perhaps use 0% and interpret the results with caution?
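To illustrate why lowering the threshold can raise the reported mapping percentage, here is a minimal Python sketch of a covered-fraction filter. It assumes, as the observation in the question suggests but without confirmation from CoverM's source, that reads on a genome whose covered fraction falls below --min-covered-fraction are counted as unmapped. The class, function, genome names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GenomeStats:
    name: str
    mapped_reads: int   # reads whose alignments fall on this genome
    covered_bases: int  # bases with at least one aligned read
    length: int         # genome length in bases

    @property
    def covered_fraction(self) -> float:
        return self.covered_bases / self.length


def percent_mapped(genomes, total_reads, min_covered_fraction_pct):
    """Percent of all reads counted as mapped after a covered-fraction filter.

    The threshold is given in percent, like --min-covered-fraction 10.
    ASSUMPTION (hypothetical model): reads on genomes below the threshold
    are counted as unmapped.
    """
    threshold = min_covered_fraction_pct / 100.0
    kept = sum(
        g.mapped_reads for g in genomes if g.covered_fraction >= threshold
    )
    return 100.0 * kept / total_reads


# Hypothetical sample: genome_B has many reads piled onto a small region.
genomes = [
    GenomeStats("genome_A", mapped_reads=2_000, covered_bases=600_000, length=2_000_000),
    GenomeStats("genome_B", mapped_reads=3_000, covered_bases=60_000, length=3_000_000),
]
print(percent_mapped(genomes, total_reads=10_000, min_covered_fraction_pct=10))  # 20.0
print(percent_mapped(genomes, total_reads=10_000, min_covered_fraction_pct=0))   # 50.0
```

In metatranscriptomic data, reads often concentrate on a few highly expressed genes, so a genome can carry many reads while having a low covered fraction; under the assumption above, that is one plausible reason the default threshold discards so many reads.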

> The TPM / RPKM calculations are always based on the total number of reads mapped to the included genomes only, correct?

Yes, that's correct.
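A minimal sketch of what "based on the included genomes only" means in practice, using the standard definitions (RPKM: reads per kilobase of genome per million mapped reads; TPM: per-base read rates rescaled to sum to one million). CoverM's exact implementation details, such as how genome length is defined, are not confirmed here; the function name and inputs are hypothetical.

```python
def rpkm_and_tpm(counts, lengths):
    """Compute RPKM and TPM over the genomes passed in ("included" genomes).

    counts:  genome name -> number of reads mapped to it
    lengths: genome name -> genome length in base pairs
    Both denominators (total mapped reads, sum of per-base rates) are taken
    over these genomes only, so excluding a genome changes the others' values.
    """
    total_mapped = sum(counts.values())
    rpkm = {
        g: counts[g] / (lengths[g] / 1e3) / (total_mapped / 1e6) for g in counts
    }
    rates = {g: counts[g] / lengths[g] for g in counts}  # reads per base
    rate_sum = sum(rates.values())
    tpm = {g: 1e6 * rate / rate_sum for g, rate in rates.items()}
    return rpkm, tpm


# Hypothetical counts and lengths.
counts = {"genome_A": 1_000, "genome_B": 3_000}
lengths = {"genome_A": 1_000_000, "genome_B": 3_000_000}

rpkm, tpm = rpkm_and_tpm(counts, lengths)
print(rpkm)  # both genomes approximately 250
print(tpm)   # both genomes approximately 500,000

# Dropping genome_B from the included set changes genome_A's values.
rpkm_a, tpm_a = rpkm_and_tpm({"genome_A": 1_000}, {"genome_A": 1_000_000})
print(rpkm_a, tpm_a)  # genome_A approximately 1000 RPKM and 1,000,000 TPM
```

Because both denominators depend on which genomes pass the filters, changing --min-covered-fraction can shift RPKM and TPM for every remaining genome, not just the ones that are dropped.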

> I also noticed that the total number of reads reported by CoverM differs slightly from the number reported by samtools flagstat (21535073 vs. 21686837 in my example). Why could this be?

Hmm, not sure actually. Multimapping? How many reads are in the original set?
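One possible explanation, in line with the multimapping guess: the "in total" line of samtools flagstat counts alignment records, so secondary (FLAG 0x100) and supplementary (FLAG 0x800) alignments of the same read are counted again, while a per-read count includes each read only once. flagstat also reports secondary and supplementary counts on separate lines, which can be subtracted for comparison. The sketch below counts plain-text SAM records both ways; the flag bits come from the SAM specification, but whether CoverM's total uses exactly this primary-only rule is an assumption not confirmed in this thread. The function name and example alignments are hypothetical.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment
UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped


def count_records(sam_lines):
    """Count SAM alignment records in two ways.

    total_records counts every record, as flagstat's "in total" line does;
    the primary_* counts include each read once by skipping secondary and
    supplementary records (hypothetical per-read counting rule).
    """
    total = primary = primary_mapped = 0
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second column
        total += 1
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        primary += 1
        if not flag & UNMAPPED:
            primary_mapped += 1
    return {
        "total_records": total,
        "primary_records": primary,
        "primary_mapped": primary_mapped,
    }


# Hypothetical alignments: read1 multimaps, read2 is split, read3 is unmapped.
sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "read1\t0\tcontig1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t256\tcontig1\t500\t0\t4M\t*\t0\t0\t*\t*",
    "read2\t0\tcontig1\t200\t60\t2M2S\t*\t0\t0\tACGT\tIIII",
    "read2\t2048\tcontig1\t800\t60\t2H2M\t*\t0\t0\tGT\tII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_records(sam))
# {'total_records': 5, 'primary_records': 3, 'primary_mapped': 2}
```

Comparing both counts against the number of reads in the original input, as asked above, would show which of the two totals matches the raw read count.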