wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Question about the genome module of CoverM #215

Open: quliping opened this issue 3 weeks ago

quliping commented 3 weeks ago

Hello, CoverM is great software and very helpful for my work. However, I'm not sure how the 'genome' module of CoverM performs its calculations. The calculation in the contig module, e.g. TPM, is easy to understand: for a single contig, we compute the mapped reads per base (the total number of mapped paired-end reads divided by the contig length, abbreviated C/B), then divide that value by the sum of C/B over all contigs and multiply by 1e+6. But how does CoverM calculate the abundance of a genome made up of multiple contigs? I can think of three possibilities:

(1) the total TPM of all contigs in the genome;
(2) the average TPM of the genome's contigs (total TPM divided by genome length or by the number of contigs);
(3) the total mapped read count of the genome divided by the genome length (C/G, mapped reads per base for the genome), then divided by the sum of C/G over all genomes and multiplied by 1e+6.

Which one does CoverM use, or does it use a different method altogether?
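The contig-level calculation described above can be sketched as follows. This is a minimal illustration that assumes mapped read counts and contig lengths are already available as plain dictionaries; `contig_tpm` and the example contig names are hypothetical and not part of CoverM.

```python
from typing import Dict


def contig_tpm(read_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical helper: TPM per contig as described in the question."""
    # Step 1: mapped reads per base (C/B) for each contig.
    per_base = {c: read_counts[c] / lengths[c] for c in read_counts}
    # Step 2: divide by the sum of C/B over all contigs and scale to 1e6.
    total = sum(per_base.values())
    if total == 0:
        # No mapped reads at all: avoid dividing by zero.
        return {c: 0.0 for c in per_base}
    return {c: v / total * 1e6 for c, v in per_base.items()}


if __name__ == "__main__":
    counts = {"contig_1": 100, "contig_2": 300, "contig_3": 50}
    lengths = {"contig_1": 1000, "contig_2": 2000, "contig_3": 500}
    for name, value in contig_tpm(counts, lengths).items():
        print(name, round(value, 1))
```

By construction the values across all contigs sum to 1e6.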

wwood commented 3 weeks ago

Hi @quliping,

Thanks for the kind words.

If I understand you correctly, CoverM uses (3).
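A minimal sketch of option (3), written from the description in the question rather than from CoverM's source code. `genome_tpm` and the input layout (each contig mapped to a read count and a length) are hypothetical, and CoverM may apply extra steps, such as coverage-based filtering, that are not modelled here.

```python
from typing import Dict, Tuple

# Hypothetical input layout: contig name -> (mapped reads, length in bp).
Genome = Dict[str, Tuple[int, int]]


def genome_tpm(genomes: Dict[str, Genome]) -> Dict[str, float]:
    """Option (3): pool reads and lengths per genome, then normalise across genomes."""
    per_base = {}
    for name, contigs in genomes.items():
        total_reads = sum(reads for reads, _ in contigs.values())
        total_length = sum(length for _, length in contigs.values())
        # C/G: mapped reads per base for the whole genome.
        per_base[name] = total_reads / total_length
    total = sum(per_base.values())
    if total == 0:
        return {name: 0.0 for name in per_base}
    # Divide by the sum of C/G over all genomes and scale to 1e6.
    return {name: v / total * 1e6 for name, v in per_base.items()}


if __name__ == "__main__":
    genomes = {
        "genome_A": {"a1": (100, 1000), "a2": (300, 2000)},
        "genome_B": {"b1": (50, 500)},
    }
    print(genome_tpm(genomes))
```

For this toy input, genome_A gets about 571,429 and genome_B about 428,571. Summing the contig-level TPMs instead (option 1) would give genome_A about 714,286 and genome_B about 285,714, so the two definitions can produce noticeably different abundances.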