wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0
273 stars, 30 forks

Question about TPM #160

Open TJrogers86 opened 1 year ago

TJrogers86 commented 1 year ago

Hello developers,

Let me start by saying thanks for this awesome tool. It's been a lifesaver for me. But I do have a long-winded question about how it calculates abundances, especially TPM from BAM files. For my data, I used bowtie2 to create the BAMs. My understanding is that bowtie2 strictly maps reads to the contigs and does not take contig length into account, so what you get is a number of mapped reads and not the actual read coverage. This is where my question about CoverM and its TPM calculation comes in.

Hypothetically, let's say I have a MAG that is composed of three contigs of different lengths:

- 1 kb with 3k aligned reads
- 3 kb with 8k aligned reads
- 6 kb with 7k aligned reads

First part of the question: does CoverM first calculate the coverage of reads in each contig? So something like:

- 3k aligned / 1 kb = 3X
- 8k aligned / 3 kb = 2.66X
- 7k aligned / 6 kb = 1.16X

If so, how is the TPM of that MAG calculated compared to other MAGs in the sample? If we were just considering these contigs against each other, as if they weren't members of the same MAG, my understanding of calculating TPM is:

1. Find read coverage as above (reads per kilobase, RPK).
2. Sum all RPKs in a sample and divide by 10^6 to create a scaling factor:
   a. 3 + 2.66 + 1.16 = 6.82, and 6.82 / 10^6 = 6.82 × 10^-6
3. Divide each RPK by the scaling factor to calculate TPM:
   a. 3 / (6.82 × 10^-6) = 439.8k TPM
   b. 2.66 / (6.82 × 10^-6) = 390.0k TPM
   c. 1.16 / (6.82 × 10^-6) = 170.1k TPM

This all makes sense at the contig level, but not at the MAG level, since a MAG is made of multiple contigs. So, is CoverM calculating the TPM of each MAG as the mean TPM of the contigs within it? (I don't think this is it.) Or is read coverage being calculated at the MAG level and not at the individual contig level? If it is the second, that seems problematic to me. Or is it something else altogether?

Sorry for the long question. I just want to make sure I can explain to my readers and PI how TPM is calculated at the MAG level.
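For reference, the contig-level arithmetic in this question can be sketched in Python. This is only an illustration of the calculation as described here, not a statement of what CoverM does internally. The contig names and the helper `tpm_from_rpk` are hypothetical, and the third contig is assumed to be 6 kb, the length used in the coverage step. Because the sketch uses unrounded RPK values, its TPM figures differ slightly from the rounded ones above.

```python
# Hypothetical example data from the question: (length in kb, aligned reads).
# The third contig is assumed to be 6 kb, as in the coverage calculation.
contigs = {
    "contig_1": (1.0, 3_000),
    "contig_2": (3.0, 8_000),
    "contig_3": (6.0, 7_000),
}


def tpm_from_rpk(rpk_by_name):
    """Scale reads-per-kilobase values so that they sum to one million."""
    total_rpk = sum(rpk_by_name.values())
    return {name: rpk / total_rpk * 1e6 for name, rpk in rpk_by_name.items()}


# Step 1: reads per kilobase (RPK) for each contig.
rpk = {name: reads / length_kb for name, (length_kb, reads) in contigs.items()}

# Steps 2 and 3: divide each RPK by (sum of RPKs / 10^6).
contig_tpm = tpm_from_rpk(rpk)

# Two candidate ways to summarise the MAG, shown only to make the
# question concrete; neither is claimed to be CoverM's method.
total_reads = sum(reads for _, reads in contigs.values())
total_length_kb = sum(length_kb for length_kb, _ in contigs.values())
pooled_rpk = total_reads / total_length_kb      # reads pooled over the whole MAG
mean_contig_rpk = sum(rpk.values()) / len(rpk)  # unweighted mean of contig RPKs

for name, value in contig_tpm.items():
    print(f"{name}: {value:,.0f} TPM")
print(f"pooled MAG RPK: {pooled_rpk:.1f}")
print(f"mean contig RPK: {mean_contig_rpk:.1f}")
```

The pooled value and the unweighted mean disagree for these numbers because short contigs with high coverage pull the unweighted mean upward, which is the heart of the question about MAG-level TPM.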

JotaKas commented 1 year ago

I also did not fully understand. Did you manage to get an answer?

TJrogers86 commented 1 year ago

I never got an answer. Would love to know though.