wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

The relative abundance of each MAG in each sample #144

Open B-1991-ing opened 1 year ago

B-1991-ing commented 1 year ago

Hi Ben,

I am calculating the relative abundance of X MAGs in Y samples. The relative abundance of 39 pyhla.pdf

I would like to know whether the relative abundance of each MAG in each sample is calculated from all reads in all samples OR from all reads assigned to all MAGs.

A: the relative abundance of each MAG in each sample
B: reads mapped to each MAG
C: all reads in all samples OR all reads assigned to all MAGs
A = B / C

Based on the relative abundance of each MAG in each sample, I also want to know the relative abundance of each phylum in each sample. But I am not sure whether it is okay to directly sum the relative abundances of the MAGs belonging to one phylum in each sample.

D: the relative abundance of each phylum in each sample
D = sum of the relative abundances of the MAGs that belong to that phylum
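To make the proposed sum D concrete, here is a minimal Python sketch. It only spells out the arithmetic of the question; the thread itself does not confirm whether this is the right approach. The function name `phylum_abundance`, the MAG names, the phylum labels, and the percentages are all hypothetical.

```python
from collections import defaultdict


def phylum_abundance(mag_abundance, mag_phylum):
    """Sum per-MAG relative abundances (one sample) into per-phylum totals.

    mag_abundance: {MAG name: relative abundance in percent}
    mag_phylum:    {MAG name: phylum label}
    """
    totals = defaultdict(float)
    for mag, abundance in mag_abundance.items():
        totals[mag_phylum[mag]] += abundance
    return dict(totals)


# Hypothetical values for a single sample.
sample = {"MAG1": 10.0, "MAG2": 5.0, "MAG3": 2.5}
taxonomy = {
    "MAG1": "Proteobacteria",
    "MAG2": "Proteobacteria",
    "MAG3": "Bacteroidota",
}
per_phylum = phylum_abundance(sample, taxonomy)
print(per_phylum)
```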

Best,

Bing

B-1991-ing commented 1 year ago

Hi Ben,

Update

According to the introduction to the calculation methods, why does the mean coverage of each genome (0.02235294) need to be divided by the total mean coverage of all genomes (0.02235294)?

Finally, the relative abundance of all genomes in each sample is calculated as follows:

A: the relative abundance of all genomes in each sample
B: the mean coverage of each genome in each sample
C: the mean coverage of all genomes in each sample
D: the number of reads of each genome in each sample
E: the number of reads of all genomes in each sample
A = (B/C) * (D/E)

But how do I get the mean coverage of all genomes in each sample? Do I take the mean of the per-genome mean coverages, i.e. sum the mean coverage of all genomes and divide by the number of genomes?

Just to be sure: is the mean coverage of each genome in each sample obtained by summing the number of aligned reads and dividing by the genome length?

Best,

Bing

B-1991-ing commented 1 year ago

Hi Ben,

Update

A: The mean coverage of each genome in each sample
A: (10+9)/(1000-2*75) ≈ 0.02

B: The total mean coverage across all genomes in each sample (assume 3 genomes in this sample)
B: (10+9+10+9+10+9)/(850+850+850) ≈ 0.02

C: The relative abundance of each genome in each sample (assume 2 genomes in this sample)
C: (0.02/0.02) * (2/6) ≈ 0.33
(0.02/0.02) is used for correction.

So the relative abundance of each genome in each sample is basically the number of reads mapped to that genome divided by all reads in the sample.

Best,

Bing

wwood commented 1 year ago

Hi,

Sorry I'm a bit lost. I think not everything you are saying there is quite correct, but what is your question/issue?

-------------- Ben Woodcroft Group leader, Centre for Microbiome Research, QUT



B-1991-ing commented 1 year ago

Thank you for your reply.

I mainly have two questions.

  1. How to calculate the total mean coverage of all genomes in each sample?
  2. How to calculate the relative abundance of each genome in each sample?

wwood commented 1 year ago

The total mean coverage is just the sum of each genome's mean coverage.

The relative abundance is calculated by first working out the fraction of reads that map, and then partitioning that mapped percentage among the genomes according to the ratios of their mean coverages.

hth, ben
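The two steps described above can be sketched in Python as follows. This is an illustration of the description in this reply, not CoverM's actual code; the function name `relative_abundance`, its parameters, and the example numbers are assumptions made for the sketch.

```python
def relative_abundance(mean_coverage, mapped_reads, total_reads):
    """Partition the mapped-read percentage among genomes by mean coverage.

    mean_coverage: {genome: mean coverage in this sample}
    Returns ({genome: relative abundance in percent}, unmapped percent).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")

    # Step 1: what percentage of the sample's reads mapped at all.
    mapped_percent = 100.0 * mapped_reads / total_reads

    # Step 2: the total mean coverage is the sum of per-genome mean coverages;
    # each genome gets its share of the mapped percentage.
    total_coverage = sum(mean_coverage.values())
    if total_coverage == 0:
        shares = {genome: 0.0 for genome in mean_coverage}
    else:
        shares = {
            genome: mapped_percent * coverage / total_coverage
            for genome, coverage in mean_coverage.items()
        }
    return shares, 100.0 - mapped_percent


# Hypothetical sample: 800 of 1000 reads map, genomeA has 3x the coverage of genomeB.
abundance, unmapped = relative_abundance(
    {"genomeA": 3.0, "genomeB": 1.0}, mapped_reads=800, total_reads=1000
)
print(abundance, unmapped)
```

In this sketch the per-genome percentages and the unmapped percentage add up to 100 for each sample.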


B-1991-ing commented 1 year ago

Do you consider the genome length when calculating the relative abundance of each genome in each sample?

wwood commented 1 year ago

No, because there's no need I don't think. Mean cov is the average number of reads covering a base in the genome, so in a sense that is already taken care of (unlike counting total reads that map).
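A small Python sketch of why mean coverage already accounts for genome length, while raw read counts do not. It is a simplification under stated assumptions: every read is taken to align over its full length, and any trimming of contig ends (such as the `2*75` in the earlier example) is ignored. The function name `mean_coverage` and the numbers are hypothetical.

```python
def mean_coverage(read_lengths, genome_length):
    """Average number of reads covering each base: aligned bases / genome length."""
    return sum(read_lengths) / genome_length


# Two hypothetical genomes sequenced to the same depth with 100 bp reads.
long_genome = mean_coverage([100] * 200, genome_length=10_000)  # 200 reads
short_genome = mean_coverage([100] * 20, genome_length=1_000)   # 20 reads

# The read counts differ tenfold, but the mean coverage is the same.
print(long_genome, short_genome)
```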


B-1991-ing commented 1 year ago

So, assume genomeA is 10K long and genomeB is 1K long. Because sequencing samples reads randomly and evenly across the genomes, the mapped reads of genomeA and genomeB COULD be 10bp and 1bp. Then the mean coverages of genomeA and genomeB are 10/10000=0.001 and 1/1000=0.001, so they end up the same?

wwood commented 1 year ago

Right, though of course 1bp and 10bp are too short to actually map. But that aside, I agree.
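Written out with the numbers from the example above (unrealistically short alignments, as noted), and treating the 10bp and 1bp as aligned bases, the arithmetic is roughly as follows. The symbols $\mathrm{cov}_A$ and $\mathrm{cov}_B$ are introduced here only for illustration.

```latex
\[
\mathrm{cov}_A = \frac{10}{10000} = 0.001, \qquad
\mathrm{cov}_B = \frac{1}{1000} = 0.001, \qquad
\frac{\mathrm{cov}_A}{\mathrm{cov}_A + \mathrm{cov}_B} = \frac{0.001}{0.002} = 0.5
\]
```

So each genome would receive half of the mapped percentage, even though genomeA attracts ten times as many aligned bases.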


B-1991-ing commented 1 year ago

Ok, I get it now. Thank you very much. Nice weekend.