wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

TAD (truncated average depth) and Trimmed mean depth #34

Closed jianshu93 closed 4 years ago

jianshu93 commented 4 years ago

Dear Ben, I am now comparing different methods for calculating genome coverage.

Our group proposed a method, called TAD80, to calculate the abundance of each MAG binned from assembled contigs:

The abundance of each genome species (representative of a cluster after dereplication) was estimated using the MAG of highest genome quality as representative. For each metagenomic dataset, the sequencing depth was estimated per position (Bowtie (Langmead and Salzberg, 2012) for mapping short reads to MAGs; bedtools (Quinlan and Hall, 2010) for calculating coverage from the mapped BAM file) and truncated to the central 80% (BedGraph.tad.rb (Rodriguez-R and Konstantinidis, 2016)), a metric hereafter termed TAD (truncated average sequencing depth). Abundance was then estimated as TAD80 normalized by the genome equivalents of the metagenomic dataset.

Three steps are needed to calculate TAD80:

  1. Map reads to the MAG with a mapping tool (bwa or bowtie2) and produce a sorted BAM file
  2. Calculate the coverage at each position: bedtools genomecov -ibam MAG_sorted.bam >> MAG.bedtools.cov.txt
  3. Calculate TAD80 using the script BedGraph.tad.rb (https://github.com/lmrodriguezr/enveomics/blob/master/Scripts/BedGraph.tad.rb): BedGraph.tad.rb -i lab5_MAG.001.bedtools.cov.txt -r 0.8
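
To make the truncation step concrete, here is a minimal Python sketch of the idea behind TAD80: sort all positions by depth, drop the lowest 10% and the highest 10%, and average the rest. It is not the code of BedGraph.tad.rb. The function name `truncated_average_depth`, the rounding of the trim size, and the assumption that the input is in four-column BedGraph form (chrom, start, end, depth) that also lists zero-depth intervals are all mine, for illustration only.

```python
from collections import Counter


def truncated_average_depth(bedgraph_lines, central_fraction=0.8):
    """Average depth over the central fraction of positions, ranked by depth.

    Hypothetical helper: assumes BedGraph-style lines (chrom, start, end, depth)
    that include zero-depth intervals.
    """
    if not 0 < central_fraction <= 1:
        raise ValueError("central_fraction must be in (0, 1]")

    # Count how many positions have each depth value.
    depth_counts = Counter()
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        start, end, depth = int(fields[1]), int(fields[2]), float(fields[3])
        depth_counts[depth] += end - start

    total = sum(depth_counts.values())
    if total == 0:
        return 0.0

    # Number of positions to drop at each end (the rounding rule is an assumption).
    n_trim = round(total * (1 - central_fraction) / 2)
    lo, hi = n_trim, total - n_trim  # keep ranked positions in [lo, hi)
    if hi <= lo:
        raise ValueError("too few positions for this central_fraction")

    # Walk the depths in ascending order and sum only the kept positions.
    kept_sum = 0.0
    seen = 0
    for depth in sorted(depth_counts):
        count = depth_counts[depth]
        overlap = min(seen + count, hi) - max(seen, lo)
        if overlap > 0:
            kept_sum += depth * overlap
        seen += count

    return kept_sum / (hi - lo)
```

For example, with ten positions at depths 0, 0, 10 (six times), 100, 100, one position is dropped from each end and the remaining eight are averaged.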

For the CoverM genome trimmed mean method, if I understand it correctly:

You do something similar to TAD80 (choosing --trim-min 0.1 and --trim-max 0.9), but you also remove the first and last 75 bp of each contig to avoid bad mapping (edge effects):

coverm genome -d ./try_MAGs_1 -x fasta -b ./mapping_bam/MAG.001.bam --min-covered-fraction 0.001 -m trimmed_mean --trim-min 0.1 --trim-max 0.9 --contig-end-exclusion 75

The directory try_MAGs_1 contains only MAG.001.fasta, and MAG.001.bam was generated by mapping reads to MAG.001.fasta.

My question is: is TAD80 basically the same thing as the CoverM trimmed mean (--trim-min 0.1 and --trim-max 0.9)? My understanding is that it is. They might not be exactly the same, but the general workflow and logic for calculating the average coverage of a MAG are the same, right (except for the 75 bp removed at both ends of each contig)?
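
To spell out what I mean by "the same except for the contig ends", here is a small Python sketch of my understanding. It is not CoverM's code: the function name `trimmed_mean_depth`, the rounding of the cut points, and the way contigs shorter than twice the exclusion length are handled are assumptions for illustration.

```python
def trimmed_mean_depth(contigs, trim_min=0.1, trim_max=0.9, end_exclusion=0):
    """Trimmed mean over pooled per-position depths of all contigs in a genome.

    Hypothetical sketch: `contigs` is a list of per-position depth lists,
    one list per contig.
    """
    pooled = []
    for depths in contigs:
        if end_exclusion > 0:
            # Drop the first and last `end_exclusion` positions of this contig.
            # Contigs shorter than 2 * end_exclusion contribute nothing.
            depths = depths[end_exclusion:len(depths) - end_exclusion]
        pooled.extend(depths)

    if not pooled:
        return 0.0

    # Rank positions by depth and keep those between the two cut points.
    pooled.sort()
    lo = round(len(pooled) * trim_min)
    hi = round(len(pooled) * trim_max)
    kept = pooled[lo:hi]
    return sum(kept) / len(kept) if kept else 0.0


# With end_exclusion=0 this is the TAD80-style truncated average;
# a non-zero end_exclusion only changes which positions enter the ranking.
one_contig = [[0, 0, 5, 5, 5, 5, 5, 5, 50, 50]]
tad_like = trimmed_mean_depth(one_contig)
with_exclusion = trimmed_mean_depth(one_contig, end_exclusion=2)
```

In this sketch the only difference between the two calls is the set of positions that are ranked before trimming, which is the part I would like confirmed.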

I know you are on vacation. Please feel free to answer the question whenever you have time.

Thank you very much,

Best,

Jianshu

wwood commented 4 years ago

Addressed in f0840a53fc89dfb4a427e7f2eda33936ac465712 - thanks. Let me know if there are further disparities.