Dear Ben,
I am now comparing different methods for calculating genome coverage.
Our group proposed a method, called TAD80, to calculate the abundance of each MAG binned from assembled contigs:
The abundance of each genome species (representative of a cluster after dereplication) was estimated using the MAG of highest genome quality as representative. For each metagenomic dataset, the sequencing depth was estimated per position (Bowtie (Langmead and Salzberg, 2012) for mapping short reads to MAGs, bedtools (Quinlan and Hall, 2010) for calculating coverage from the mapped BAM file) and truncated to the central 80% (BedGraph.tad.rb (Rodriguez-R and Konstantinidis, 2016)), a metric hereafter termed TAD (truncated average sequencing depth). Abundance was then estimated as TAD80 normalized by the genome equivalents of the metagenomic dataset. Three steps are needed to calculate TAD80:
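1. Map reads to the MAG with a mapping tool (bwa or bowtie2) and produce a sorted BAM file.
2. Calculate the coverage at each position:
   bedtools genomecov -ibam MAG_sorted.bam >> MAG.bedtools.cov.txt
3. Truncate the per-position depth to the central 80% with BedGraph.tad.rb and average it.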
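To make sure I am describing the truncation step clearly, here is a minimal Python sketch of how I understand TAD80. It is not the BedGraph.tad.rb code itself: the function name is hypothetical, it assumes the coverage file is in four-column BedGraph format (for example as written by bedtools genomecov with -bga), and the exact rounding of the trimmed tails may differ from the real script.

```python
def truncated_average_depth(bedgraph_lines, fraction=0.8):
    """Mean depth over the central `fraction` of positions, ranked by depth.

    Each line is assumed to be BedGraph: contig, start, end, depth
    (0-based start, end exclusive).
    """
    depths = []
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        start, end, depth = int(fields[1]), int(fields[2]), float(fields[3])
        # Expand the interval to one depth value per position.
        depths.extend([depth] * (end - start))

    depths.sort()
    n = len(depths)
    if n == 0:
        return 0.0

    # Drop the same number of positions from the low and high tails.
    # The rounding rule here is an assumption of this sketch.
    drop = round(n * (1.0 - fraction) / 2.0)
    kept = depths[drop:n - drop] or depths
    return sum(kept) / len(kept)


if __name__ == "__main__":
    example = ["c1\t0\t2\t0", "c1\t2\t8\t10", "c1\t8\t10\t100"]
    print(truncated_average_depth(example))
```

In this toy example the plain mean is 26, while dropping the lowest and highest 10% of positions gives 20, which is the kind of outlier damping TAD80 is meant to provide.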
For the CoverM genome trimmed mean method, if I understand it correctly:
You do something similar to TAD80 (with --trim-min 0.1 and --trim-max 0.9), but you also exclude the first and last 75 bp of each contig to avoid poor mapping (edge effects):
coverm genome -d ./try_MAGs_1 -x fasta -b ./mapping_bam/MAG.001.bam --min-covered-fraction 0.001 -m trimmed_mean --trim-min 0.1 --trim-max 0.9 --contig-end-exclusion 75
The directory try_MAGs_1 contains only MAG.001.fasta; MAG.001.bam was generated by mapping reads to MAG.001.fasta.
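For comparison, this is how I picture the CoverM trimmed mean, written as a small Python sketch. It reflects only my understanding and is not CoverM's implementation: the function name and the input layout (a dict of per-position depths per contig) are hypothetical, and pooling positions across all contigs of the MAG and the rounding of the trim bounds are assumptions.

```python
def trimmed_mean_depth(contig_depths, trim_min=0.1, trim_max=0.9,
                       end_exclusion=75):
    """Trimmed mean depth for one genome.

    contig_depths maps contig name -> list of per-position depths.
    """
    pooled = []
    for depths in contig_depths.values():
        # Ignore `end_exclusion` positions at each contig end; contigs
        # shorter than twice that length contribute nothing.
        pooled.extend(depths[end_exclusion:len(depths) - end_exclusion])

    pooled.sort()
    n = len(pooled)
    # Keep positions ranked between the trim_min and trim_max fractions.
    low = round(n * trim_min)
    high = round(n * trim_max)
    kept = pooled[low:high]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Toy contig: inflated depth at both ends, depths 0..9 in the middle.
    toy = {"contig_1": [1000, 1000] + list(range(10)) + [1000, 1000]}
    print(trimmed_mean_depth(toy, end_exclusion=2))
```

If this picture is right, the only difference from the TAD80 calculation would be the positions excluded at the contig ends before ranking.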
My question is: is TAD80 basically the same as the CoverM trimmed mean with --trim-min 0.1 and --trim-max 0.9? My understanding is that it is. The two may not be exactly the same, but the general workflow and logic for calculating the average coverage of a MAG are the same, right (apart from the 75 bp excluded at both ends of each contig)?
I know you are on vacation. Please feel free to answer the question whenever you have time.
Thank you very much,
Best,
Jianshu