wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Meaning of the totalAvgDepth column in the metabat method #75

Open CK7 opened 3 years ago

CK7 commented 3 years ago

Hi Ben,

Thanks for writing CoverM, I find it very useful. So far I have been using jgi_summarize_bam_contig_depths but I suspect it contains some bugs, including inconsistencies in coverage calculation for the same bam file when it is provided with different combinations of other bam files (or maybe I am missing something about how it should work... CoverM's behavior is consistent!).

It seems as if the column totalAvgDepth is interpreted differently in the two programs: in CoverM it is the average of the coverage calculated across all bam files but in jgi_summarize_bam_contig_depths it is the sum of the coverages. I was wondering if this is intentional, and if so then why. I would like to use CoverM's output as input for metabat2, and I would assume it expects jgi_summarize_bam_contig_depths's interpretation. Or maybe I am wrong?

Thanks!
Itai

wwood commented 3 years ago

Hi Itai,

Thanks for the kind words and interest in CoverM. I imagine you are correct in theory, that metabat2 expects its own file format, and indeed CoverM is supposed to reproduce the metabat2 input file. However, as my eagle-eyed graduate student @rhysnewell pointed out to me, metabat2 actually ignores that column anyway - see https://bitbucket.org/berkeleylab/metabat/src/8b5702be9852d0ee0c1bd3ba960a21e691c2ae78/src/metabat2.cpp#lines-346

I'm leaving this open as it does appear to be a real bug, and thank you for reporting it, but it is maybe not the highest priority to fix.

ben
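
For anyone who wants the sum interpretation in the meantime, here is a minimal Python sketch of the two readings of totalAvgDepth discussed above, plus a helper that rewrites the column as the sum of the per-sample depths. It is not part of CoverM or MetaBAT. The function names are hypothetical, and the column layout (contigName, contigLen, totalAvgDepth, then one depth column and one `-var` column per BAM file) is an assumption about the depth file, so check it against your own output first.

```python
import csv
import io


def total_depth_as_mean(sample_depths):
    """Mean of the per-sample depths (the behaviour reported for CoverM)."""
    return sum(sample_depths) / len(sample_depths)


def total_depth_as_sum(sample_depths):
    """Sum of the per-sample depths (the behaviour reported for
    jgi_summarize_bam_contig_depths)."""
    return sum(sample_depths)


def rewrite_total_as_sum(depth_tsv):
    """Return the depth table with column 3 replaced by the sum of the
    per-sample depth columns.

    Assumed layout: contigName, contigLen, totalAvgDepth, then for each
    BAM file a depth column followed by a column whose name ends in '-var'.
    """
    reader = csv.reader(io.StringIO(depth_tsv), delimiter="\t")
    header = next(reader)
    # Per-sample depth columns: everything after the first three columns
    # that is not a variance column.
    depth_cols = [
        i for i, name in enumerate(header)
        if i >= 3 and not name.endswith("-var")
    ]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        row[2] = f"{sum(float(row[i]) for i in depth_cols):.4f}"
        writer.writerow(row)
    return out.getvalue()
```

With two samples at depths 10 and 20, the mean reading gives 15 and the sum reading gives 30. Since metabat2 reportedly ignores this column, the rewrite should only matter for other tools that read it.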