wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

CoverM and jgi_summarize_bam_contig_depths give different outputs #19

Closed apcamargo closed 4 years ago

apcamargo commented 4 years ago

Hi Ben

I just replaced jgi_summarize_bam_contig_depths with CoverM in my binning pipeline (which uses both MetaBat2 and MaxBin2) and I've noticed that my final bins were different from the ones I got before.

I prepared reduced BAMs, with just three contigs, and measured coverage with both jgi_summarize_bam_contig_depths and CoverM:

contigName method contigLen totalAvgDepth BE_BS_R1-BE_BS_R1.reduced.bam BE_BS_R1-BE_BS_R1.reduced.bam-var BE_BS_R1-BE_RX_R1.reduced.bam BE_BS_R1-BE_RX_R1.reduced.bam-var BE_BS_R1-BM_ER_R7.reduced.bam BE_BS_R1-BM_ER_R7.reduced.bam-var BE_BS_R1-BM_RX_R7.reduced.bam BE_BS_R1-BM_RX_R7.reduced.bam-var
BE_BS_R1_k147_2106787 CoverM 3152 5.5702863 5.578614 5.697416 14.260826 23.711023 0.92271817 2.084662 1.5189873 2.341692
BE_BS_R1_k147_2106787 jgi_summarize_bam_contig_depths 3152 22.2811 5.57861 5.69755 14.2608 23.7111 0.922718 2.08466 1.51899 2.34171
BE_BS_R1_k147_3889603 CoverM 2203 3.069654 10.337555 50.398186 0.71310276 1.1237903 0.49245006 0.889441 0.73550904 0.8008681
BE_BS_R1_k147_3889603 jgi_summarize_bam_contig_depths 2203 12.2786 10.3376 50.3978 0.713103 1.1238 0.49245 0.889457 0.735509 0.800879
BE_BS_R1_k147_2120400 CoverM 3251 4.63673 4.990003 5.4105444 8.594324 14.234088 0.65011287 1.901733 4.31248 36.406517
BE_BS_R1_k147_2120400 jgi_summarize_bam_contig_depths 3251 18.5469 4.99 5.41058 8.59432 14.2341 0.650113 1.90183 4.31248 36.405

reduced_bam.zip

As you can see, there are two differences:

  1. The totalAvgDepth differs a lot between the two tools. CoverM got the values right and I don't know how jgi_summarize_bam_contig_depths computed its results.
  2. CoverM outputs more decimal places than jgi_summarize_bam_contig_depths. I think that's the reason the binning performed with CoverM gave me different results.

I'm wondering if CoverM should use the same number of decimal places when using -m metabat output, just to keep consistency with jgi_summarize_bam_contig_depths.
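For what it's worth, the per-sample mean depths in the table look like the same numbers printed at two different precisions: the jgi_summarize_bam_contig_depths values match the CoverM values rounded to six significant digits. That is only an observation from the numbers above, not something confirmed from either tool's source code, and the helper name below is made up for illustration. A minimal sketch of the check:

```python
# Per-sample mean depths from the table above:
# (CoverM value, jgi_summarize_bam_contig_depths value as printed).
pairs = [
    (5.578614, "5.57861"), (14.260826, "14.2608"),
    (0.92271817, "0.922718"), (1.5189873, "1.51899"),
    (10.337555, "10.3376"), (0.71310276, "0.713103"),
    (0.49245006, "0.49245"), (0.73550904, "0.735509"),
    (4.990003, "4.99"), (8.594324, "8.59432"),
    (0.65011287, "0.650113"), (4.31248, "4.31248"),
]


def to_six_significant(value: float) -> str:
    """Hypothetical helper: format with 6 significant digits, dropping trailing zeros."""
    return format(value, ".6g")


# Collect every pair where rounding the CoverM value does NOT give the jgi string.
mismatches = [(c, j) for c, j in pairs if to_six_significant(c) != j]
print(mismatches)
```

The variance columns are left out on purpose: those differ slightly between the two tools beyond rounding (for example 5.697416 versus 5.69755), so they would not pass this check.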

Thank you for the great tool!

wwood commented 4 years ago

Hi,

Thanks for the kind words. jgi_summarize_bam_contig_depths has been updated from its original form e.g. https://bitbucket.org/berkeleylab/metabat/issues/48 - are you using a new version?

I think you are right about the decimal places - I've fixed this in ada7540 citing you.

apcamargo commented 4 years ago

I'm using the version in Bioconda. Probably it wasn't updated there.

Thank you for the prompt response!

wwood commented 4 years ago

Hi,

OK, maybe that's it. If you have time I'd encourage you to update the bioconda repo and/or report your results here to the metabat peeps - you have nice reduced BAM files, which will make the bug easier to fix on their end, if it is still a problem. Thanks again for the report. Closing now - let me know if there's anything else. ben

--

Ben Woodcroft http://ecogenomic.org/users/ben-woodcroft http://www.ecogenomic.org/

Tarasovk49 commented 5 months ago

Hi! CoverM produces the mean of the four per-sample values and jgi_summarize_bam_contig_depths produces the sum. You can easily check it: 22.2811 / 4 = 5.570275; 12.2786 / 4 = 3.06965; and so on. These are the values you got with CoverM. The small inconsistencies in the last digits are due to the reason you mentioned: CoverM stores values with more decimal places. Hope that helps!
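The sum-versus-mean relationship can be checked against all three contigs in the table from the first comment. The snippet below is just a sanity check on those published numbers; the variable names are made up, and it says nothing about how either tool computes totalAvgDepth internally.

```python
# Per-sample mean depths reported by CoverM for each contig (from the table above).
sample_means = {
    "BE_BS_R1_k147_2106787": [5.578614, 14.260826, 0.92271817, 1.5189873],
    "BE_BS_R1_k147_3889603": [10.337555, 0.71310276, 0.49245006, 0.73550904],
    "BE_BS_R1_k147_2120400": [4.990003, 8.594324, 0.65011287, 4.31248],
}

# totalAvgDepth as reported by each tool (from the same table).
coverm_total = {
    "BE_BS_R1_k147_2106787": 5.5702863,
    "BE_BS_R1_k147_3889603": 3.069654,
    "BE_BS_R1_k147_2120400": 4.63673,
}
jgi_total = {
    "BE_BS_R1_k147_2106787": 22.2811,
    "BE_BS_R1_k147_3889603": 12.2786,
    "BE_BS_R1_k147_2120400": 18.5469,
}

# Recompute the sum and the mean of the per-sample depths for each contig.
results = {}
for contig, means in sample_means.items():
    total = sum(means)
    results[contig] = {"sum": total, "mean": total / len(means)}
    print(contig, results[contig], jgi_total[contig], coverm_total[contig])
```

For each contig the recomputed sum lines up with the jgi_summarize_bam_contig_depths column and the recomputed mean lines up with the CoverM column, up to the printed precision.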