wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0
273 stars, 30 forks

[Feature Request] Add a method to calculate genome coverage (e.g., 1X, 10X, 100X) #124

Open jolespin opened 1 year ago

jolespin commented 1 year ago

I'm submitting a bunch of genomes to NCBI and they want genome coverage:

The estimated base coverage across the genome, eg 12x. This can be calculated by dividing the number of bases sequenced by the expected genome size and multiplying that by the percentage of bases that were placed in the final assembly. More simply it is the number of bases sequenced divided by the expected genome size.

I was looking for software that could do this before writing my own and thought this could be a great addition to your program.
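For reference, the NCBI wording quoted above boils down to a one-line calculation. The following is only a minimal sketch of that arithmetic, not something CoverM provides; the function name `estimated_coverage` and its parameter names are made up for illustration.

```python
def estimated_coverage(bases_sequenced, expected_genome_size, fraction_assembled=1.0):
    """Estimate fold coverage following the NCBI wording quoted above.

    bases_sequenced:      total number of bases in the sequencing reads
    expected_genome_size: expected genome size in bases
    fraction_assembled:   fraction (0-1) of sequenced bases placed in the
                          final assembly; 1.0 gives the "more simply" variant
    """
    if expected_genome_size <= 0:
        raise ValueError("expected_genome_size must be positive")
    if not 0.0 <= fraction_assembled <= 1.0:
        raise ValueError("fraction_assembled must be between 0 and 1")
    return bases_sequenced / expected_genome_size * fraction_assembled


# Hypothetical numbers: 60 Mbp of reads for a 5 Mbp genome.
simple = estimated_coverage(60_000_000, 5_000_000)
adjusted = estimated_coverage(60_000_000, 5_000_000, fraction_assembled=0.9)
print(f"{simple:.1f}x (simple), {adjusted:.1f}x (adjusted for placed bases)")
```

Neither variant needs a read mapping; both only need the total read bases and a genome size.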

wwood commented 1 year ago

Thanks for the interest. Those are two definitions that don't rely on mapping, though. Can't you just use mean or trimmed mean coverage as already implemented? It seems like mapping, or the first definition there, would suit MAGs, which I suppose is what you have.

-------------- Ben Woodcroft Group leader, Centre for Microbiome Research, QUT



jolespin commented 1 year ago

I might be misinterpreting, but I was under the assumption that I need to parse the BAM file, get the length of each aligned read, keep a running sum across all the contigs, and then divide by the genome size.

Is that incorrect? I might be overly complicating it.
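The running-sum approach described above could look roughly like the sketch below. It is an illustration under several assumptions rather than how CoverM works: it reads SAM text (e.g. the output of `samtools view -h`) instead of binary BAM, takes the genome size as the sum of the `@SQ` contig lengths in the header, counts only `M`, `=` and `X` CIGAR operations as aligned bases, and skips unmapped and secondary records. The names `aligned_bases` and `coverage_from_sam` are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_OPS = {"M", "=", "X"}  # operations where a read base sits on the reference


def aligned_bases(cigar):
    """Count the read bases aligned to the reference in one CIGAR string."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in ALIGNED_OPS)


def coverage_from_sam(lines):
    """Sum aligned bases over all records and divide by total contig length."""
    genome_size = 0
    total_aligned = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header: collect contig lengths from the LN tag of @SQ lines.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("LN:"):
                        genome_size += int(field[3:])
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x100:  # skip unmapped and secondary records
            continue
        total_aligned += aligned_bases(fields[5])
    if genome_size == 0:
        raise ValueError("no @SQ header lines with LN tags found")
    return total_aligned / genome_size


# Tiny made-up example: two 100 bp contigs, three mapped reads, one unmapped.
sam = [
    "@SQ\tSN:contig1\tLN:100",
    "@SQ\tSN:contig2\tLN:100",
    "r1\t0\tcontig1\t1\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tcontig1\t26\t60\t40M10S\t*\t0\t0\t*\t*",
    "r3\t16\tcontig2\t1\t60\t100M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(coverage_from_sam(sam))
```

Soft-clipped bases (the `10S` in `r2`) are not counted, which matches the idea of only summing the aligned part of each read.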


wwood commented 1 year ago

I'm reasonably sure that is the same thing as mean coverage, if you are only counting the parts of the reads that map. The genome size isn't actually needed, as it cancels out: for instance, the coverage of the first half of the genome should be the same as that of the second half. Quite a convenient fact.
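A toy check of that equivalence, assuming gapless alignments given as made-up `(contig, start, end)` intervals (0-based, end-exclusive): the mean of the per-base depths and the sum of aligned lengths divided by the total contig length come out the same. The function names are hypothetical, and this is only a sketch of the arithmetic, not CoverM's implementation.

```python
def mean_depth(contig_lengths, alignments):
    """Mean per-base depth over every position of every contig."""
    depth = {name: [0] * length for name, length in contig_lengths.items()}
    for contig, start, end in alignments:
        for pos in range(start, end):
            depth[contig][pos] += 1
    total_positions = sum(contig_lengths.values())
    return sum(sum(d) for d in depth.values()) / total_positions


def aligned_bases_over_length(contig_lengths, alignments):
    """Running sum of aligned lengths divided by the total contig length."""
    total_aligned = sum(end - start for _, start, end in alignments)
    return total_aligned / sum(contig_lengths.values())


# Made-up example: two 100 bp contigs and three alignments.
contig_lengths = {"contig1": 100, "contig2": 100}
alignments = [("contig1", 0, 50), ("contig1", 25, 65), ("contig2", 0, 100)]

print(mean_depth(contig_lengths, alignments))
print(aligned_bases_over_length(contig_lengths, alignments))
```

Each aligned base adds exactly one to the depth of one position, so both functions divide the same total by the same length. With deletions or spliced alignments the two counts can differ slightly, depending on whether the gap positions are counted as covered.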
