wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

A question on RPKM calculation #157

ilnamkang closed this issue 1 year ago

ilnamkang commented 1 year ago

Hi,

I have a question on how RPKM values are calculated by CoverM.

According to the comment on a previous issue, "RPKM does take into consideration genome size and library size, as per its definition." (https://github.com/wwood/CoverM/issues/79#issuecomment-885905101)

But I can't see where library size is taken into account in the example calculation at the following link (https://github.com/wwood/CoverM#calculation-methods):

> imagine a set of 3 pairs of reads, where only 1 aligns to a single reference contig of length 1000bp:
>
> rpkm = 2 * 10^9 / 1000 / 2 = 1000000
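Written out term by term, the README example has the general form below. This is a sketch inferred from the numbers in the example; the symbols $r$, $L$ and $N$ are illustrative labels, not CoverM's own notation. Here $r$ is the number of reads aligned to the contig (read as the 2 reads of the single aligned pair), $L$ is the contig length in bp, and $N$ is the library-size term. The factor $10^9 = 10^3 \times 10^6$ rescales per base to per kilobase and per read to per million reads.

$$
\mathrm{RPKM} = \frac{r \times 10^{9}}{L \times N} = \frac{2 \times 10^{9}}{1000 \times 2} = 10^{6}
$$

The example uses $N = 2$, while the toy library holds 3 pairs (6 reads), which is the discrepancy raised in this question.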

I'm confused because the "3 pairs of reads" are not used anywhere in the calculation.

Does the "library size" used in RPKM mean the total number of recruited/mapped reads (excluding unmapped reads)? Or does it mean the total number of reads used for recruitment/mapping (whether or not they mapped)?

Thanks.

Ilnam

wwood commented 1 year ago

Hi,

> Does the "library size" used in RPKM mean the total number of recruited/mapped reads (excluding unmapped reads)?

Yes, that is my understanding of RPKM, and what the "M" (per Million mapped reads) in the acronym refers to. So the calculation divides by the 2 mapped reads, not by the 3 pairs (6 reads) that were sequenced.
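A minimal Python sketch of the formula as described in this thread, using the README's toy numbers. The function `rpkm` and its parameter names are hypothetical illustrations, not CoverM's code; CoverM itself is written in Rust and may handle details such as read filtering and pair counting differently. The sketch contrasts the two readings of "library size" from the question.

```python
def rpkm(reads_on_contig: int, contig_length_bp: int, library_reads: int) -> float:
    """Hypothetical helper: RPKM = reads * 10^9 / (length_bp * library_reads).

    10^9 = 10^3 (per kilobase) * 10^6 (per million reads).
    """
    if contig_length_bp <= 0 or library_reads <= 0:
        raise ValueError("contig length and library size must be positive")
    return reads_on_contig * 1e9 / (contig_length_bp * library_reads)


# Toy example from the CoverM README: 3 read pairs (6 reads) were sequenced,
# and 1 pair (2 reads) aligns to a single 1000 bp contig.
total_reads = 6
mapped_reads = 2
reads_on_contig = 2
contig_length_bp = 1000

# Convention described in this thread: divide by mapped reads only.
print(rpkm(reads_on_contig, contig_length_bp, mapped_reads))  # 1000000.0

# Alternative reading: divide by all sequenced reads, mapped or not (~3.3e5).
print(rpkm(reads_on_contig, contig_length_bp, total_reads))
```

Only the first call reproduces the README's value of 1000000, which is consistent with the answer above that the denominator counts mapped reads.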

Does that clear things up?

ilnamkang commented 1 year ago

Now it's all clear to me.

Thank you for the clear explanation.