shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

bigtable.tsv column question #95

Closed by mihinduk 8 months ago

mihinduk commented 12 months ago

Hi Mike,

Luis ran hecatomb and we were surprised by the bigtable.tsv. We were expecting a CPM or SPM column, but instead found a percent column. How is percent calculated? How is it related to CPM/SPM?

Looking at the CPM/SPM for the viral bigtable for freeze 3, the range is 0.328779842 to 355565.3566.

Looking at percent for the viral bigtable for freeze 3, the range is 3.227e-05 to 24.23.

I did find a CPM column in the taxonLevelCounts.tsv, but I am unclear how to connect this to the bigtable.tsv.

Thank you for your advice, Kathie

beardymcjohnface commented 12 months ago

CPM and SPM are the same thing. The CPM column in taxonLevelCounts.tsv is actually now a percent; I just hadn't updated the header.

The percent is calculated as: (count / libSize) * 100

Whereas CPM/SPM is: (count / libSize) * 1000000

It's just a different way of scaling the normalised counts. Multiply the percent by 10000 to convert back to SPM.
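
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two scalings and the conversion between them. The function names and the example numbers are made up for illustration and are not part of hecatomb; only the formulas come from the comment above.

```python
# Hypothetical helpers illustrating the formulas from this thread.
# None of these names exist in hecatomb itself.

def percent(count: float, lib_size: float) -> float:
    """Normalised count expressed as a percentage of the library size."""
    return count / lib_size * 100


def spm(count: float, lib_size: float) -> float:
    """Normalised count per million (CPM and SPM are the same scaling)."""
    return count / lib_size * 1_000_000


def percent_to_spm(pct: float) -> float:
    """Convert a percent value back to SPM: 1,000,000 / 100 = 10,000."""
    return pct * 10_000


if __name__ == "__main__":
    # Made-up example values, not taken from any real bigtable.tsv.
    count, lib_size = 5, 2_000_000
    pct = percent(count, lib_size)
    print(pct, spm(count, lib_size), percent_to_spm(pct))
```

Because both values come from the same count / libSize ratio, converting a percent column only needs the factor of 10000; the library size itself is not required.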