Closed lisiruisusan closed 1 year ago
You can't really do it easily. RPKM and TPM both require an estimate of feature length. That is easy for genes and functions (e.g. how long is this gene, or what is the average/median length of all the genes belonging to a certain function), but it is not so straightforward for taxa. You could, for example, download all the genomes from a given phylum, calculate their average or median length, and use that for normalization during the RPKM/TPM calculation. But how meaningful is that, really? It is a bit easier for bins, since at least they are concrete features with a defined length. SQMtools now tracks the coverage per million reads for the different bins, which gives you information roughly similar to RPKM. Otherwise you can just use the percentage of reads mapping to the different phyla or species.
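To make the trade-off above concrete, here is a minimal Python sketch of the three quantities mentioned: RPKM, TPM, and the plain percentage of reads. It is not SqueezeMeta or SQMtools code. The function names, the taxon names, the read counts, and especially the per-taxon "lengths" are all hypothetical and exist only for illustration. Where a real length would come from (e.g. a median genome size per phylum) is exactly the open question discussed above.

```python
from typing import Dict, Optional


def rpkm(reads: Dict[str, int], lengths_bp: Dict[str, float],
         total_reads: Optional[int] = None) -> Dict[str, float]:
    """Reads per kilobase of feature per million reads.

    total_reads defaults to the sum of the counts given; pass the total
    number of reads in the sample if that is the denominator you want.
    """
    if total_reads is None:
        total_reads = sum(reads.values())
    # reads / (length / 1e3) / (total_reads / 1e6) == reads * 1e9 / (length * total_reads)
    return {name: n * 1e9 / (lengths_bp[name] * total_reads)
            for name, n in reads.items()}


def tpm(reads: Dict[str, int], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Length-normalized read rates rescaled so that they sum to one million."""
    rates = {name: n / lengths_bp[name] for name, n in reads.items()}
    total_rate = sum(rates.values())
    return {name: r / total_rate * 1e6 for name, r in rates.items()}


def percent_reads(reads: Dict[str, int],
                  total_reads: Optional[int] = None) -> Dict[str, float]:
    """Percentage of reads per feature; needs no length estimate at all."""
    if total_reads is None:
        total_reads = sum(reads.values())
    return {name: 100.0 * n / total_reads for name, n in reads.items()}


# Hypothetical input: read counts per phylum and an ASSUMED representative
# genome length per phylum. Neither comes from a real dataset.
example_reads = {"PhylumA": 600_000, "PhylumB": 400_000}
example_lengths = {"PhylumA": 5_000_000, "PhylumB": 2_500_000}

if __name__ == "__main__":
    print("RPKM:", rpkm(example_reads, example_lengths))
    print("TPM:", tpm(example_reads, example_lengths))
    print("Percent:", percent_reads(example_reads))
```

Note that only `percent_reads` is free of the length assumption, which is why it is the safer default for taxa.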
Thanks so much for the reply.
Dear developers,
We can calculate the length and read counts for a given species or phylum in step 11, and the RPKM values for annotated genes in step 12. How can I calculate RPKM values for species and phyla?
Yours sincerely,
Lisirui
September 9th 2023