Query regarding TPM calculation of KO

jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis

GNU General Public License v3.0


Query regarding TPM calculation of KO #653

Closed: drbmanna closed this issue 1 year ago

drbmanna commented 1 year ago

Hi @fpusan

Based on the discussion in the thread #640, I have some conceptual queries.

@mkariush noted "(by looking at the 12.funcover.pl script) that the total length is normalised to the copy number", and you replied "we use the average length of a feature (e.g. a COG) in our dataset as the basis for RPKM/TPM normalization". From this, I assume that the copy number calculation depends on RecA abundance.

I wonder: if one skips the COG annotation (--nocog, and thus no RecA), would that affect the TPM normalization of KOs?

Thanks, BM.

fpusan commented 1 year ago

Yes, copy numbers will not be calculated automatically if you skip the COG annotation, but you can still calculate them manually, and we provide sets of single-copy genes in SQMtools. See the SQMtools documentation for USiCGs and MGKOs.
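A minimal sketch of the manual calculation mentioned above, assuming that a copy number is a feature's coverage divided by a reference single-copy coverage, taken here as the median across a set of single-copy genes (with RecA alone, the median is simply RecA's coverage). The function name `copy_numbers`, the input dictionaries and the gene IDs are hypothetical and are not the SQMtools API; check the SQMtools documentation for its actual functions and gene sets.

```python
from statistics import median


def copy_numbers(feature_coverage, scg_coverage):
    """Hypothetical helper: estimate average copy number per genome.

    feature_coverage: {feature_id: coverage} for e.g. KOs.
    scg_coverage: {gene_id: coverage} for single-copy genes (assumed input).
    """
    # Reference coverage of a "single genome": median over single-copy genes.
    reference = median(scg_coverage.values())
    if reference == 0:
        raise ValueError("reference single-copy coverage is zero")
    # Each feature's coverage expressed in genome equivalents.
    return {f: cov / reference for f, cov in feature_coverage.items()}


ko_cov = {"K00001": 30.0, "K00002": 12.0}
scg_cov = {"scg1": 10.0, "scg2": 12.0, "scg3": 14.0}  # illustrative values
print(copy_numbers(ko_cov, scg_cov))
```

Using a median over several single-copy genes, rather than one gene, makes the reference less sensitive to a single poorly annotated or unevenly covered marker.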

drbmanna commented 1 year ago

Thank you. I will look into it in detail.

However, my concern is about the KO TPM values we get when skipping the COG annotation (--nocog). Are those values miscalculated, given that the TPM normalization in the 12.funcover.pl script is done with copy numbers? Please correct me if I am misinterpreting this.

fpusan commented 1 year ago

No, they are fine. The average length of a feature is calculated independently for each feature, regardless of whether it comes from COG, KEGG, PFAM or an external database: e.g. the average length of the ORFs annotated with K00001 in your dataset, the average length of COG0001, etc.

fpusan commented 1 year ago

Just to clarify this further, this sentence:

we use the average length of a feature (e.g. a COG) in our dataset as the basis for RPKM/TPM normalization

does not mean that we use the average length of ALL the COGs as the basis for normalization. Instead, we calculate the average length independently for each COG, KEGG, PFAM, and so on.
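To illustrate the per-feature averaging described above, here is a sketch of TPM computed with one average length per feature. It assumes input as (feature, ORF length, mapped reads) tuples and computes TPM among the features of a single annotation type (KEGG here). The function name `feature_tpm` is hypothetical, and this is not the code in 12.funcover.pl.

```python
from collections import defaultdict


def feature_tpm(orfs):
    """Hypothetical helper: TPM per feature using per-feature average ORF length.

    orfs: iterable of (feature_id, orf_length, mapped_reads) tuples.
    Returns (tpm, avg_len) dictionaries keyed by feature_id.
    """
    lengths = defaultdict(list)
    reads = defaultdict(int)
    for feature, length, n_reads in orfs:
        lengths[feature].append(length)
        reads[feature] += n_reads
    # Average length computed independently for each feature, not globally.
    avg_len = {f: sum(ls) / len(ls) for f, ls in lengths.items()}
    # Length-normalized read rate per feature.
    rate = {f: reads[f] / avg_len[f] for f in reads}
    total = sum(rate.values())
    # Scale so that all TPM values of this annotation type sum to one million.
    tpm = {f: r / total * 1e6 for f, r in rate.items()}
    return tpm, avg_len


orfs = [("K00001", 900, 100), ("K00001", 1100, 100), ("K00002", 500, 50)]
tpm, avg_len = feature_tpm(orfs)
print(avg_len)  # K00001 averages 1000, K00002 averages 500
print(tpm)
```

Note that neither copy numbers nor RecA enter this calculation, which fits the answer that skipping the COG annotation leaves KO TPM values unaffected. Also, if one single length were used for every feature, it would cancel out and TPM would reduce to plain read proportions; a per-feature average is what makes the normalization length-aware.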

drbmanna commented 1 year ago

Thank you so much for the clarification.