Closed drbmanna closed 1 year ago
Yes, copy numbers will not be calculated automatically if you skip the COG annotation, but you can still calculate them manually, and we provide sets of single-copy genes in SQMtools.
See the SQMtools documentation for USiCGs and MGKOs.
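As a rough illustration of what "calculating them manually" could look like, here is a minimal Python sketch. It assumes that copy number is estimated as a function's coverage divided by the median coverage of a set of single-copy genes; that assumption, the function name `copy_numbers`, the marker IDs, and the coverage values are all made up for the example and are not taken from SQMtools or SqueezeMeta.

```python
from statistics import median


def copy_numbers(coverage, single_copy_ids):
    """Estimate copy numbers from per-function coverage.

    coverage: dict mapping function ID -> coverage (hypothetical input).
    single_copy_ids: IDs assumed to be single-copy genes.
    """
    # Coverage of the single-copy genes that are present in the table.
    reference = [coverage[f] for f in single_copy_ids if f in coverage]
    if not reference:
        raise ValueError("none of the single-copy genes were found")
    norm = median(reference)
    if norm == 0:
        raise ValueError("median single-copy coverage is zero")
    # Copy number = coverage relative to the single-copy median.
    return {f: c / norm for f, c in coverage.items()}


# Made-up coverage table: one KO of interest plus three placeholder markers.
coverage = {"K00001": 30.0, "marker_1": 10.0, "marker_2": 12.0, "marker_3": 8.0}
single_copy = ["marker_1", "marker_2", "marker_3"]
cn = copy_numbers(coverage, single_copy)
```

The median is used here only because it is robust to a single outlier marker; the sets provided in SQMtools (USiCGs, MGKOs) would take the place of the placeholder `single_copy` list.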
Thank you. I will look into it in detail.
However, my concern is with the KO TPM values we get when skipping the COG annotation (--nocog). Are those values miscalculated, given that the TPM normalization is done with the copy number in the 12.funcover.pl script? Please correct me if I have misinterpreted this.
No, they are fine. The average length of a feature is calculated independently for each feature, regardless of whether it comes from COG, KEGG, PFAM or an external database. E.g. the average length of the ORFs annotated with K00001 in your dataset, the average length of the ORFs annotated with COG0001, etc.
Just to further clarify this. This sentence:
we use the average length of a feature (e.g. a COG) in our dataset as the basis for RPKM/TPM normalization
Does not mean that we use the average length of ALL of the COGs as the basis for normalization. Instead, we calculate the average length independently for each COG, KEGG, PFAM...
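To make the per-feature idea concrete, here is a minimal Python sketch of a TPM calculation in which each feature is normalized by its own average ORF length. It is not the code from 12.funcover.pl; the function name `feature_tpm`, the input layout, and the numbers are assumptions made for illustration only.

```python
def feature_tpm(orfs):
    """Compute TPM per feature from (feature_id, orf_length_bp, reads) tuples.

    Each feature is normalized by the average length of its own ORFs,
    not by an average taken over all features.
    """
    total_len, total_reads, n_orfs = {}, {}, {}
    for fid, length, reads in orfs:
        total_len[fid] = total_len.get(fid, 0) + length
        total_reads[fid] = total_reads.get(fid, 0) + reads
        n_orfs[fid] = n_orfs.get(fid, 0) + 1

    # Reads per kilobase, using the feature's own average ORF length.
    rate = {}
    for fid in total_reads:
        avg_len_kb = (total_len[fid] / n_orfs[fid]) / 1000
        rate[fid] = total_reads[fid] / avg_len_kb

    # Scale so that the values sum to one million.
    scale = sum(rate.values())
    return {fid: r / scale * 1e6 for fid, r in rate.items()}


# Made-up ORFs: two annotated with K00001, one with K00002.
orfs = [("K00001", 1000, 100), ("K00001", 2000, 200), ("K00002", 500, 50)]
tpm = feature_tpm(orfs)
```

In this sketch nothing about K00001 depends on any COG, which is why skipping the COG annotation would leave the KO values unchanged under these assumptions.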
Thank you so much for the clarification.
Hi @fpusan
Based on the discussion in the thread #640, I have some conceptual queries.
As @mkariush pointed out "(by looking at the 12.funcover.pl script) that the total length is normalised to the copy number.", followed by your response "we use the average length of a feature (e.g. a COG) in our dataset as the basis for RPKM/TPM normalization", I assume that the copy number calculation depends on the RecA abundance.
I wonder: if one skips the COG annotation (--nocog, thus no RecA), would that affect the TPM normalization of KOs?
Thanks, BM.