liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

Interpretation of CPK #47

kc199 closed this issue 3 years ago

kc199 commented 3 years ago

@mourisl, thank you so much again for this wonderful tool and for answering questions so promptly. I have a question about how to interpret the CPK metric reported by trust-stats.py: does a higher CPK indicate higher overall diversity?

My initial thought was that higher CPK = higher overall diversity, which implies that CPK should be positively correlated with entropy (both calculated using trust-stats.py). Yet in our samples, CPK is negatively correlated with entropy and also with total read count. Should CPK then be interpreted as "clonality", e.g. higher CPK = more clonal, and thus lower entropy and lower overall diversity?

For example, this important paper from the Liu lab (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6880565/#MOESM2) states: "We observed significantly lower diversity in both pediatric and adult AML samples compared to non-tumor samples (Fig. 1b). This result suggests that T cells are more clonal in the AML microenvironment."

However, in the TCGA study (Li et al., Nature Genetics 2016), Fig. 4b shows CPK positively correlated with mutation load. If CPK is similar to clonality, should I intuitively expect CPK to be higher with higher mutation load, since more mutations would lead to more clonal expansion and lower overall diversity?

Hope my question is clear and thank you very much for your help @mourisl!

mourisl commented 3 years ago

This is a very good question. Diversity measurement is very tricky. One can think of diversity from two perspectives: if there are more distinct types (richness), diversity is higher; if the distribution is less uniform (lower evenness), diversity is lower. Both richness and evenness therefore affect a diversity measure, and CPK in particular. In my experience, I also think CPK might be more closely related to clonality.
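To make the richness/evenness distinction concrete, here is a minimal Python sketch with two hypothetical toy repertoires (the clonotype labels and read counts are made up) that have the same richness but very different evenness. It uses Shannon entropy and Pielou's evenness (entropy divided by the log of richness) as illustrative measures; these are standard ecology formulas, not necessarily the exact ones implemented in trust-stats.py.

```python
import math
from collections import Counter


def richness(counts):
    """Number of distinct clonotypes observed."""
    return len(counts)


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def pielou_evenness(counts):
    """Entropy divided by its maximum, log(richness); 1.0 means perfectly even."""
    s = richness(counts)
    return shannon_entropy(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical toy repertoires: read counts per clonotype (labels are placeholders)
even = Counter({"cloneA": 25, "cloneB": 25, "cloneC": 25, "cloneD": 25})
skewed = Counter({"cloneA": 97, "cloneB": 1, "cloneC": 1, "cloneD": 1})

for name, rep in [("even", even), ("skewed", skewed)]:
    print(name, richness(rep),
          round(shannon_entropy(rep), 3), round(pielou_evenness(rep), 3))
```

Both repertoires contain four clonotypes, but the skewed one has much lower entropy and evenness, so a measure that mixes richness and evenness can rank samples differently from one that tracks richness alone.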

Your examples are not contradictory. In the AML case, we observed clonal expansion in the tumor samples, hence lower CPK. In the TRUST1 paper, although mutation load might lead to clonal expansion, if it affects TCR richness more (more mutations could "attract" more TCRs), then we may observe a higher CPK in samples with higher mutation load.

There is some good literature on the diversity topic, such as https://www.sciencedirect.com/science/article/pii/S0958166920301051?via%3Dihub

kc199 commented 3 years ago

Thanks so much @mourisl for such a fast reply. Indeed, diversity estimation is tricky. I was wondering what you think about why CPK and read count might be negatively correlated in my samples. I would have guessed there would be no relationship, so it is curious that there is in fact a significant negative association.

A more general question is about the motivation for CPK. The Nature Genetics paper nicely shows that richness is positively associated with total read count. However, isn't this relationship biologically important and expected (i.e. more total reads = more T cells = higher richness)? If so, is it truly necessary to normalize for read count? Very sorry if my question is naive. Thank you again!

mourisl commented 3 years ago

If there is a strong clonal expansion, you will observe more reads from the same clonotype. Since the extra reads of such a sample all come from one clonotype, the total read count rises while the number of distinct clonotypes stays the same: more probability mass sits on that one clonotype, hence a smaller CPK.
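A minimal Python sketch of this argument, assuming CPK is computed as the number of distinct clonotypes per thousand CDR3 reads. That definition is an assumption here, so check trust-stats.py for the exact formula it uses. The repertoire below is hypothetical.

```python
from collections import Counter


def cpk(counts):
    """Clonotypes per kilo-reads: distinct clonotypes / total reads * 1000.

    ASSUMPTION: illustrative definition; trust-stats.py may compute it differently.
    """
    total = sum(counts.values())
    return 1000 * len(counts) / total


# Hypothetical baseline repertoire: 50 clonotypes with 20 reads each (1,000 reads)
baseline = Counter({f"clone{i}": 20 for i in range(50)})

# Same repertoire after a strong expansion of one clone: 1,000 extra reads, all clone0
expanded = baseline.copy()
expanded["clone0"] += 1000

print(cpk(baseline))   # 50.0  (50 distinct / 1,000 reads)
print(cpk(expanded))   # 25.0  (50 distinct / 2,000 reads)
```

Adding 1,000 reads that all belong to clone0 doubles the read count but leaves the number of distinct clonotypes unchanged, so CPK halves. Under this definition, samples whose extra reads come mainly from expanded clones would show a negative association between CPK and read count.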

Different samples have different sequencing depths and T/B cell infiltration levels, so normalization is still necessary.
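A minimal Python sketch of why raw richness is not comparable across samples with different depths: the same hypothetical repertoire, sampled at 100, 1,000 and 10,000 reads, yields very different expected numbers of distinct clonotypes. The sampling model (reads drawn independently, with replacement, from fixed clonotype frequencies) and the frequencies themselves are assumptions for illustration.

```python
def expected_richness(freqs, depth):
    """Expected number of distinct clonotypes seen when `depth` reads are drawn
    with replacement from a repertoire with clonotype frequencies `freqs`."""
    # Each clonotype is seen at least once with probability 1 - (1 - p)^depth
    return sum(1 - (1 - p) ** depth for p in freqs)


# Hypothetical fixed repertoire: 1,000 clonotypes with geometrically decaying frequencies
raw = [0.99 ** i for i in range(1000)]
total = sum(raw)
freqs = [w / total for w in raw]

for depth in (100, 1000, 10000):
    print(depth, round(expected_richness(freqs, depth), 1))
```

Only the sequencing depth changes between rows, so any difference in richness here is technical rather than biological; the same applies when samples differ in how many T/B cell reads they contain.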

kc199 commented 3 years ago

Thank you @mourisl for the interesting discussion. I will have to think about this more; perhaps I will try several other diversity metrics in addition to CPK and entropy to see whether my analyses are robust. Tentatively, it does seem like CPK is more closely related to clonality than entropy is. For our samples we also have age available, and since TCR repertoire diversity is known to decline with age (https://pubmed.ncbi.nlm.nih.gov/24510963/), we tried correlating CPK with age.

Surprisingly, we found that CPK is negatively correlated with age in our samples! This was unexpected, since if CPK roughly = clonality, I might have expected a positive association (higher age --> lower overall diversity --> higher clonality --> higher CPK).

But as you say, I guess it depends on the nature of the clonal expansion. Thanks again for your comments and suggestions!

mourisl commented 3 years ago

I just came back to your comment; sorry if it is too late now. Higher clonality leads to lower CPK, so the negative correlation between CPK and age is expected.