Several factors affect the observed categories:
allelic exclusion (alluded to in your "T-cell receptor model", but note the intrinsic bias of beta vs alpha chain rearrangement; gamma-delta adds further complexity, but we'll ignore it for now since many scRNA-seq assays do not target these chains)
differential expression levels of alpha vs beta chains, which contribute to but are distinct from signal dropout
single-cell signal dropout (despite additional targeted amplification of the TCR locus)
Adding to the sources you've already cited, this illustration from Dupic et al 2019 emphasizes the different a priori expectation of the beta vs alpha chain:
Separate from the allelic exclusion bias, this figure from Redmond et al 2016 illustrates the 2-fold expression bias between alpha and beta:
These biological mechanisms contribute to the observed frequencies. But what is measured is additionally confounded by signal dropout and by doublets.
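To make the dropout confounding concrete, here is a minimal sketch, assuming each chain transcript is detected independently with a fixed per-chain probability. The function name and the detection probabilities (0.5 for alpha, 0.8 for beta) are made up for illustration and are not taken from Redmond et al or from any dataset; doublets are not modeled.

```python
from math import comb


def observed_category_probs(n_alpha, n_beta, p_alpha, p_beta):
    """Distribution over observed (alpha, beta) chain counts for one cell.

    Assumption: each of the cell's true chains is detected independently,
    alpha chains with probability p_alpha and beta chains with p_beta.
    """
    probs = {}
    for a in range(n_alpha + 1):
        # Binomial probability of detecting a of the n_alpha true alpha chains
        pa = comb(n_alpha, a) * p_alpha**a * (1 - p_alpha) ** (n_alpha - a)
        for b in range(n_beta + 1):
            # Same for beta chains
            pb = comb(n_beta, b) * p_beta**b * (1 - p_beta) ** (n_beta - b)
            probs[(a, b)] = pa * pb
    return probs


# Hypothetical cell with one alpha and one beta chain; illustrative rates only.
for (a, b), p in sorted(observed_category_probs(1, 1, 0.5, 0.8).items()):
    print(f"observed {a} alpha, {b} beta: {p:.2f}")
```

Under these made-up rates, a cell that truly carries one alpha and one beta would be observed as a complete pair about 40% of the time and as an orphan beta about another 40% of the time, so the observed category counts can differ substantially from the underlying biology.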
Single-cell considerations:
if you use the additional information within single-cell data (UMI-based assembly of the chain contig sequence, choice of V, etc.), could some categories be resolved further than with the CDR3 sequence alone?
would the productive status of a particular chain influence the probability of observing another? (1 unproductive beta chain detected -> greater chance of expecting another beta chain). Even if a chain is unproductive, it could still help track/merge clones.
for sufficiently expanded clones (e.g. a tumor, infection, or T-cell leukemia), there is the option to probabilistically infer the most likely chain configuration (e.g. 1 beta, 2 alpha) from repeated measurements across multiple cells of the same clone (particularly if using the whole chain's contig to increase confidence), thus overcoming measurement dropout. This could lead to better calls of trickier doublets (e.g. if the expanded clone is confidently assessed as 1 beta, 1 alpha, but a few cells have an additional alpha or beta, then those may be doublets hidden within the "extra chain" category). Lmk if you'd like a real-world dataset to play with.
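The clone-level idea in the last point could be prototyped roughly as follows. This is a simple frequency-threshold heuristic standing in for a proper probabilistic model, and it assumes the cells have already been grouped into one expanded clone (for example by a shared beta chain). The function name, the `min_fraction` cutoff, and the chain labels are all hypothetical.

```python
from collections import Counter


def infer_clone_chains(cells, min_fraction=0.5):
    """Guess a clone's chain configuration from repeated measurements.

    cells: list of sets, one per cell, holding the chains detected in that
           cell (any hashable identifier, e.g. a (locus, contig) tuple).
    Returns (consensus, flagged): chains seen in at least min_fraction of
    the cells, and indices of cells carrying a chain outside the consensus
    (candidate doublets hiding in the "extra chain" category).
    """
    n_cells = len(cells)
    # Count in how many cells each chain was detected
    counts = Counter(chain for cell in cells for chain in cell)
    consensus = {c for c, k in counts.items() if k / n_cells >= min_fraction}
    # A cell is flagged if it has any chain that is not part of the consensus
    flagged = [i for i, cell in enumerate(cells) if cell - consensus]
    return consensus, flagged


# Toy clone with placeholder chain labels (not real sequences).
B1, A1, A2 = ("TRB", "B1"), ("TRA", "A1"), ("TRA", "A2")
clone = [{B1, A1}] * 6 + [{B1}] * 2 + [{A1}] + [{B1, A1, A2}]
consensus, flagged = infer_clone_chains(clone)
print(sorted(consensus), flagged)
```

In this toy clone, cells missing a chain through dropout still agree with the 1 beta, 1 alpha consensus, while the single cell with an extra alpha is flagged for a closer look. A real implementation would want per-chain detection rates and a doublet prior rather than a fixed cutoff.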
@jfx319 wrote: