kmayerb / tcrdist3

Flexible CDR-based distance metrics
MIT License

What to do with multiple productive alpha or beta chains? #68

Closed: emjbishop closed this issue 2 years ago

emjbishop commented 2 years ago

Should we exclude multi-chain clonotypes, or pick one of the multiple chains using something like UMI count?

We noticed that tcrdist2 "picks the productive variant of the chain associated with the highest number of unique molecular identifiers", apparently via preprocess_10X.get_10X_clones(). I couldn't find an equivalent function in tcrdist3, or any documentation addressing this. I'm wondering what you would recommend.
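For reference, both options in the question fit in a few lines of plain Python. This is a minimal sketch, not tcrdist2's or conga's actual code: it assumes rows shaped like Cell Ranger's filtered_contig_annotations.csv (only the barcode, chain, productive, cdr3 and umis columns), with `productive` already parsed to a boolean. The data and the helper names (`group_productive`, `exclude_multichain`, `pick_top_umi`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the style of a Cell Ranger filtered_contig_annotations.csv;
# only the columns used below are included, and the values are made up.
contigs = [
    {"barcode": "AAAC-1", "chain": "TRA", "productive": True, "cdr3": "CAVRDSNYQLIW", "umis": 7},
    {"barcode": "AAAC-1", "chain": "TRA", "productive": True, "cdr3": "CALSGGSNYKLTF", "umis": 2},
    {"barcode": "AAAC-1", "chain": "TRB", "productive": True, "cdr3": "CASSLGQAYEQYF", "umis": 12},
    {"barcode": "AAAG-1", "chain": "TRA", "productive": True, "cdr3": "CAASGGSYIPTF", "umis": 4},
    {"barcode": "AAAG-1", "chain": "TRB", "productive": False, "cdr3": None, "umis": 9},
    {"barcode": "AAAG-1", "chain": "TRB", "productive": True, "cdr3": "CASSPTGGYEQYF", "umis": 5},
]


def group_productive(rows):
    """Group productive TRA/TRB contigs by (barcode, chain); other rows are ignored."""
    groups = defaultdict(list)
    for row in rows:
        if row["productive"] and row["chain"] in ("TRA", "TRB"):
            groups[(row["barcode"], row["chain"])].append(row)
    return groups


def exclude_multichain(rows):
    """Option 1: drop every cell with more than one productive TRA or TRB contig."""
    groups = group_productive(rows)
    multi = {barcode for (barcode, _), grp in groups.items() if len(grp) > 1}
    return [r for grp in groups.values() for r in grp if r["barcode"] not in multi]


def pick_top_umi(rows):
    """Option 2: keep the productive contig with the most UMIs per cell and chain.

    Ties go to the first row encountered, because max() returns the first maximum.
    """
    groups = group_productive(rows)
    return [max(grp, key=lambda r: r["umis"]) for grp in groups.values()]


print(sorted({r["barcode"] for r in exclude_multichain(contigs)}))  # ['AAAG-1']
for r in pick_top_umi(contigs):
    print(r["barcode"], r["chain"], r["cdr3"], r["umis"])
```

Excluding multi-chain cells is the conservative choice, but it also discards cells that genuinely express two productive alpha chains; picking by UMI keeps those cells but can choose the wrong chain when the counts are close.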

kmayerb commented 2 years ago

Hi,

preprocess_10X.get_10X_clones() is a function available in a related tool, conga (https://github.com/phbradley/conga).

Because this is such a common question with single-cell VDJ data, we are also working on a solution. It's not part of tcrdist3 yet because it is still experimental, but you can check it out here: https://github.com/kmayerb/tenextra

It takes a 10x contig annotation file from Cell Ranger and attempts to choose the best A:B pairing based on UMI counts, and it also removes what might be contamination from cell-free RNA/DNA originating from high-abundance clones.
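The A:B pairing step can be illustrated with a small sketch. It is not tenextra's implementation, and it leaves out the contamination filtering entirely. It assumes one contig per cell and chain has already been selected, pairs TRA with TRB by barcode, and collapses cells with identical receptors into clones with a cell count. All names and data are hypothetical, and the output column names only loosely mimic tcrdist3's input columns (check the tcrdist3 documentation for the exact expected format, e.g. whether gene names need allele suffixes).

```python
from collections import Counter

# Hypothetical per-cell selections: one productive contig per (barcode, chain),
# e.g. after keeping the highest-UMI chain. Values are made up.
selected = [
    {"barcode": "AAAC-1", "chain": "TRA", "v_gene": "TRAV12-2", "cdr3": "CAVRDSNYQLIW"},
    {"barcode": "AAAC-1", "chain": "TRB", "v_gene": "TRBV19", "cdr3": "CASSLGQAYEQYF"},
    {"barcode": "AAAG-1", "chain": "TRA", "v_gene": "TRAV12-2", "cdr3": "CAVRDSNYQLIW"},
    {"barcode": "AAAG-1", "chain": "TRB", "v_gene": "TRBV19", "cdr3": "CASSLGQAYEQYF"},
    {"barcode": "AAAT-1", "chain": "TRB", "v_gene": "TRBV7-9", "cdr3": "CASSPTGGYEQYF"},
]


def pair_chains(rows):
    """Return {barcode: (v_a, cdr3_a, v_b, cdr3_b)}; cells missing either chain are dropped."""
    by_cell = {}
    for row in rows:
        by_cell.setdefault(row["barcode"], {})[row["chain"]] = row
    pairs = {}
    for barcode, chains in by_cell.items():
        if "TRA" in chains and "TRB" in chains:
            a, b = chains["TRA"], chains["TRB"]
            pairs[barcode] = (a["v_gene"], a["cdr3"], b["v_gene"], b["cdr3"])
    return pairs


def count_clones(pairs):
    """Collapse cells sharing an identical A:B receptor into one clone with a cell count."""
    counts = Counter(pairs.values())
    return [
        {"v_a_gene": va, "cdr3_a_aa": ca, "v_b_gene": vb, "cdr3_b_aa": cb, "count": n}
        for (va, ca, vb, cb), n in counts.items()
    ]


# AAAT-1 has no alpha chain, so it is dropped; AAAC-1 and AAAG-1 form one clone.
print(count_clones(pair_chains(selected)))
```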

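```python
from tenextra.parse import select_likely_receptors

# Choose likely A:B receptor pairings from a Cell Ranger contig annotation file
# (here, the test file included in the tenextra repository).
clean_clones, all_clones, ct_chains, ct_pairs = select_likely_receptors(
    f='tenextra/data/filtered_contig_annotations_test.csv',
    threshold_chains=10,
)
```
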
emjbishop commented 2 years ago

Wonderful, thanks so much!