kmayerb / tcrdist3

flexible CDR based distance metrics
MIT License

Identifying which clones cluster where on a tree #86

Closed TheRaspberryFox closed 1 year ago

TheRaspberryFox commented 1 year ago

Hello,

TCRdist is a great package. I appreciate its development!

I am attempting to perform an analysis on my dataset; however, I am not sure how to best achieve it.

I currently have two conditions, and in both the TCRs target the same epitope. I want to see whether the distribution of TCR motifs differs between these conditions. My hypothesis is that in one condition the clonal repertoire is narrower and is selecting for a certain type of TCR.

Is there a way I can get a list of all motifs (I am not sure how to define a motif), then look at what percentage of each motif's clones come from each condition? Or is there a better way to perform an analysis like this?

If you need further clarification, I am happy to provide more information.

Thank you very much.

kmayerb commented 1 year ago

One approach would be to compute pairwise distances between all clones, then build a graph in which two clones are connected when their TCRdist is < 18 (single-chain beta) or < 100 (paired-chain alpha/beta), and cluster the clones into the connected components of that graph. You could then test each connected component for enrichment in condition A or B using Fisher's exact test. Each connected component would represent a hypothesized motif. Can you provide more information about the size of the dataset and the type of data (single- or paired-chain TCRs)?
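The approach suggested above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not tcrdist3's own implementation: it assumes you already have a square pairwise distance matrix (the toy `dist` below is made up), that each clone carries a condition label "A" or "B", and it uses the single-chain beta threshold of 18 from the comment. The function names `connected_components`, `fisher_exact_two_sided`, and `enrichment_by_component` are hypothetical helpers written for this example using only the standard library.

```python
from math import comb


def connected_components(dist, threshold):
    """Label clones so that clones joined by a chain of edges
    (pairwise distance < threshold) share the same label."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def enrichment_by_component(labels, condition, min_size=2):
    """For each component with at least min_size clones, return its 2x2 table
    (inside A, inside B, outside A, outside B) and the Fisher p-value."""
    results = {}
    for comp in sorted(set(labels)):
        inside = [c for l, c in zip(labels, condition) if l == comp]
        outside = [c for l, c in zip(labels, condition) if l != comp]
        if len(inside) < min_size:
            continue
        table = (inside.count("A"), inside.count("B"),
                 outside.count("A"), outside.count("B"))
        results[comp] = (table, fisher_exact_two_sided(*table))
    return results


# Toy data (made up): clones 0-2 are mutually close, 3-4 are close, 5 is isolated.
dist = [
    [0, 12, 12, 60, 60, 60],
    [12, 0, 12, 60, 60, 60],
    [12, 12, 0, 60, 60, 60],
    [60, 60, 60, 0, 6, 60],
    [60, 60, 60, 6, 0, 60],
    [60, 60, 60, 60, 60, 0],
]
condition = ["A", "A", "A", "B", "B", "B"]

labels = connected_components(dist, threshold=18)
results = enrichment_by_component(labels, condition)
for comp, (table, p) in results.items():
    print(comp, table, round(p, 3))
```

With several thousand clones there will be many components, so the per-component p-values would need a multiple-testing correction. On real data, `scipy.sparse.csgraph.connected_components` and `scipy.stats.fisher_exact` do the same two jobs more efficiently than the pure-Python helpers here.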

TheRaspberryFox commented 1 year ago

Currently, I am using a gap statistic and a silhouette plot to determine the optimal number of clusters. Then I use a chi-square test to look at the distribution of TCRs in each defined cluster, with each cluster representing a motif. It seems to work okay, but filtering out background cells by removing those with high distance scores may be beneficial.

This is single-cell TCR sequencing. I have several thousand clones.
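One way to read the "filtering out background cells" idea above is to drop any clone that has no neighbor within the distance threshold before clustering. The sketch below assumes that reading, a precomputed square distance matrix (the toy `dist` is made up), and a placeholder threshold of 18 taken from the single-chain beta suggestion earlier in the thread; for paired-chain data the suggested threshold was 100. `filter_background` is a hypothetical helper, not part of tcrdist3.

```python
def filter_background(dist, threshold):
    """Return indices of clones that have at least one other clone
    at distance < threshold; the rest are treated as background."""
    keep = []
    for i, row in enumerate(dist):
        # Distance to the nearest *other* clone (inf if there is none).
        nearest = min((d for j, d in enumerate(row) if j != i),
                      default=float("inf"))
        if nearest < threshold:
            keep.append(i)
    return keep


# Toy data (made up): only clones 0 and 1 have a close neighbor.
dist = [
    [0, 10, 70, 80],
    [10, 0, 75, 90],
    [70, 75, 0, 65],
    [80, 90, 65, 0],
]

keep = filter_background(dist, threshold=18)
# Sub-matrix restricted to the retained clones, ready for clustering.
filtered = [[dist[i][j] for j in keep] for i in keep]
print(keep, filtered)
```

The retained sub-matrix can then go into whichever clustering step follows, whether gap statistic and silhouette or connected components.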