phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

Only CDR3 Region? #67

TheRaspberryFox opened this issue 4 months ago

TheRaspberryFox commented 4 months ago

Hello,

I have very much enjoyed using your program. Currently, the TCR clusters are being formed exclusively around differences in TRAV gene usage, with no visible differences in the CDR3 region (it looks very random).

Is there a way to form the TCR clusters using only the CDR3 region? Also, can I choose to focus only on motifs for the beta chain and ignore the alpha chain? I think this could help remove some noise and bring out more subtle differences.

Thanks

phbradley commented 4 months ago

Hi there,

Thanks for the question. If you look at

https://github.com/phbradley/conga/blob/master/conga/tcrdist/tcr_distances.py#L236C1-L238C1

which defines the paired TCRdist distance, it says something like

        return ( self.rep_dists[tcr1[0][0]][tcr2[0][0]] + weighted_cdr3_distance(tcr1[0][2], tcr2[0][2]) +
                 self.rep_dists[tcr1[1][0]][tcr2[1][0]] + weighted_cdr3_distance(tcr1[1][2], tcr2[1][2]) )
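Written out, that return statement is a plain sum of four terms, two per chain. In the notation below (mine, not conga's), $d_V$ is the `self.rep_dists` lookup keyed on V-gene names, $d_{\mathrm{CDR3}}$ is `weighted_cdr3_distance`, and chain index 0 is alpha and index 1 is beta, as the `tcr1[0]` / `tcr1[1]` indexing suggests:

```latex
\begin{aligned}
D(t_1, t_2) ={}& d_V\left(V^{\alpha}_1, V^{\alpha}_2\right) + d_{\mathrm{CDR3}}\left(\mathrm{CDR3}^{\alpha}_1, \mathrm{CDR3}^{\alpha}_2\right) \\
&+ d_V\left(V^{\beta}_1, V^{\beta}_2\right) + d_{\mathrm{CDR3}}\left(\mathrm{CDR3}^{\beta}_1, \mathrm{CDR3}^{\beta}_2\right)
\end{aligned}
```

Because the terms are simply added, each one can be reweighted or zeroed independently.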

You could try replacing that with

        va_weight = 0
        cdr3a_weight = 0
        vb_weight = 0
        cdr3b_weight = 4 # or any other positive weight

        return ( va_weight * self.rep_dists[tcr1[0][0]][tcr2[0][0]] + 
                 cdr3a_weight * weighted_cdr3_distance(tcr1[0][2], tcr2[0][2]) +
                 vb_weight * self.rep_dists[tcr1[1][0]][tcr2[1][0]] + 
                 cdr3b_weight * weighted_cdr3_distance(tcr1[1][2], tcr2[1][2]) )
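To see what those weights do, here is a small self-contained toy version of the weighted sum. It is not conga code: `toy_cdr3_distance` is a hypothetical stand-in for `weighted_cdr3_distance` (the real one uses substitution-matrix-based scores), the `rep_dists` numbers are arbitrary illustrative values, and the TCR tuple layout `((va, ja, cdr3a), (vb, jb, cdr3b))` is assumed from the indexing in the snippet.

```python
from typing import Dict, Tuple

# Assumed TCR layout, matching the indexing above: ((va, ja, cdr3a), (vb, jb, cdr3b))
Chain = Tuple[str, str, str]
TCR = Tuple[Chain, Chain]


def toy_cdr3_distance(a: str, b: str, gap_penalty: int = 4) -> int:
    """Hypothetical stand-in for weighted_cdr3_distance: position-wise
    mismatches over the shorter length plus a flat penalty per length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + gap_penalty * abs(len(a) - len(b))


def weighted_paired_distance(tcr1: TCR, tcr2: TCR,
                             rep_dists: Dict[str, Dict[str, float]],
                             va_weight: float = 0, cdr3a_weight: float = 0,
                             vb_weight: float = 0, cdr3b_weight: float = 4) -> float:
    """Same four-term sum as the suggested edit, with one weight per term."""
    return (va_weight * rep_dists[tcr1[0][0]][tcr2[0][0]]
            + cdr3a_weight * toy_cdr3_distance(tcr1[0][2], tcr2[0][2])
            + vb_weight * rep_dists[tcr1[1][0]][tcr2[1][0]]
            + cdr3b_weight * toy_cdr3_distance(tcr1[1][2], tcr2[1][2]))


# Arbitrary illustrative V-gene distances (not real TCRdist values).
rep_dists = {
    "TRAV1-2*01": {"TRAV1-2*01": 0, "TRAV12-2*01": 30},
    "TRAV12-2*01": {"TRAV1-2*01": 30, "TRAV12-2*01": 0},
    "TRBV6-4*01": {"TRBV6-4*01": 0},
}

t1 = (("TRAV1-2*01", "TRAJ33*01", "CAVRDSNYQLIW"), ("TRBV6-4*01", "TRBJ2-1*01", "CASSGQGAYEQYF"))
# Differs from t1 only in Va and CDR3a.
t2 = (("TRAV12-2*01", "TRAJ33*01", "CAVNDSNYQLIW"), ("TRBV6-4*01", "TRBJ2-1*01", "CASSGQGAYEQYF"))
# Differs from t1 only by one CDR3b residue.
t3 = (("TRAV1-2*01", "TRAJ33*01", "CAVRDSNYQLIW"), ("TRBV6-4*01", "TRBJ2-1*01", "CASSLQGAYEQYF"))

print(weighted_paired_distance(t1, t2, rep_dists))  # 0: only zero-weighted terms differ
print(weighted_paired_distance(t1, t3, rep_dists))  # 4: one CDR3b mismatch times weight 4
print(weighted_paired_distance(t1, t2, rep_dists, 1, 1, 1, 1))  # 31: 30 (Va) + 1 (CDR3a)
```

With the suggested weights, two clonotypes that share a CDR3β sequence end up at distance 0 whatever their V genes and alpha chains, so the TCR neighbor graph should then reflect CDR3β similarity only.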

You will also need to disable the C++ tcrdist alternative so that the edited Python code is actually used, for example by moving the tcrdist_cpp/bin folder or by hardcoding the function at this line to always return False:

https://github.com/phbradley/conga/blob/master/conga/util.py#L29
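As a rough, hypothetical sketch of the hardcoding option: the function name `tcrdist_cpp_available` and the `TCRDIST_CPP_BIN` path below are assumptions, not copied from conga, so check them against the linked line in conga/util.py before editing.

```python
import os

# Hypothetical location of the compiled tcrdist_cpp binaries; the real path in conga may differ.
TCRDIST_CPP_BIN = os.path.join("conga", "tcrdist_cpp", "bin")


def tcrdist_cpp_available() -> bool:
    """Hypothetical name for the availability check at conga/util.py#L29."""
    # Assumed original behavior: use the C++ code whenever its binaries exist.
    # return os.path.isdir(TCRDIST_CPP_BIN)

    # Forced off, so the edited Python distance in tcr_distances.py is always used.
    return False
```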

Let me know what you find! Take care, Phil