kmayerb / tcrdist3

flexible CDR based distance metrics
MIT License

Usage question #91

Closed: ndrubins closed this issue 1 year ago

ndrubins commented 1 year ago

Hi,

I'd like to measure the AA distance between all pairs of TCRs based on their alpha and beta CDR3 AA sequences, for each pair of TCRs with the same V(D)J gene IDs (and I'd also like to do the same ignoring their V(D)J gene IDs), using BLOSUM62. I created a CSV file similar to the dash.csv example; here are the first two rows:

subject  cell_id                  cdr3_a_aa    v_a_gene  j_a_gene  v_b_gene  d_b_gene  j_b_gene  count
2y_2     AAAGGTATCATTGTGG-1_2y_2  AAGGTGANKLI  TRAV22-1  TRAJ32    TRBV2     TRBD1     TRBJ1-1   2
2y_2     AAAGTGACAAATAGCA-1_2y_2  AGSSYNKLV    TRAV25-1  TRAJ50    TRBV2     TRBD1     TRBJ2-4   4

These TCR sequences are not from human or mouse.

Could you please help me with the usage for this?

Also, if I had an equivalent file for the gamma and delta chains, would the usage be the same? Finally, what would be the usage for computing the distance while ignoring the V(D)J gene IDs (i.e., between all pairs of CDR3 AA sequences)? Thanks
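The two pairing schemes described above (only TCRs that share V(D)J genes vs. all pairs regardless of genes) can be sketched in plain Python, independent of tcrdist3. `candidate_pairs` is a hypothetical helper, not part of tcrdist3 or pwseqdist; the rows are the two example rows from the question (they have no CDR3-beta column, so none is shown). With a full N×N distance matrix, the same effect could be had by keeping only the entries whose gene columns match.

```python
from collections import defaultdict
from itertools import combinations

# The two example rows from the question (no CDR3-beta column was shown)
rows = [
    {"cell_id": "AAAGGTATCATTGTGG-1_2y_2", "cdr3_a_aa": "AAGGTGANKLI",
     "v_a_gene": "TRAV22-1", "j_a_gene": "TRAJ32",
     "v_b_gene": "TRBV2", "j_b_gene": "TRBJ1-1"},
    {"cell_id": "AAAGTGACAAATAGCA-1_2y_2", "cdr3_a_aa": "AGSSYNKLV",
     "v_a_gene": "TRAV25-1", "j_a_gene": "TRAJ50",
     "v_b_gene": "TRBV2", "j_b_gene": "TRBJ2-4"},
]


def candidate_pairs(rows, group_by=()):
    """Return sorted index pairs (i, j), i < j, whose distance should be computed.

    group_by=() ignores gene identity, so every pair is returned; otherwise
    only rows that agree on every listed column are paired (hypothetical helper).
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[col] for col in group_by)].append(i)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


print(candidate_pairs(rows))                            # [(0, 1)]  all pairs
print(candidate_pairs(rows, ("v_a_gene", "j_a_gene")))  # []  alpha V/J differ
print(candidate_pairs(rows, ("v_b_gene",)))             # [(0, 1)]  both TRBV2
```

Grouping by a configurable tuple of columns keeps one code path for "same V", "same V and J", or "ignore genes" (the empty tuple).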

kmayerb commented 1 year ago

You can do this either by using the pwseqdist package directly, or by creating a TCRrep with compute_distances = False and infer_cdrs = False and then manually setting the metrics, kargs, and weights for the CDR3s only, as below. Then run tr.compute_distances():

import pwseqdist as pw
import pandas as pd
from tcrdist.repertoire import TCRrep

# df needs cdr3_a_aa, cdr3_b_aa and the V/J gene columns; the file name is a placeholder
df = pd.read_csv("my_tcrs.csv")

# Build the repertoire without inferring CDR1/CDR2/CDR2.5 and without computing distances yet
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            infer_cdrs = False,
            compute_distances = False,
            chains = ['alpha', 'beta'])

# Score only the CDR3 of each chain
metrics_a = {"cdr3_a_aa" : pw.metrics.nb_vector_tcrdist}
metrics_b = {"cdr3_b_aa" : pw.metrics.nb_vector_tcrdist}

# Weight applied to each CDR3 distance
weights_a = {"cdr3_a_aa" : 3}
weights_b = {"cdr3_b_aa" : 3}

# Keyword arguments passed to the metric for each CDR3
kargs_a = {'cdr3_a_aa' : {'use_numba': True,
                          'distance_matrix': pw.matrices.tcr_nb_distance_matrix,
                          'dist_weight': 1,
                          'gap_penalty': 4,
                          'ntrim': 3,
                          'ctrim': 2,
                          'fixed_gappos': False}}
kargs_b = {'cdr3_b_aa' : {'use_numba': True,
                          'distance_matrix': pw.matrices.tcr_nb_distance_matrix,
                          'dist_weight': 1,
                          'gap_penalty': 4,
                          'ntrim': 3,
                          'ctrim': 2,
                          'fixed_gappos': False}}

tr.metrics_a = metrics_a
tr.metrics_b = metrics_b
tr.weights_a = weights_a
tr.weights_b = weights_b
tr.kargs_a = kargs_a
tr.kargs_b = kargs_b

tr.compute_distances()
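To make the kargs above (ntrim, ctrim, gap_penalty, dist_weight, fixed_gappos) less opaque, here is a standalone plain-Python sketch of a tcrdist-style CDR3 distance. It assumes the original TCRdist scoring rule, a per-position cost of 0 for identical residues and min(4, 4 - BLOSUM62) otherwise, which as far as we know is what pw.matrices.tcr_nb_distance_matrix encodes. The names AA, BLOSUM62, aa_distance and cdr3_distance are hypothetical, and the gap-position search is a simplification, so the numbers may not match pw.metrics.nb_vector_tcrdist exactly.

```python
# Standalone sketch of a tcrdist-style CDR3 distance (not the pwseqdist code).
AA = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 substitution scores; rows and columns follow the order of AA.
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

BLOSUM62 = {
    (a, b): int(score)
    for a, line in zip(AA, _BLOSUM62_ROWS.strip().splitlines())
    for b, score in zip(AA, line.split())
}


def aa_distance(a, b):
    """Per-position cost: 0 for identical residues, else min(4, 4 - BLOSUM62)."""
    return 0 if a == b else min(4, 4 - BLOSUM62[(a, b)])


def cdr3_distance(s1, s2, dist_weight=1, gap_penalty=4, ntrim=3, ctrim=2):
    """Distance between two CDR3 amino-acid strings.

    The first `ntrim` and last `ctrim` positions are not scored. If the lengths
    differ, one contiguous gap is placed in the shorter sequence at whichever
    interior position gives the lowest mismatch cost (roughly the idea behind
    fixed_gappos=False), and each gapped position costs `gap_penalty`.
    """
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # make s1 the longer sequence
    n_long, n_short = len(s1), len(s2)
    if n_long == n_short:
        mismatch = sum(aa_distance(s1[i], s2[i]) for i in range(ntrim, n_long - ctrim))
        return dist_weight * mismatch

    best = None
    # gappos = number of N-terminal residues of the short sequence before the gap
    for gappos in range(ntrim, max(ntrim, n_short - ctrim) + 1):
        mismatch = sum(aa_distance(s1[i], s2[i]) for i in range(ntrim, gappos))
        # Residues after the gap are aligned from the C-terminal end
        mismatch += sum(
            aa_distance(s1[n_long - 1 - i], s2[n_short - 1 - i])
            for i in range(ctrim, n_short - gappos)
        )
        best = mismatch if best is None else min(best, mismatch)
    return dist_weight * best + gap_penalty * (n_long - n_short)


# The two CDR3-alpha sequences from the question
d = cdr3_distance("AAGGTGANKLI", "AGSSYNKLV")
print(d)      # unweighted CDR3-alpha distance under this sketch
print(3 * d)  # scaled by a CDR3 weight of 3, as in weights_a above
```

Because the substitution cost lives entirely in aa_distance, trying a different matrix (for example raw BLOSUM62 scores without the min(4, ...) cap) only means changing that one function. As we understand tcrdist3, the per-chain result is the weighted sum over the features listed in metrics_a / metrics_b, which here is just 3 × the CDR3 distance.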

On Thu, Aug 17, 2023 at 6:23 PM ndrubins @.***> wrote:
