Closed ndrubins closed 1 year ago
You can do this either with the pwseqdist package directly or with TCRrep, setting compute_distances = False and infer_cdrs = False. Then manually set the metrics, kwargs, and weights for only the CDR3s, as below, and run tr.compute_distances():
import pwseqdist as pw
import pandas as pd
from tcrdist.repertoire import TCRrep

# df is the DataFrame holding your clones (for example, loaded from your csv file).
# Skip CDR inference and the default distance computation.
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['alpha','beta'],
            infer_cdrs = False,
            compute_distances = False)

# Use only the CDR3 of each chain.
metrics_a = { "cdr3_a_aa" : pw.metrics.nb_vector_tcrdist }
metrics_b = { "cdr3_b_aa" : pw.metrics.nb_vector_tcrdist }

weights_a = { "cdr3_a_aa" : 3 }
weights_b = { "cdr3_b_aa" : 3 }

kargs_a = {
    'cdr3_a_aa' : {
        'use_numba': True,
        'distance_matrix': pw.matrices.tcr_nb_distance_matrix,
        'dist_weight': 1,
        'gap_penalty': 4,
        'ntrim': 3,
        'ctrim': 2,
        'fixed_gappos': False}
}
kargs_b = {
    'cdr3_b_aa' : {
        'use_numba': True,
        'distance_matrix': pw.matrices.tcr_nb_distance_matrix,
        'dist_weight': 1,
        'gap_penalty': 4,
        'ntrim': 3,
        'ctrim': 2,
        'fixed_gappos': False}
}

# Attach the CDR3-only settings, then compute the distances.
tr.metrics_a = metrics_a
tr.metrics_b = metrics_b
tr.weights_a = weights_a
tr.weights_b = weights_b
tr.kargs_a = kargs_a
tr.kargs_b = kargs_b
tr.compute_distances()
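The code above compares every pair of clones regardless of gene usage. For the other half of the question (comparing only clones that share the same V and J genes), one possible approach is to group the rows by their gene columns first and compare CDR3s only within each group. The sketch below uses only the standard library and is not tcrdist3 or pwseqdist code: the helper names `mismatch_distance` and `pairs_within_groups` are hypothetical, the distance is a simple mismatch count standing in for the real metric (it is not BLOSUM62 or tcrdist), the cell IDs are shortened, and the second row is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def mismatch_distance(s1, s2):
    """Placeholder metric: mismatches over the shared length plus the length difference."""
    return sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))


def pairs_within_groups(rows, key_cols, seq_col, dist=mismatch_distance):
    """Return {(cell_id_1, cell_id_2): distance} for pairs sharing the same key_cols values.

    An empty key_cols list puts every row in one group, i.e. all pairs are compared.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)

    distances = {}
    for members in groups.values():
        for r1, r2 in combinations(members, 2):
            distances[(r1["cell_id"], r2["cell_id"])] = dist(r1[seq_col], r2[seq_col])
    return distances


# Toy input: c1 and c3 follow the rows in the question, c2 is invented.
rows = [
    {"cell_id": "c1", "cdr3_a_aa": "AAGGTGANKLI", "v_a_gene": "TRAV22-1", "j_a_gene": "TRAJ32"},
    {"cell_id": "c2", "cdr3_a_aa": "AAGGSGANKLI", "v_a_gene": "TRAV22-1", "j_a_gene": "TRAJ32"},
    {"cell_id": "c3", "cdr3_a_aa": "AGSSYNKLV", "v_a_gene": "TRAV25-1", "j_a_gene": "TRAJ50"},
]

# Only pairs with identical V and J alpha genes.
same_genes = pairs_within_groups(rows, ["v_a_gene", "j_a_gene"], "cdr3_a_aa")

# All pairs, ignoring gene usage.
all_pairs = pairs_within_groups(rows, [], "cdr3_a_aa")
```

The same grouping idea should carry over to a distance matrix computed by tcrdist3: compute all pairs once, then keep only the entries whose row and column clones have matching gene columns.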
On Thu, Aug 17, 2023 at 6:23 PM ndrubins @.***> wrote:
Hi,
I'd like to measure the amino acid (AA) distance between all pairs of TCRs based on their alpha and beta CDR3 AA sequences, for each pair of TCRs with the same V(D)J sequence IDs (and I'd also like to do the same ignoring their V(D)J sequence IDs), using BLOSUM62. I created a csv file similar to the dash.csv example; here are the first two rows:
subject  cell_id                  cdr3_a_aa    v_a_gene  j_a_gene  v_b_gene  d_b_gene  j_b_gene  count
2y_2     AAAGGTATCATTGTGG-1_2y_2  AAGGTGANKLI  TRAV22-1  TRAJ32    TRBV2     TRBD1     TRBJ1-1   2
2y_2     AAAGTGACAAATAGCA-1_2y_2  AGSSYNKLV    TRAV25-1  TRAJ50    TRBV2     TRBD1     TRBJ2-4   4
These TCR sequences are not from human or mouse.
Can I please ask for your help with the usage for that?
Also, if I had an equivalent file but for the gamma and delta chains, will it still be the same? Finally, what would be the usage for computing the distance but ignoring the V(D)J sequence IDs? (meaning between all pairs of CDR3 AAs) Thanks