kmayerb / tcrdist3

flexible CDR based distance metrics
MIT License

A to P change in CDR3 but no effect on distance #103

Open andreas-wilm opened 5 months ago

andreas-wilm commented 5 months ago

Hi @kmayerb,

I came across a case where two TCRs differ by an A to P change in their CDR3, but their distance is 0. How come?

>>> import pandas as pd
>>> from tcrdist.repertoire import TCRrep
>>> df = pd.DataFrame([
    ['TRAV13-1*01', 'TRAJ15*01', 'CAPTNQAGTALIF', 1],
    ['TRAV13-1*01', 'TRAJ15*01', 'CAATNQAGTALIF', 1]],
    columns=['v_a_gene', 'j_a_gene', 'cdr3_a_aa', 'count'])
>>> tr = TCRrep(cell_df = df, 
    organism = 'human', 
    chains = ['alpha'], 
    db_file = 'alphabeta_gammadelta_db.tsv')
>>> tr.pw_alpha, tr.pw_cdr3_a_aa
(array([[0., 0.],
        [0., 0.]]),
 array([[0, 0],
        [0, 0]], dtype=int16))

Many thanks, Andreas

PS: Tested with version 0.2.2

kmayerb commented 2 months ago

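By default, tcrdist trims the first 3 and the last 2 amino acids of each CDR3 before comparing, so differences at those positions do not incur a penalty. In your example the A/P difference is at position 3 of the CDR3, which falls inside the trimmed N-terminal region, hence the distance of 0. The trim settings can be modified: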

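import pwseqdist as pw

# Settings for the beta-chain CDR3 (excerpt); ntrim and ctrim control the trimming.
kargs_b = {
    'cdr3_b_aa': {
        'use_numba': True,
        'distance_matrix': pw.matrices.tcr_nb_distance_matrix,
        'dist_weight': 1,
        'gap_penalty': 4,
        'ntrim': 3,   # residues trimmed from the N-terminal end
        'ctrim': 2,   # residues trimmed from the C-terminal end
        'fixed_gappos': False},
}

kmayerb commented 2 months ago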

https://tcrdist3.readthedocs.io/en/latest/tcrdistances.html#i-want-complete-control
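
To make the effect of trimming concrete, here is a minimal sketch in plain Python. It does not call tcrdist3; the helper `trim_cdr3` is a hypothetical name that only mimics the `ntrim`/`ctrim` behaviour described above with string slicing, applied to the two CDR3 sequences from the original report.

```python
def trim_cdr3(seq, ntrim=3, ctrim=2):
    """Hypothetical helper: drop ntrim leading and ctrim trailing residues."""
    return seq[ntrim:len(seq) - ctrim]


cdr3_p = "CAPTNQAGTALIF"
cdr3_a = "CAATNQAGTALIF"

# With the default-style trimming (3 and 2), the differing residue is removed.
trimmed_p = trim_cdr3(cdr3_p)
trimmed_a = trim_cdr3(cdr3_a)
print(trimmed_p, trimmed_a, trimmed_p == trimmed_a)

# With no trimming, the A/P mismatch at index 2 is still present.
untrimmed_same = trim_cdr3(cdr3_p, ntrim=0, ctrim=0) == trim_cdr3(cdr3_a, ntrim=0, ctrim=0)
print(untrimmed_same)
```

Both sequences reduce to the same string once the first three and last two residues are removed, which is consistent with the zero distance reported above. Setting the trims to 0 keeps the mismatch in the compared region.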