Closed aoki0623mriid2 closed 2 years ago
Columns are generated in this part of the function bkgd_cntl_nn2 https://github.com/kmayerb/tcrdist3/blob/6bd7e58eed91b317245b4e909c8debc71db92fab/tcrdist/neighbors.py#L275-L293
These are internally computed variables for ranking metaclonotypes, prioritizing those most likely to capture other target antigen-associated TCRs while spanning relatively few "background" TCRs.
"weighted" refers to using the weighted adjustment to account for non-uniform sampling of CDR3s from a particular set of V-J gene combinations.
TR is the target rate:
It is the number of target TCRs from the antigen-enriched set within the radius, divided by the total number in that set, with a pseudocount added to avoid zero. For example, if you had 100 tetramer-positive TCRs and 8 of them fall within the radius, then TR would be (8+1)/(8+92+1) ≈ 0.089.
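The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: `rate_with_pseudocount` is a hypothetical stand-in written for this reply, and it assumes the pseudocount enters the numerator and denominator exactly as in the worked example above, not necessarily as tcrdist3's own `compute_rate` does.

```python
def rate_with_pseudocount(pos, neg, ps=1):
    """Hypothetical helper (not the tcrdist3 function): fraction of hits,
    with a pseudocount added so the rate is never exactly zero."""
    return (pos + ps) / (pos + neg + ps)

# 100 tetramer-positive TCRs, 8 of which fall within the radius
tr = rate_with_pseudocount(pos=8, neg=92)
print(round(tr, 4))  # 0.0891

# With zero hits the rate stays strictly positive because of the pseudocount
print(rate_with_pseudocount(pos=0, neg=100) > 0)  # True
```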
BR_weighted is the background rate: the number of background TCRs within the radius divided by the total number of clones in the background, but this must be weighted:
centers_df['BR_weighted'] = [compute_rate(pos=r['bkgd_hits_weighted'],
                                          neg=n2 - r['bkgd_hits_weighted'])
                             for i, r in centers_df.iterrows()]
RR: relative rate
centers_df['RR_weighted'] = centers_df['TR']/centers_df['BR_weighted']
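To make the relationship between TR, BR_weighted and RR_weighted concrete, here is a small sketch for a single metaclonotype center. All counts are made up for illustration, `rate_with_pseudocount` is a hypothetical helper that assumes the same pseudocount form as the TR example, and `n1`/`n2` are taken to be the total numbers of target and background clones.

```python
def rate_with_pseudocount(pos, neg, ps=1):
    """Hypothetical helper: hit fraction with a pseudocount (assumed form)."""
    return (pos + ps) / (pos + neg + ps)

# Illustrative numbers only, not real data
n1 = 100                  # total target (antigen-enriched) clones
n2 = 100_000              # total background clones
target_hits = 8           # target TCRs within the radius
bkgd_hits_weighted = 2.5  # weighted background hits can be fractional

tr = rate_with_pseudocount(target_hits, n1 - target_hits)
br_weighted = rate_with_pseudocount(bkgd_hits_weighted, n2 - bkgd_hits_weighted)
rr_weighted = tr / br_weighted  # relative rate: target rate over background rate

print(f"TR={tr:.4f}  BR_weighted={br_weighted:.2e}  RR_weighted={rr_weighted:.1f}")
```

A large RR_weighted means the radius captures target TCRs at a much higher rate than background TCRs.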
OR: odds ratio
centers_df['OR_weighted'] = [compute_odds_ratio(pos=r['target_hits'],
                                                neg=n1 - r['target_hits'],
                                                bpos=r['bkgd_hits_weighted'],
                                                bneg=n2 - r['bkgd_hits_weighted'],
                                                ps=1)
                             for i, r in centers_df.iterrows()]
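For intuition, an odds ratio with a pseudocount can be sketched as below. This is an assumption about the general form, not a copy of tcrdist3's `compute_odds_ratio`: the hypothetical `odds_ratio_with_pseudocount` adds `ps` to each of the four cells before taking the ratio of odds, and the counts are invented for illustration.

```python
def odds_ratio_with_pseudocount(pos, neg, bpos, bneg, ps=1):
    """Hypothetical sketch: odds of a target hit divided by odds of a
    background hit, with pseudocount ps added to every cell (assumed form)."""
    target_odds = (pos + ps) / (neg + ps)
    background_odds = (bpos + ps) / (bneg + ps)
    return target_odds / background_odds

# Illustrative numbers only, not real data
n1, n2 = 100, 100_000
target_hits, bkgd_hits_weighted = 8, 2.5

or_weighted = odds_ratio_with_pseudocount(
    pos=target_hits, neg=n1 - target_hits,
    bpos=bkgd_hits_weighted, bneg=n2 - bkgd_hits_weighted,
)
print(or_weighted > 1)  # True: target TCRs are enriched within the radius

# Identical hit proportions in both sets give an odds ratio of exactly 1
print(odds_ratio_with_pseudocount(8, 92, 8, 92))  # 1.0
```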
chi2dist is a chi-squared statistic; high values indicate strong enrichment of target sequences falling within the radius relative to the number of background TCRs falling within the radius. If you choose to compute a regex from each centroid and its neighbors, you will also see a chi2re column,
and centers_df['chi2joint'] combines the distance-based and regex-based chi-squared statistics:
centers_df['chi2joint'] = [beta_re * r['chi2re'] + beta_dist* r['chi2dist'] for _,r in centers_df.iterrows() ]
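As a rough illustration of how the joint score can be used for ranking, the sketch below applies the same weighted sum to three made-up centers held in plain dictionaries. The chi2 values and the choice `beta_re = beta_dist = 1.0` are assumptions for the example only; they are not defaults taken from the library.

```python
# Illustrative centers with invented chi2 values (not real data)
centers = [
    {"center": "A", "chi2dist": 120.0, "chi2re": 80.0},
    {"center": "B", "chi2dist": 40.0, "chi2re": 150.0},
    {"center": "C", "chi2dist": 10.0, "chi2re": 5.0},
]

# Weights for the regex-based and distance-based statistics (assumed values)
beta_re, beta_dist = 1.0, 1.0

for row in centers:
    row["chi2joint"] = beta_re * row["chi2re"] + beta_dist * row["chi2dist"]

# Higher chi2joint means stronger combined evidence of target enrichment
ranked = sorted(centers, key=lambda row: row["chi2joint"], reverse=True)
print([row["center"] for row in ranked])  # ['A', 'B', 'C']
```

Changing the two weights shifts how much the regex-based evidence counts relative to the distance-based evidence in the final ranking.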
Thank you for your kind replies. I now understand the meaning of the indices used for ranking metaclonotype centers.
Thank you so much!
Dear kmayerb,
I have some questions about the table of metaclonotype centers, which is generated as "centers_df" by the "bkgd_cntl_nn2" function.
In that table, there are columns named "TR", "BR_weighted", "RR_weighted", "OR_weighted" and "chi2dist". What do the numbers in these columns mean? I assume they are the counts of TCRs included in the metaclonotype in the enriched or background repertoire, and the odds ratio and chi2 value from a cross-tabulation. Are my interpretations correct?
Best regards, Hiroyasu