tianshilu / pMTnet

Deep Learning the T Cell Receptor Binding Specificity of Neoantigen
GNU General Public License v2.0

Build my own background TCR #20

Open zunyun-Gong opened 7 months ago

zunyun-Gong commented 7 months ago

Thank you for your outstanding work. I would like to use my own background TCRs in my project. I have read through your code and paper, but I do not know how to do it. Could you provide a reference pipeline for it? Thank you very much!

wtwt5237 commented 5 months ago

Hi @zunyun-Gong

We do not plan to develop a version that takes in customized background TCRs. However, the section of code that handles the background TCRs is here:

TCR_neg_df_1k=pd.read_csv(library_dir+'/bg_tcr_library/TCR_output_1k.csv', names=pd.RangeIndex(0, 30,1), header=None, skiprows=1) 
TCR_neg_df_10k=pd.read_csv(library_dir+'/bg_tcr_library/TCR_output_10k.csv', names=pd.RangeIndex(0, 30,1), header=None, skiprows=1)
# In the current state of the software the next two reads look redundant: they would reload objects that are already in memory under new variable names, wasting memory
# TCR_pos_df=pd.read_csv(output_dir+'/TCR_output.csv',index_col=0)  
# MHC_antigen_df=pd.read_csv(output_dir+'/MHC_antigen_output.csv',index_col=0)
################ make prediction ################# 
rank_output=[]
for each_data_index in range(TCR_encoded_matrix.shape[0]):
    tcr_pos=TCR_encoded_matrix.iloc[[each_data_index,]]
    pmhc=HLA_antigen_encoded_matrix.iloc[[each_data_index,]]
    # combine the positive TCR with the 1k negative (background) TCRs to form a 1001-row data frame for prediction
    TCR_input_df=pd.concat([tcr_pos,TCR_neg_df_1k],axis=0)
    MHC_antigen_input_df= pd.DataFrame(np.repeat(pmhc.values,1001,axis=0))
    prediction=ternary_prediction.predict({'pos_in':TCR_input_df,'hla_antigen_in':MHC_antigen_input_df})

    rank=1-(sorted(prediction.tolist()).index(prediction.tolist()[0])+1)/1000
    # if the rank falls within the top 2%, re-rank against the 10k background TCRs
    if rank<0.02:
        TCR_input_df=pd.concat([tcr_pos,TCR_neg_df_10k],axis=0)
        MHC_antigen_input_df= pd.DataFrame(np.repeat(pmhc.values,10001,axis=0))
        prediction=ternary_prediction.predict({'pos_in':TCR_input_df,'hla_antigen_in':MHC_antigen_input_df})

        rank=1-(sorted(prediction.tolist()).index(prediction.tolist()[0])+1)/10000
    rank_output.append(rank)

You might want to modify our code to take in your customized TCRs.
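
As a rough starting point, the ranking step above could be generalized so that the background size comes from the data instead of the hard-coded 1000 and 10000. This is only a sketch, not pMTnet's code: `percentile_rank`, `rank_with_custom_background`, and `predict_fn` are hypothetical names, and the custom background is assumed to be already encoded in the same way as the bundled library (the 30-column matrices loaded above).

```python
import numpy as np


def percentile_rank(pos_score, bg_scores):
    """Same formula as the snippet above, with N taken from the background size.

    sorted(pred).index(pred[0]) equals the number of scores strictly below the
    positive one, so rank = 1 - (n_below + 1) / N. Like the original formula,
    the result is slightly below 0 when the positive outscores every
    background TCR.
    """
    bg_scores = np.asarray(bg_scores, dtype=float).ravel()
    n_below = int(np.sum(bg_scores < pos_score))
    return 1 - (n_below + 1) / bg_scores.size


def rank_with_custom_background(tcr_pos, pmhc, bg_small, bg_large,
                                predict_fn, threshold=0.02):
    """Two-stage ranking: small background first, large one for top hits.

    tcr_pos:    array of shape (1, d), the encoded TCR of interest
    pmhc:       array of shape (1, m), the encoded pMHC paired with it
    bg_small:   array of shape (n1, d), encoded custom background TCRs
    bg_large:   array of shape (n2, d), larger encoded custom background
    predict_fn: hypothetical callable (tcr_matrix, pmhc_matrix) -> scores
    """
    def score_against(background):
        # Positive TCR goes first, followed by the background TCRs.
        tcr_input = np.vstack([tcr_pos, background])
        # Repeat the pMHC once per row instead of hard-coding 1001 / 10001.
        pmhc_input = np.repeat(pmhc, tcr_input.shape[0], axis=0)
        pred = np.asarray(predict_fn(tcr_input, pmhc_input), dtype=float).ravel()
        return percentile_rank(pred[0], pred[1:])

    rank = score_against(bg_small)
    if rank < threshold:
        # Refine promising pairs against the larger background.
        rank = score_against(bg_large)
    return rank
```

With the model from the snippet above, `predict_fn` could presumably be a thin wrapper such as `lambda tcr, pmhc: ternary_prediction.predict({'pos_in': tcr, 'hla_antigen_in': pmhc})`; whether the model accepts NumPy arrays or needs data frames there is not confirmed by this thread.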

Thanks!

Tao

zunyun-Gong commented 3 months ago

Thank you for your help; it is quite helpful for my own work.