tianshilu / pMTnet

Deep Learning the T Cell Receptor Binding Specificity of Neoantigen
GNU General Public License v2.0

Retrain Model pMTnet #13

Open KiAkize opened 2 years ago

KiAkize commented 2 years ago

Hi,

I found the relevant training code in test/code/ternary_train_model_pMTnet.py, but some parts are still missing. Could you help with the following questions?

1) How are these files generated? Could you share the relevant code?

tcr_file_train_pos='positive/TCR_output.csv'
tcr_file_train_neg='negative/TCR_output.csv'
hla_antigen_file_train='MHC_antigen_output.csv'

2) What is the exact shape of the negative data?

ternary_prediction.fit({'pos_in':tcr_train_pos,'neg_in':tcr_train_neg,'hla_antigen_in':hla_antigen_train}, {'output':Y_train},epochs=150,batch_size=256,shuffle=True)

This line suggests that the number of negative samples must equal the number of positive samples, rather than following the 10:1 ratio stated in the article.

3) Is the (pos_in, neg_in) pairing fixed? The code seems to indicate that each positive sample is tied to one fixed negative sample at training time. Does the network only learn to pick the truly bound TCR out of that one fixed alternative TCR?

I am new to this field and to Keras. Thank you for your efforts.

tianshilu commented 2 years ago

@KiAkize ,

Thanks for your interest in our study. Please see the following for your questions.

  1. You can find the positive and negative training data through the links below.
     https://drive.google.com/file/d/1_pf6xIK2dRql_zZ5A1BzoGvWIYVvEaBp/view?usp=sharing
     https://drive.google.com/file/d/1KLlH-CBS4ep6UAEeh4Zv9Eghk1hZUWj7/view?usp=sharing

  2. Before training, we generated 10 times as many negative pairs as positive pairs and replicated each positive pair 10 times, so the two sets are the same size when input to the network. The network was therefore designed to take one positive pair and one negative pair each time.

  3. Please see the response to your second question. The negative pairs are generated by random combinations, so they are not fixed to each other.
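The pairing described in points 2 and 3 can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the code used for the paper: the function name `build_training_rows`, the toy identifiers such as `TCR_A` and `pMHC_1`, and the choice to draw each negative TCR at random from the other positive pairs are all hypothetical. It shows why the three inputs end up with the same number of rows even though there are 10 negatives per positive.

```python
import random

def build_training_rows(positive_pairs, n_neg=10, seed=0):
    """Expand (tcr, pmhc) positives into (pos_tcr, neg_tcr, pmhc) rows.

    Assumption: a negative is a TCR drawn at random from the TCRs of the
    other positive pairs. The real pipeline may sample differently, and
    this sketch does not check whether a drawn TCR could also bind.
    """
    rng = random.Random(seed)
    all_tcrs = sorted({tcr for tcr, _ in positive_pairs})
    rows = []
    for tcr, pmhc in positive_pairs:
        candidates = [t for t in all_tcrs if t != tcr]
        for _ in range(n_neg):
            # The positive pair is repeated n_neg times; each copy gets
            # its own randomly drawn negative TCR, so pairs are not fixed.
            rows.append((tcr, rng.choice(candidates), pmhc))
    rng.shuffle(rows)
    return rows

# Toy placeholders, not real sequences.
positives = [("TCR_A", "pMHC_1"), ("TCR_B", "pMHC_2"), ("TCR_C", "pMHC_3")]

rows = build_training_rows(positives, n_neg=10)
pos_in = [r[0] for r in rows]
neg_in = [r[1] for r in rows]
hla_antigen_in = [r[2] for r in rows]

print(len(pos_in), len(neg_in), len(hla_antigen_in))  # 30 30 30
```

With 3 positive pairs and 10 negatives each, every input list has 30 rows, which matches the equal-length inputs that the `fit` call in the question expects once the lists are encoded as arrays.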

Hope this helps! Thanks again for your interest!

Tianshi

KiAkize commented 2 years ago

Thank you for your reply.

I am not sure how to use the training files on Google Drive to train a prediction network. Could you provide a complete pipeline for retraining the prediction network, given TCR_encoder_30.h5, HLA_antigen_encoder_60.h5 and the data on Google Drive?

YYYYYeFei commented 1 year ago

@tianshilu Thank you for a great tool!

I am still new to this field, so I do not understand many of the experimental procedures. I would like to ask how to validate the predicted binding between TCRs and pMHCs via the expected impact of the binding on the T cells, and how to obtain Fig. 2c. I saw a Fig. 2c file in the Source Data Extended Data Fig. 2, but those data cannot be used to plot Fig. 2c. If convenient, could we communicate by email? My email is 201921001971@smail.xtu.edu.cn.

Thank you so much for all your efforts!