madnessfish opened 2 years ago
Hi,
Thanks for your interest. You can find the positive and negative training data at the links below:
https://drive.google.com/file/d/1_pf6xIK2dRql_zZ5A1BzoGvWIYVvEaBp/view?usp=sharing
https://drive.google.com/file/d/1KLlH-CBS4ep6UAEeh4Zv9Eghk1hZUWj7/view?usp=sharing
Thanks!
Hi @tianshilu, I have a similar request. I'd like to run some comparisons against the method you proposed. Could you share the testing data used in pMTnet with both positive and negative labels? Thank you!
Hi @Miles-DDD,
Please find the labeled testing data at the links below:
https://drive.google.com/file/d/1iddT16YEbEh5LYULokEMoey53RPiVsXt/view?usp=sharing
https://github.com/tianshilu/pMTnet/blob/master/test/input/test_input.csv
Thanks!
Hi @tianshilu, thank you for providing all the information!
I am curious how the negative sets were generated (is there a script?), because the following command shows 1,912 entries that appear in both the positive and negative training sets. I'm not sure whether I made a mistake here.
comm -12 <(sort -u neg_training.csv ) <(sort -u pos_training.csv ) | wc -l
Also, I would like to know how these labeled training/test data relate to training_data.csv and testing_data.csv in the pMTnet/data directory of the repository.
Hi @madnessfish,
Thanks for your interest in our study! For each TCR-pMHC pair, we generate 10 negative pairs by randomly sampling 10 TCRs from the other pairs. As a result, a very small proportion of pairs overlap between the positive and negative sets by chance. We didn't remove these overlapping pairs from the negative dataset because they help reduce overfitting.
The negative datasets are generated from training_data.csv and testing_data.csv as described above. Hope this helps!
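For anyone trying to reproduce this, the sampling described above might look roughly like the sketch below. It is not the pMTnet code: the function names (load_pairs, sample_negatives, count_overlap), the CDR3/Antigen/HLA column names (taken to match test_input.csv), uniform sampling without replacement, and the toy sequences are all assumptions for illustration. It also shows where the chance overlap comes from: if one TCR is paired with more than one pMHC in the positive set, borrowing that TCR from another row can recreate a real positive pair.

```python
import csv
import io
import random


def load_pairs(handle):
    """Read (CDR3, Antigen, HLA) tuples from a CSV file object.

    Assumption: the file uses the same column names as test_input.csv.
    """
    return [(row["CDR3"], row["Antigen"], row["HLA"]) for row in csv.DictReader(handle)]


def sample_negatives(positives, k=10, seed=0):
    """Pair each positive pMHC with k TCRs taken from the other rows.

    Hypothetical sketch: rows are drawn uniformly without replacement,
    so k must be smaller than len(positives).
    """
    rng = random.Random(seed)
    n = len(positives)
    negatives = []
    for i, (_, antigen, hla) in enumerate(positives):
        # Draw k of the n - 1 other rows, shifting indices to skip row i.
        for idx in rng.sample(range(n - 1), k):
            j = idx if idx < i else idx + 1
            negatives.append((positives[j][0], antigen, hla))
    return negatives


def count_overlap(positives, negatives):
    """Count sampled negatives that are identical to some positive pair."""
    positive_set = set(positives)
    return sum(1 for pair in negatives if pair in positive_set)


# Made-up toy data: CASSIRSSYEQYF is paired with two antigens.
toy_csv = io.StringIO(
    "CDR3,Antigen,HLA\n"
    "CASSIRSSYEQYF,GILGFVFTL,A*02:01\n"
    "CASSLGQAYEQYF,NLVPMVATV,A*02:01\n"
    "CASSIRSSYEQYF,NLVPMVATV,A*02:01\n"
    "CASSPDRGNTIYF,GLCTLVAML,A*02:01\n"
)
positives = load_pairs(toy_csv)
negatives = sample_negatives(positives, k=3)
print(len(negatives), count_overlap(positives, negatives))  # prints "12 5"
```

With k equal to the number of other rows, every row is used once per positive, so the result does not depend on the seed: 5 of the 12 sampled negatives coincide with positives. On a large dataset with k=10, the same mechanism would yield only a small fraction of overlaps, which is consistent with what the comm check above picks up.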
Tianshi
Thank you for a great tool! I am still fairly new to this field.
I would like to learn more about how pMTnet was trained, and I am not sure whether I missed the training data in the repository. Could you please share the training data used in pMTnet with positive and negative labels (e.g. positive/TCR_output.csv, negative/TCR_output.csv, training_positive.csv)? Thank you so much for all your efforts!