Filtering of data and use of models

Hello, I have two questions. First, are the model's training and test sets from VDJdb, and what are the filtering conditions? Second, if I want to use your model directly to predict which epitopes a TCR will bind to, where do I start? Neither of the two examples you gave seems to save the trained model. What is the criterion for determining which epitope a TCR will bind to: the output probability for the TCR-epitope pair, or something else? @wonanut

Thank you for your interest in this project.

We ran experiments on two datasets. One of them comes from VDJdb; we kept only the antigen categories (epitopes) that have enough TCR sequences. The retained epitopes are listed in our paper, "SETE: Sequence-based Ensemble learning approach for TCR Epitope binding prediction".
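To make the filtering step concrete, here is a minimal sketch of keeping only epitopes with enough TCR sequences. It is not the code used for the paper: the function name `filter_by_epitope_count`, the `min_count` threshold, and the toy records are all assumptions made for illustration, and the actual cutoff should be taken from the paper.

```python
from collections import Counter


def filter_by_epitope_count(records, min_count):
    """Keep (cdr3, epitope) pairs whose epitope has at least min_count TCRs.

    `min_count` is a hypothetical parameter; the threshold used for SETE
    is not stated in this thread.
    """
    counts = Counter(epitope for _, epitope in records)
    return [(cdr3, ep) for cdr3, ep in records if counts[ep] >= min_count]


# Toy records with placeholder sequences and labels (not real VDJdb entries).
records = [
    ("SEQ1", "EPITOPE_A"),
    ("SEQ2", "EPITOPE_A"),
    ("SEQ3", "EPITOPE_A"),
    ("SEQ4", "EPITOPE_B"),
]

kept = filter_by_epitope_count(records, min_count=2)
```

With these toy records, `EPITOPE_B` has a single sequence and is dropped, while the three `EPITOPE_A` pairs are kept.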

As for the second question: because the datasets are small while the feature dimension is large, we only performed cross-validation on these two datasets and reported the average across folds as the final result. If you want a predictive model that you can use directly, you need to split your own dataset into a training set, a test set, and (optionally) a validation set, and then train the model yourself.
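As a rough illustration of the split described above, the sketch below divides labelled TCR records into training, validation, and test sets while keeping each epitope's proportion roughly the same in every set. The function `stratified_split`, its parameters, and the placeholder records are assumptions for this example, not part of the SETE repository.

```python
import random
from collections import defaultdict


def stratified_split(records, test_frac=0.2, val_frac=0.0, seed=0):
    """Split (cdr3, epitope) pairs into train/val/test, per epitope."""
    rng = random.Random(seed)

    # Group records by epitope so each class is split separately.
    by_epitope = defaultdict(list)
    for cdr3, epitope in records:
        by_epitope[epitope].append((cdr3, epitope))

    train, val, test = [], [], []
    for epitope in sorted(by_epitope):
        group = by_epitope[epitope]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        n_val = round(len(group) * val_frac)
        test.extend(group[:n_test])
        val.extend(group[n_test:n_test + n_val])
        train.extend(group[n_test + n_val:])
    return train, val, test


# Placeholder records: 10 sequences for each of two hypothetical epitopes.
records = [(f"SEQ{i}", "EPITOPE_A") for i in range(10)]
records += [(f"SEQ{i}", "EPITOPE_B") for i in range(10, 20)]

train, val, test = stratified_split(records, test_frac=0.2, val_frac=0.1)
```

A classifier would then be fitted on `train`, tuned on `val` if one is used, and evaluated once on `test`. Splitting per epitope matters here because some epitopes have far fewer sequences than others, and a purely random split could leave a rare epitope out of the test set.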

I hope this answers your questions.

wonanut / SETE
