sidhomj / DeepTCR

Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data
https://sidhomj.github.io/DeepTCR/
MIT License

Understanding Training Strategy of Supervised TCR repertoire classification on HIV dataset #80

Open: opened by Albert-Shuai 1 year ago


Hi, sorry to disturb you.

I am trying to understand the training strategy used for the HIV dataset and to replicate the results reported in your publication.

It seems that the dataset can be divided into non-cognate groups (the CEF, AY9, and No Peptide conditions) and cognate groups (where an epitope is present). There are 3×3 non-cognate samples and 25×3 cognate samples, i.e. three replicates per condition. I saw in the paper that DeepTCR can distinguish non-cognate samples from cognate samples, and that training kept two of the three replicates as training data.
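To make these counts concrete, here is a minimal sketch in plain Python (no DeepTCR calls) of the sample layout and the two-of-three replicate split as I understand it. The placeholder names `epitope_03` to `epitope_25`, the replicate numbering 1 to 3, and the choice of replicate 3 as the held-out one are hypothetical; only CEF, AY9, No Peptide, MSPRTLNAW and NTQGYFPDW come from the dataset.

```python
# Hypothetical layout: 3 non-cognate and 25 cognate conditions, 3 replicates each.
NON_COGNATE = ["CEF", "AY9", "No Peptide"]
# Only MSPRTLNAW and NTQGYFPDW are real names here; the other 23 are placeholders.
COGNATE = ["MSPRTLNAW", "NTQGYFPDW"] + [f"epitope_{i:02d}" for i in range(3, 26)]
REPLICATES = [1, 2, 3]


def split_replicates(conditions, held_out=3):
    """Keep two of three replicates per condition for training, hold out one for testing.

    Which replicate is held out is an assumption for illustration.
    """
    train = [(c, r) for c in conditions for r in REPLICATES if r != held_out]
    test = [(c, r) for c in conditions for r in REPLICATES if r == held_out]
    return train, test


train, test = split_replicates(NON_COGNATE + COGNATE)
print(len(NON_COGNATE) * len(REPLICATES), len(COGNATE) * len(REPLICATES))  # 9 75
print(len(train), len(test))  # 56 28
```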

My question is, when doing the training, did you

  1. fit the model using all (3+25)×2 samples at once, where 3×2 are non-cognate and 25×2 are cognate, and then test the model on the remaining 3+25 samples to see whether it correctly predicts whether each sample is cognate or non-cognate?
  2. Or did you use (3+1)×2 samples, where the 3×2 samples are non-cognate and the 1×2 samples come from one specific epitope, instead of using all 25×2 samples as the cognate group, and then test the model on the remaining 3+1 samples to see whether it correctly predicts which (single) sample is cognate? In that case, did you repeat option 2 for each specific epitope (MSPRTLNAW, NTQGYFPDW, etc.)?
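The two options differ only in which conditions go into each model. Here is a minimal sketch, again in plain Python with hypothetical placeholder epitope names and assuming replicate 3 is the held-out one, that builds the labelled train/test sets for each option (label 1 = cognate, 0 = non-cognate). It illustrates my question only; it is not a claim about how DeepTCR was actually run.

```python
NON_COGNATE = ["CEF", "AY9", "No Peptide"]
# Placeholders except MSPRTLNAW and NTQGYFPDW.
COGNATE = ["MSPRTLNAW", "NTQGYFPDW"] + [f"epitope_{i:02d}" for i in range(3, 26)]
TRAIN_REPS, TEST_REPS = (1, 2), (3,)  # assumption: replicate 3 is held out


def labelled(conditions, reps):
    """Return (condition, replicate, label) tuples; label 1 = cognate, 0 = non-cognate."""
    return [(c, r, int(c not in NON_COGNATE)) for c in conditions for r in reps]


def option_1():
    """One pooled model: all 3 non-cognate and all 25 cognate conditions at once."""
    conds = NON_COGNATE + COGNATE
    return labelled(conds, TRAIN_REPS), labelled(conds, TEST_REPS)


def option_2(epitope):
    """One model per epitope: the 3 non-cognate conditions plus a single cognate one."""
    conds = NON_COGNATE + [epitope]
    return labelled(conds, TRAIN_REPS), labelled(conds, TEST_REPS)


train1, test1 = option_1()
print(len(train1), len(test1))  # 56 28

per_epitope = {e: option_2(e) for e in COGNATE}
train2, test2 = per_epitope["MSPRTLNAW"]
print(len(per_epitope), len(train2), len(test2))  # 25 8 4
```

Under these assumptions, option 1 gives a single model trained on 56 samples (6 non-cognate, 50 cognate) and tested on 28, whereas option 2 gives 25 separate models, each trained on 8 samples (6 non-cognate, 2 cognate) and tested on 4.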

Thanks and looking forward to your reply!