sidhomj / DeepTCR_Cancer

Deep learning reveals predictive sequence concepts within immune repertoires to immunotherapy

question about repertoire classifier #2

Closed Elisa89m closed 1 year ago

Elisa89m commented 1 year ago

Hi, thanks for developing this tool. I'm trying to understand how you trained the repertoire classifier on the CheckMate-038 dataset.

In particular, I'm wondering: 1) How many sequences does DeepTCR need in order to apply the Monte_Carlo_CrossVal function? Is there a reasonable minimum number of sequences and samples for training?

2) Does the input data used in the tutorial (Data/bulk_tcr/pre) include both responder (crpr) samples and non-responder (sdpd) samples? If so, how can the user provide the clinical response of each sample to the DTCR object? In other words, is the model that you trained applied only to "crpr" samples or also to "sdpd" samples?

3) Would it be possible to get just the final pre-trained DeepTCR model for CheckMate-038 in order to test it on other datasets? If required, I can formally request this by email.

Thank you in advance for your time and availability.

Elisa

sidhomj commented 1 year ago

In response to your questions:

1) There really is no "minimum number of sequences." The CheckMate-038 dataset had hundreds-thousands of sequences per patient. In general, when fitting the repertoire classifier with Monte-Carlo cross-validation, I would recommend having between 2 and 6 samples in the left-out group to ensure an accurate estimate of the model's generalizability.

2) There are multiple ways to upload sample level data with labels. Please see the DeepTCR tutorials for how to do so.

3) Will work to see if we can make the model available.
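
To make point 1 concrete, here is a minimal sketch of what Monte-Carlo cross-validation with a small left-out group looks like at the sample level. It is plain Python and does not call DeepTCR; it is not the Monte_Carlo_CrossVal implementation. The function name `monte_carlo_splits`, its parameters, and the patient names are all hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def monte_carlo_splits(sample_labels, n_test=4, n_folds=100, seed=0):
    """Yield (train, test) lists of sample names.

    Each fold draws a fresh random test set of n_test samples; the
    remaining samples form the training set. Folds are independent, so
    a given sample can be held out in many different folds.
    """
    samples = sorted(sample_labels)
    if not 0 < n_test < len(samples):
        raise ValueError("n_test must be between 1 and len(samples) - 1")
    rng = random.Random(seed)
    for _ in range(n_folds):
        test = set(rng.sample(samples, n_test))
        train = [s for s in samples if s not in test]
        yield train, sorted(test)


# Hypothetical cohort: names and label assignment are made up.
sample_labels = {
    f"patient_{i:02d}": ("crpr" if i < 5 else "sdpd") for i in range(12)
}

# 50 folds, 4 samples left out per fold (within the suggested 2-6 range).
splits = list(monte_carlo_splits(sample_labels, n_test=4, n_folds=50, seed=1))

# How often each sample ended up in a left-out group across all folds.
held_out_counts = Counter(s for _, test in splits for s in test)
```

With a split like this, a model would be fit on `train` and scored on `test` in every fold, and each sample's test predictions could then be averaged over the folds in which it was held out. Note that this simple version does not stratify by label, so a left-out group may occasionally contain only one class.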

JW