I recently came across your “DeepTCR” paper, and I found the idea of combining supervised and unsupervised learning to model TCR repertoires intriguing. The performance reported in the paper is also very impressive! I have some questions regarding Figure 2c (“Representative TCRs and learned TCR motifs”) that I hope you can help with.
The learned TCR motifs in DeepTCR have a length of 4 or 5. Could you provide some justification for that choice?
From the tutorial code, I noticed that around 30 different motifs are learned for each representative TCR; however, the two motifs selected in Figure 2c do not appear to be the top two learned motifs. What principles did you follow in selecting those motifs?
We chose 5 somewhat arbitrarily. In general, a kernel that is 5 amino acids long can learn motifs that are 1, 2, 3, 4, or 5 amino acids in length. We hypothesized that, given the current literature on CDR3 binding residues, a kernel of this length would be sufficient to capture the relevant binding motifs.
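To make the point concrete, here is a minimal sketch (not DeepTCR's actual code) of a width-5 convolutional filter sliding over a one-hot encoded CDR3 sequence. A kernel this wide can still detect a shorter motif, because the weights for the unused positions can simply be driven toward zero; the example kernel, sequence, and function names below are all hypothetical, chosen only to illustrate the idea.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(cdr3):
    """Encode a CDR3 sequence as a (length, 20) one-hot matrix."""
    x = np.zeros((len(cdr3), len(AMINO_ACIDS)))
    for pos, aa in enumerate(cdr3):
        x[pos, AA_INDEX[aa]] = 1.0
    return x

def motif_score(cdr3, kernel):
    """Slide a (5, 20) kernel along the sequence and max-pool,
    mimicking how a conv filter reports its best-matching window."""
    x = one_hot(cdr3)
    k = kernel.shape[0]
    scores = [np.sum(x[i:i + k] * kernel) for i in range(len(cdr3) - k + 1)]
    return max(scores)

# A hypothetical kernel whose nonzero weights span only 3 of its 5
# positions (matching the motif "GAG"): the first and last rows are
# zero, so a 5-wide filter effectively learns a 3-residue motif.
kernel = np.zeros((5, len(AMINO_ACIDS)))
for pos, aa in zip((1, 2, 3), "GAG"):
    kernel[pos, AA_INDEX[aa]] = 1.0

print(motif_score("CASSGAGYEQYF", kernel))  # → 3.0 (window "SGAGY" matches fully)
```

The max-pooled score peaks wherever the 3-residue motif falls inside any 5-residue window, which is why a single kernel length of 5 suffices to represent motifs of length 1 through 5.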
We chose to visualize the motifs that were the most understandable/interpretable to us as humans.