sidhomj / DeepTCR

Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data
https://sidhomj.github.io/DeepTCR/
MIT License

Inquiry about Fig2.c Motifs Visualization in DeepTCR #50

Closed: pzhang84 closed this issue 2 years ago

pzhang84 commented 2 years ago

Hello Dr. Sidhom,

I recently came across your “DeepTCR” paper and found the idea of combining supervised and unsupervised learning to model TCR repertoires intriguing. The performance reported in the paper is also very impressive! I have some questions regarding Figure 2.c (titled "representative TCRs and learned TCR motifs") that I hope you can help with.

  1. The learned TCR motifs in DeepTCR have a length of 5 or 4. Could you provide some justification for that choice?
  2. From the tutorial code, I noticed that around 30 different motifs are learned for each representative TCR. However, the two motifs selected in Fig2.c do not appear to be the top two learned motifs. What principles did you follow when selecting those motifs?

Thank you in advance for your help!

sidhomj commented 2 years ago
  1. We chose 5 somewhat arbitrarily. In general, a kernel that is 5 amino acids long can learn motifs that are anywhere from 1 to 5 amino acids in length. We hypothesized that, given the current literature on CDR3 binding residues, a kernel of this length would be sufficient to capture relevant binding motifs.
  2. We chose to visualize the ones that were the most understandable/interpretable to us as humans.
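
To illustrate point 1, here is a minimal sketch of how a kernel spanning 5 positions can encode a shorter motif: the positions outside the motif simply carry zero weight. This is not DeepTCR's implementation. The weights are set by hand rather than learned, the scoring is a plain sliding dot product over a one-hot encoding, and the helper names (`make_kernel`, `conv1d_scores`, `best_window`) and the example CDR3 sequence are assumptions made up for illustration.

```python
# Hypothetical illustration only: hand-set weights, not a trained DeepTCR kernel.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as a list of 20-dimensional one-hot rows."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def make_kernel(motif, width=5, offset=0):
    """Build a width-5 kernel whose only nonzero weights spell `motif`.

    Positions not covered by the motif stay at zero, so the kernel
    effectively detects a motif shorter than its own width.
    """
    assert offset + len(motif) <= width
    kernel = [[0.0] * len(AMINO_ACIDS) for _ in range(width)]
    for i, aa in enumerate(motif):
        kernel[offset + i][AMINO_ACIDS.index(aa)] = 1.0
    return kernel


def conv1d_scores(seq, kernel):
    """Slide the kernel along the sequence and return one score per window."""
    x = one_hot(seq)
    width = len(kernel)
    scores = []
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        scores.append(score)
    return scores


def best_window(seq, kernel):
    """Return the highest-scoring window and its score (max over positions)."""
    scores = conv1d_scores(seq, kernel)
    start = max(range(len(scores)), key=scores.__getitem__)
    return seq[start:start + len(kernel)], scores[start]


# A 3-residue motif placed in the middle of a 5-wide kernel.
kernel = make_kernel("GQA", width=5, offset=1)
window, score = best_window("CASSLGQAYEQYF", kernel)
print(window, score)
```

In this toy case the highest-scoring 5-residue window is the one containing `GQA`, and the two flanking positions do not affect the score. A learned kernel could behave similarly if training drives the weights at uninformative positions toward zero, which is one way to read motifs of length 4 alongside motifs of length 5.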