openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

MHC Class II support? #212

Closed PhilPalmer closed 1 year ago

PhilPalmer commented 1 year ago

Hi,

Thanks for developing a great tool!

I was wondering:

  1. What is the motivation for focusing only on predicting binding affinity (and presentation) to MHC Class I and not Class II?
    • Is it just that predicting binding to MHC Class I is easier and it would take time to add Class II support?
  2. What would adding support for Class II involve?
    • Presumably, models would need to be trained using MHC Class II data. Would it be possible to use the same dataset as NetMHCpan?
    • Would the maximum input peptide length need to be increased, e.g. to 17-mers?
    • Do you think any other changes to the existing model architecture(s) would be required?
  3. Do you have any plans to add support for Class II?
  4. Also, would you say that the presentation score is still experimental? Would you recommend using the binding affinity scores only?

Many thanks in advance, Phil

timodonnell commented 1 year ago

Hi Phil - We don't have plans to add class II support anytime soon, unfortunately. We focused on class I simply because that was our original use case (we were mostly motivated by work on a cancer vaccine focused on generating CD8+ T cell responses). The main difference for class II is that the register of the peptide in the MHC II binding groove needs to be inferred at prediction time, which means you would need a different architecture or another way of scanning over possible binding cores (for example, something like a convolutional layer with max pooling). I would recommend using NetMHCIIpan for class II. But yes, if you made the needed architectural changes, increased the peptide length, and substituted in class II training data, it should be possible to repurpose mhcflurry as a class II predictor.
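
To make the register-scanning idea concrete, here is a minimal sketch in plain Python. It is not mhcflurry code and not how any existing predictor is implemented. It assumes a 9-residue binding core, and `toy_core_score` is a hypothetical stand-in for a trained per-core model. Every window of the peptide is scored, and the maximum over windows (the "max pooling" step) gives both the peptide-level score and the inferred register.

```python
from typing import Callable, List, Tuple

CORE_LENGTH = 9  # assumption: class II binding cores are treated as 9-mers


def candidate_cores(peptide: str, core_length: int = CORE_LENGTH) -> List[Tuple[int, str]]:
    """Return (offset, core) for every window of core_length in the peptide."""
    return [
        (i, peptide[i:i + core_length])
        for i in range(len(peptide) - core_length + 1)
    ]


def toy_core_score(core: str) -> float:
    """Hypothetical scorer standing in for a trained model.

    Returns the fraction of assumed anchor positions (P1, P4, P6, P9)
    occupied by a hydrophobic residue. Purely illustrative.
    """
    hydrophobic = set("AFILMVWY")
    anchors = (0, 3, 5, 8)
    return sum(core[i] in hydrophobic for i in anchors) / len(anchors)


def predict_with_register(
    peptide: str,
    score_core: Callable[[str], float] = toy_core_score,
    core_length: int = CORE_LENGTH,
) -> Tuple[float, int, str]:
    """Score every candidate core and max-pool over registers.

    Returns (best_score, offset_of_best_core, best_core).
    """
    cores = candidate_cores(peptide, core_length)
    if not cores:
        raise ValueError("peptide is shorter than the core length")
    scored = [(score_core(core), offset, core) for offset, core in cores]
    # Max pooling: the peptide-level score is the best per-core score.
    return max(scored, key=lambda item: item[0])


if __name__ == "__main__":
    # Arbitrary 13-mer used only to exercise the functions.
    score, offset, core = predict_with_register("PKYVKQNTLKLAT")
    print(f"best core {core} at offset {offset}, score {score:.2f}")
```

In a neural network the same effect could come from a 1D convolution whose kernel spans the core length, followed by global max pooling over positions, so the register is learned implicitly and does not have to be enumerated by hand.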

Regarding (4), I think the conservative choice is still to use binding affinity (BA) scores rather than presentation scores (PS) to predict T cell epitopes. In a small benchmark I ran a few years ago (described in Chapter 4 of my dissertation, if you'd like the details), PS looked a bit better than BA for predicting cancer neoantigens but not for viral epitopes. So I'd say the question of which is better isn't settled.