details on training high performance models?

overshiki commented 4 years ago

Hi, this is really nice work, and the pretrained model is good to use! However, when I try to train an mhcflurry model from scratch on the IEDB dataset, I can hardly get satisfactory performance. I tried both the default model hyperparameters and the hyperparameters from the test examples, but neither resulting model predicts as well as the pretrained model on the MS dataset. Could you give me any hints on training tricks or preferred hyperparameters? A training script would also be of great help.

Thanks a lot
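
To make "does not predict as well" concrete, one common choice is ROC AUC over MS hits versus decoys. The snippet below is only a minimal sketch of that comparison: the labels and scores are made-up toy values, not real benchmark results, and `roc_auc` is a hypothetical helper written here, not part of mhcflurry.

```python
def roc_auc(labels, scores):
    """Probability that a random hit outscores a random decoy (ties count 0.5).

    labels: 1 for an MS hit, 0 for a decoy.
    scores: higher must mean "more likely a hit".
    """
    hits = [s for l, s in zip(labels, scores) if l == 1]
    decoys = [s for l, s in zip(labels, scores) if l == 0]
    if not hits or not decoys:
        raise ValueError("need at least one hit and one decoy")
    wins = 0.0
    for h in hits:
        for d in decoys:
            if h > d:
                wins += 1.0
            elif h == d:
                wins += 0.5
    return wins / (len(hits) * len(decoys))


# Toy data for illustration only (assumed values).
labels = [1, 1, 0, 0, 0]
scores_my_model = [0.9, 0.4, 0.5, 0.2, 0.1]
scores_pretrained = [0.9, 0.8, 0.5, 0.2, 0.1]

auc_my_model = roc_auc(labels, scores_my_model)
auc_pretrained = roc_auc(labels, scores_pretrained)
print(f"my model: {auc_my_model:.3f}, pretrained: {auc_pretrained:.3f}")
```

If the scores are predicted affinities in nM, where lower means stronger binding, negate them before passing them in so that higher still means "more likely a hit".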

timodonnell commented 4 years ago

Training production-quality models can be pretty resource intensive (we use a cluster of several dozen nodes each with multiple GPUs), but the scripts used to do this are all available here:

https://github.com/openvax/mhcflurry/tree/master/downloads-generation

For example, for the newer pan-allele models, you could train some yourself using the GENERATE.sh script here:

https://github.com/openvax/mhcflurry/tree/master/downloads-generation/models_class1_pan_unselected

And then do model selection using the GENERATE.sh script here:

https://github.com/openvax/mhcflurry/blob/master/downloads-generation/models_class1_pan

In practice you'll probably need to modify a few things to work with your cluster environment, or train a much smaller number of models using a single node.
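
The two-step order described above (train unselected models, then run model selection) can be sketched as a small Python driver. This is an illustration under assumptions, not part of the repository: `plan_steps` and `run_steps` are hypothetical helpers, the repository path is a placeholder, and the sketch passes no arguments to `GENERATE.sh`. Read each script first to see what arguments and environment it expects.

```python
import subprocess
from pathlib import Path

# Directory names taken from the links above; training runs before selection.
STEPS = ["models_class1_pan_unselected", "models_class1_pan"]


def plan_steps(repo_root):
    """Return (working_dir, command) pairs, one per GENERATE.sh, in order."""
    base = Path(repo_root) / "downloads-generation"
    plan = []
    for name in STEPS:
        script = base / name / "GENERATE.sh"
        if not script.is_file():
            raise FileNotFoundError(f"expected script not found: {script}")
        # Assumption: the script is run from its own directory with no arguments.
        plan.append((script.parent, ["bash", script.name]))
    return plan


def run_steps(repo_root, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, cmd in plan_steps(repo_root):
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the sequence if a step fails.
            subprocess.run(cmd, cwd=cwd, check=True)


# Example (placeholder path to a local checkout):
# run_steps("/path/to/mhcflurry", dry_run=True)
```

Keeping `dry_run=True` as the default lets you check the plan on a single node before committing to a long training run.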

overshiki commented 4 years ago

Thanks a lot for the reply. I'll give it a try.

details on training high performance models? #154