Closed. overshiki closed this issue 4 years ago.
Training production-quality models can be quite resource-intensive (we use a cluster of several dozen nodes, each with multiple GPUs), but the scripts used to do this are all available here:
https://github.com/openvax/mhcflurry/tree/master/downloads-generation
For example, for the newer pan-allele models, you could train some yourself using the GENERATE.sh script here:
https://github.com/openvax/mhcflurry/tree/master/downloads-generation/models_class1_pan_unselected
And then do model selection using the GENERATE.sh script here:
https://github.com/openvax/mhcflurry/blob/master/downloads-generation/models_class1_pan
In practice, you'll probably need to modify a few things to work with your cluster environment, or train a much smaller number of models on a single node.
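The two-stage workflow described above (train a large pool of candidate models, then do model selection against held-out data) can be sketched generically. The toy below is purely illustrative and is not mhcflurry code: a k-nearest-neighbors regressor stands in for the neural networks, k stands in for the hyperparameters being swept, and validation MSE stands in for mhcflurry's actual selection criteria. All names and data here are made up for the sketch.

```python
import random

random.seed(0)

def make_split(n):
    """Toy regression data: y = x^3 plus a little Gaussian noise."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, x ** 3 + random.gauss(0, 0.05)) for x in xs]

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model on a dataset."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Stage 1: "train" one candidate model per hyperparameter setting.
train = make_split(200)
valid = make_split(50)
candidates = [1, 2, 4, 8, 16, 32]
scores = {k: mse(train, valid, k) for k in candidates}

# Stage 2: model selection -- keep the candidate that validates best.
best_k = min(scores, key=scores.get)
print("selected k =", best_k, "validation MSE = %.4f" % scores[best_k])
```

The same shape applies at full scale: the `models_class1_pan_unselected` step plays the role of the candidate-training loop, and the `models_class1_pan` step plays the role of the selection loop, just with far more models and a richer selection criterion.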
Thanks a lot for the reply, I'll give it a try.
Hi, it's really nice work, and the pretrained model is good to use! However, when I try to train an mhcflurry model from scratch on the IEDB dataset, I can hardly get satisfactory performance. I tried the default model hyperparameters and the hyperparameters in the test examples, but neither of the resulting models predicts as well as the pretrained model on the MS (mass spectrometry) dataset. Could you give me any hints on training tricks or preferred hyperparameters? A training script would also be of great help.
Thanks a lot.