mhcflurry 1.2

Major changes

Support for model selection. The standard models are selected using the 10% of affinity data held out from training, together with mass-spec data. New command: mhcflurry-class1-select-allele-specific-models
Support for multiple training passes. For alleles with little data (by default, fewer than 1000 measurements), the network is pre-trained on data from alleles with similar sequences before it is trained on the allele's own data.
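
As a rough illustration of the multiple-pass idea above, the sketch below picks out low-data alleles and ranks the other alleles by sequence similarity. It is not mhcflurry's implementation: the function names, the use of Hamming distance as the similarity measure, the `num_similar` parameter, and the toy alleles and sequences are all assumptions made for the example. Only the default threshold of 1000 measurements comes from the text.

```python
# Hypothetical sketch: none of these names are part of the mhcflurry API.

def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pretraining_plan(measurement_counts, allele_sequences,
                     min_measurements=1000, num_similar=2):
    """Map each low-data allele to the alleles it would be pre-trained on.

    Alleles with at least `min_measurements` measurements are skipped,
    since they would be trained on their own data only.
    """
    plan = {}
    for allele, count in measurement_counts.items():
        if count >= min_measurements:
            continue
        target = allele_sequences[allele]
        others = [a for a in allele_sequences if a != allele]
        # Closest sequences first; ties are broken by name for determinism.
        others.sort(
            key=lambda a: (hamming_distance(target, allele_sequences[a]), a))
        plan[allele] = others[:num_similar]
    return plan


# Toy data (made up for illustration only).
counts = {"allele_A": 5000, "allele_B": 3000, "allele_C": 200}
sequences = {
    "allele_A": "YFAMYGEK",
    "allele_B": "YFAMYQEK",
    "allele_C": "YFAMYQEN",
}
plan = pretraining_plan(counts, sequences, num_similar=1)
print(plan)
```

In this toy case only allele_C falls below the threshold, and allele_B is its nearest neighbour, so a first training pass would use allele_B data before the second pass on allele_C data.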

Minor changes

The default ensemble centrality measure is changed back to “mean”; “robust mean” remains available as an option.
Mass-spec data from IEDB is now included (in addition to systemhcatlas and Abelin Immunity).
Arbitrary dataframes of metadata may be attached to Class1AffinityPredictor instances using the “metadata_dataframes” instance variable. Exact training data used is now included in the models directory for the standard predictors.
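
To make the difference between the two centrality measures concrete, here is a minimal sketch. It assumes that “robust mean” means the average of the values lying between the 25th and 75th percentiles, falling back to the plain mean for very small ensembles; the definition mhcflurry actually uses may differ, and the function names and numbers below are invented for the example.

```python
import math

# Hypothetical sketch: the definition of "robust mean" here is an assumption.

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def robust_mean(values):
    """Mean of the values between the 25th and 75th percentiles."""
    if len(values) <= 3:
        return mean(values)  # too few values to trim meaningfully
    ordered = sorted(values)
    low = percentile(ordered, 25)
    high = percentile(ordered, 75)
    return mean([v for v in ordered if low <= v <= high])


# Toy ensemble output: five models agree, one is an outlier.
ensemble_predictions = [100.0, 110.0, 90.0, 105.0, 95.0, 5000.0]
print(mean(ensemble_predictions))
print(robust_mean(ensemble_predictions))
```

With one outlying model, the plain mean is pulled far from the consensus while the trimmed version stays near it. Under the default “mean” setting, every model in the ensemble contributes to the result.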

New downloads

models_class1_unselected. Full ensembles before model selection.
models_class1_minimal. Predictors with ensemble size 1, for rapid testing.
models_class1_no_mass_spec. Model selection without mass-spec data, which enables accuracy evaluation using mass-spec data.

Refactoring

Parallelism-related code is now in the library (parallelism.py) instead of in the train command
Percentile rank calibration is now in its own command (mhcflurry-calibrate-percentile-ranks) instead of being bundled with the training script
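
For readers unfamiliar with percentile rank calibration, the sketch below shows the general idea only. It assumes a percentile rank is the percentage of background predictions (for example, on random peptides) that are at least as strong as a given prediction, with lower nM meaning stronger binding. The function names and the toy background values are hypothetical and do not reflect what mhcflurry-calibrate-percentile-ranks does internally.

```python
import bisect

# Hypothetical sketch: not the logic of mhcflurry-calibrate-percentile-ranks.

def calibrate(background_predictions_nm):
    """Sort background predictions once so that later lookups are fast."""
    return sorted(background_predictions_nm)


def percentile_rank(calibrated, affinity_nm):
    """Percent of background predictions that are <= affinity_nm.

    Lower nM means stronger predicted binding, so a low percentile rank
    marks a prediction that is strong relative to the background.
    """
    stronger_or_equal = bisect.bisect_right(calibrated, affinity_nm)
    return 100.0 * stronger_or_equal / len(calibrated)


# Toy background distribution (made up for illustration only).
background = [50, 200, 1000, 5000, 20000, 30000, 35000, 40000, 45000, 50000]
calibrated = calibrate(background)
print(percentile_rank(calibrated, 200))
print(percentile_rank(calibrated, 10))
```

Because the calibration step only depends on a trained predictor and a background set, it can run separately from training, which fits its move into a standalone command.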

Optimizations

Explicit, configurable handling of multiple GPUs for training and model selection. Workers are either assigned GPUs or left to run on a CPU, enabling simultaneous use of all available GPUs and CPUs.
Cache repeated calls to Class1NeuralNetwork.predict that use the same EncodableSequences object to specify peptides. Also a variety of optimizations to Class1AffinityPredictor.predict_to_dataframe. These were required for model selection to have an acceptable runtime.
Smarter caching of compiled neural networks. Reuse compiled networks even when they differ in certain aspects (e.g. number of training epochs) that we know do not affect the prediction code.
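
The GPU and CPU worker assignment described in the first item above can be pictured with a small sketch. The function name, the `max_workers_per_gpu` parameter, and the device labels are assumptions for illustration and are not taken from the mhcflurry code or its command-line options.

```python
# Hypothetical sketch: names and labels are not part of mhcflurry.

def assign_workers(num_workers, num_gpus, max_workers_per_gpu=1):
    """Return one device label per worker: 'gpu:<index>' or 'cpu'.

    GPU slots are filled round-robin first; any workers left over once
    every GPU slot is taken are assigned to the CPU.
    """
    gpu_slots = num_gpus * max_workers_per_gpu
    assignments = []
    for worker in range(num_workers):
        if worker < gpu_slots:
            assignments.append("gpu:%d" % (worker % num_gpus))
        else:
            assignments.append("cpu")
    return assignments


print(assign_workers(num_workers=6, num_gpus=2, max_workers_per_gpu=2))
```

With six workers, two GPUs, and two workers allowed per GPU, four workers share the GPUs and the remaining two run on the CPU, so no device sits idle.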

Removed

Cross validation (no longer required since held-out training data enables a direct estimate of generalization error using the pre-model selection full ensemble)
models_class1_experiments1

Closes #95. Closes #56. Closes #34.
