Major changes
Support for model selection. The standard models are now selected using a held-out 10% of the affinity data plus mass-spec data. New command: mhcflurry-class1-select-allele-specific-models
Support for multiple training passes. For alleles with little data (by default, fewer than 1000 measurements), the network is first pre-trained on data from alleles with similar sequences and then trained on the allele's own data.
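The multi-pass idea above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names, the `num_similar` parameter, the toy allele names and sequences, and the use of Hamming distance as the similarity measure are all assumptions made for the example. Only the 1000-measurement default comes from the description.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def plan_training_passes(allele, measurement_counts, sequences,
                         min_measurements=1000, num_similar=2):
    """Return a list of passes; each pass lists the alleles whose data it uses.

    Hypothetical helper: alleles with enough data get a single pass, others
    get a pre-training pass on their nearest neighbours by sequence first.
    """
    if measurement_counts[allele] >= min_measurements:
        return [[allele]]
    target = sequences[allele]
    # Sort the other alleles by distance, breaking ties by name for stability.
    others = sorted(
        (a for a in sequences if a != allele),
        key=lambda a: (hamming_distance(target, sequences[a]), a),
    )
    return [others[:num_similar], [allele]]


# Toy data (made up for illustration, not real allele sequences or counts).
sequences = {
    "allele_A": "YFAMYQ",
    "allele_B": "YFAMYE",
    "allele_C": "YYAMYE",
    "allele_D": "HHTKYR",
}
counts = {"allele_A": 200, "allele_B": 5000, "allele_C": 3000, "allele_D": 4000}

passes_a = plan_training_passes("allele_A", counts, sequences)
passes_b = plan_training_passes("allele_B", counts, sequences)
print(passes_a)  # pre-train on the two closest alleles, then on allele_A
print(passes_b)  # enough data: a single pass on allele_B
```

In this sketch the data-poor allele gets two passes and the data-rich allele gets one; how the real implementation measures similarity and how many alleles it pre-trains on is not stated here.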
Minor changes
Default ensemble centrality measure is changed back to “mean”; “robust mean” remains available as an option
Include mass-spec data from IEDB (in addition to SysteMHC Atlas and Abelin Immunity)
Arbitrary dataframes of metadata may be attached to Class1AffinityPredictor instances using the “metadata_dataframes” instance variable. Exact training data used is now included in the models directory for the standard predictors.
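To illustrate the difference between the two centrality options mentioned in the first item above, here is a minimal sketch that combines per-model predictions for one peptide. The trimmed mean stands in for “robust mean” purely as an assumption; this description does not define how the library's robust mean is computed, and the function names and numbers are made up.

```python
from statistics import fmean


def ensemble_mean(predictions):
    """Plain arithmetic mean over the ensemble's per-model predictions."""
    return fmean(predictions)


def trimmed_mean(predictions, trim=1):
    """Mean after dropping the `trim` lowest and highest values.

    Used here only as a stand-in for a "robust" centrality measure.
    Falls back to the plain mean when too few values remain.
    """
    ordered = sorted(predictions)
    if len(ordered) <= 2 * trim:
        return fmean(ordered)
    return fmean(ordered[trim:len(ordered) - trim])


# One outlying model pulls the plain mean up but barely moves the trimmed mean.
per_model = [1.0, 2.0, 3.0, 4.0, 100.0]
plain = ensemble_mean(per_model)
robust = trimmed_mean(per_model)
print(plain, robust)
```

The trade-off is the usual one: the plain mean uses every model in the ensemble, while a robust variant is less sensitive to a single badly behaved model.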
New downloads
models_class1_unselected. Full ensembles pre-model selection.
models_class1_minimal. Ensemble size=1 predictors for rapid testing
models_class1_no_mass_spec. Models selected without mass-spec data, which enables accuracy evaluation using mass-spec data.
Refactoring
Parallelism-related code is now in the library (parallelism.py) instead of in the train command
Percentile rank calibration is now in its own command (mhcflurry-calibrate-percentile-ranks) instead of being bundled with the training script
Optimizations
Explicit, configurable handling of multiple GPUs for training and model selection. Each worker is either assigned a GPU or left to run on a CPU, enabling simultaneous use of all available GPUs and CPUs.
Cache repeated calls to Class1NeuralNetwork.predict that use the same EncodableSequences object to specify peptides. Also a variety of optimizations to Class1AffinityPredictor.predict_to_dataframe. These were required for model selection to have an acceptable runtime.
Smarter caching of compiled neural networks. Reuse compiled networks even when they differ in certain aspects (e.g. number of training epochs) that we know do not affect the prediction code.
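The compiled-network caching described in the last item can be sketched as a cache keyed only on the hyperparameters that affect the prediction code. This is a hedged illustration, not the implementation in this PR: `CompiledNetworkCache`, `PREDICTION_IRRELEVANT`, and the hyperparameter names (`max_epochs`, `patience`, `layer_sizes`) are hypothetical, and the builder below is a placeholder for whatever actually compiles a network.

```python
import json

# Hypothetical set of hyperparameters assumed not to affect prediction code.
PREDICTION_IRRELEVANT = frozenset({"max_epochs", "patience"})


class CompiledNetworkCache:
    """Reuse a compiled network across hyperparameter sets that only differ
    in training-only settings."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # called once per distinct cache key
        self._cache = {}
        self.builds = 0

    @staticmethod
    def key(hyperparameters):
        # Drop training-only settings, then serialize deterministically.
        relevant = {
            name: value
            for name, value in hyperparameters.items()
            if name not in PREDICTION_IRRELEVANT
        }
        return json.dumps(relevant, sort_keys=True)

    def get(self, hyperparameters):
        cache_key = self.key(hyperparameters)
        if cache_key not in self._cache:
            self._cache[cache_key] = self._build_fn(hyperparameters)
            self.builds += 1
        return self._cache[cache_key]


# Placeholder builder: a real one would construct and compile a network.
cache = CompiledNetworkCache(build_fn=lambda hyperparameters: object())

net_a = cache.get({"layer_sizes": [32], "max_epochs": 100})
net_b = cache.get({"layer_sizes": [32], "max_epochs": 500})  # reused
net_c = cache.get({"layer_sizes": [64], "max_epochs": 100})  # new build
print(cache.builds)
```

A real version would presumably still load each model's own weights into the shared compiled network before predicting; the sketch covers only the key construction.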
Removed
Cross validation (no longer required since held-out training data enables a direct estimate of generalization error using the pre-model selection full ensemble)
Coverage decreased (-29.4%) to 43.548% when pulling 1dc0f8cf7222e9c6d7084ffa5ce4d7909c8f68aa on v1.1.1 into 4c0b193f98464e3ebd52bd7c21922df3c839f5ec on master.
Closes #95. Closes #56. Closes #34.