amomin-pact closed this issue 4 years ago
We don't have it for MHCflurry 1.6 unfortunately. It was added as part of the 2.0 release. The models do get downloaded when the image is created and are part of the image. Let me know if you have any issues using it.
Hello Tim, Looking at the release notes, it doesn't seem you have made major changes from v1.6, except updating the model training criteria and porting the code to 2.0 (that's a big one). If one downloads the models for v1.6, does one still get the original v1.6 models? I assume the model files are saved by individual version.
Thanks Amin
That's right. If you pip install mhcflurry 1.6 you'll get the models for 1.6.
It should also be possible to use the models from 1.6 with the mhcflurry 2.0 codebase by downloading the models separately (see the URLs in downloads.yml) and then passing --models when you call predict; see here for an example. But I think that may only work for affinity prediction and not processing prediction, due to model serialization changes in TensorFlow 2.
Hello Tim, Thanks for the update. I also see that you have a notebook section with the new release. I would appreciate it if you could add a notebook depicting the steps in model training. It would be great to see how the data is prepared and how the model training scripts are executed.
Thanks Amin
Good idea - can look into adding an example of model training as a notebook.
In terms of training the production models that are available for download, you need a cluster with GPUs to do this in a reasonable amount of time, but the scripts used are here:
Affinity predictor: https://github.com/openvax/mhcflurry/blob/master/downloads-generation/models_class1_pan/GENERATE.sh
AP predictor: https://github.com/openvax/mhcflurry/blob/master/downloads-generation/models_class1_processing/GENERATE.sh
PS predictor: https://github.com/openvax/mhcflurry/blob/master/downloads-generation/models_class1_presentation/GENERATE.sh
Hello Tim,
Thanks for your response. I was looking up the code to generate the models. https://github.com/openvax/mhcflurry/blob/master/downloads-generation/models_class1/GENERATE.sh
Does write_validation_data.py generate the data for building the model with mhcflurry-class1-select-allele-specific-models?
Another question: how does one determine the model accuracy compared to older or other models? I see AUC and PPV metrics in your manuscript (Cell Systems 2020). Is that code available in your repo? I would highly appreciate it if you could point me to the location.
Thanks
Those are actually the old allele-specific models. The new pan-allele models are generated in:
https://github.com/openvax/mhcflurry/tree/master/downloads-generation/models_class1_pan
If you do want to fit allele-specific models, have a look at the models_class1_unselected download, which fits a large number of possible models for each allele. The models_class1 download that you mentioned is doing the model selection, based on validation data that is written out using the write_validation_data.py script.
For your second question, I compute AUC using the scikit-learn routine: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
Here is code to compute PPV:
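import pandas

def ppv(y_true, predictions):
    # Fraction of true hits among the top-N predictions, where N is the total number of true hits.
    df = pandas.DataFrame({"prediction": predictions, "y_true": y_true})
    return df.sort_values("prediction", ascending=False)[:int(y_true.sum())].y_true.mean()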
Hope that helps.
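For anyone wanting to try the two metrics side by side, here is a minimal sketch on toy data. The labels and scores are made up for illustration and are not taken from the MHCflurry benchmarks; the variable names (y_true, scores, n_hits) are my own. It assumes higher scores mean "more likely a hit", so if you are scoring with predicted affinities in nM (lower is stronger), you would need to negate or otherwise invert them first.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score


def ppv(y_true, predictions):
    """Fraction of true hits among the top-N predictions, N = number of true hits."""
    df = pd.DataFrame({"prediction": predictions, "y_true": y_true})
    n_hits = int(df.y_true.sum())
    # Rank by score (highest first) and keep the top n_hits rows.
    top = df.sort_values("prediction", ascending=False)[:n_hits]
    return top.y_true.mean()


# Toy data (illustrative only): 1 = hit, 0 = decoy; higher score = more confident.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05])

auc = roc_auc_score(y_true, scores)
top_n_ppv = ppv(y_true, scores)

print(f"AUC: {auc:.3f}")        # AUC: 0.733
print(f"PPV: {top_n_ppv:.3f}")  # PPV: 0.667
```

To compare two predictors, run both on the same held-out peptides and compare the resulting AUC and PPV values; the comparison is only meaningful if neither model saw those peptides during training.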
Hi, I just saw the new docker container for mhcflurry. Does one exist for MHCflurry 1.6.0? I didn't see a tag for other versions in dockerhub/Builds.
Additionally, do we download the various models when we create the image from the dockerfile?
Thanks Amin