openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

ValueError: Allele HLA-A*24:07 has no percentile rank information #155

Closed JaredJGartner closed 4 years ago

JaredJGartner commented 4 years ago

Hello,

While trying to run predictions on a wide variety of HLAs and peptides, I hit an error for a particular allele, HLA-A*24:07. The error output appears to indicate that this allele isn't in allele_to_percent_rank_transform. Am I supposed to set up the percent rank transformation for some of the pan alleles myself? I am using the Python API with the class I pan model and predict_to_dataframe, with the following commands. I'd be happy to provide more information if needed. Thank you for any help.

from mhcflurry import Class1AffinityPredictor
mhcflurry = Class1AffinityPredictor.load('mhcflurry/4/1.4.0/models_class1_pan/models.with_mass_spec/')
df_mhcflurry = mhcflurry.predict_to_dataframe(alleles=_hla_list, peptides=_peptides_list)


KeyError Traceback (most recent call last)
/conda/envs/mhcflurry-env/lib/python3.6/site-packages/mhcflurry/class1_affinity_predictor.py in percentile_ranks(self, affinities, allele, alleles, throw)
    896 try:
--> 897     transform = self.allele_to_percent_rank_transform[allele]
    898     return transform.transform(affinities)

KeyError: 'HLA-A*24:07'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)

in
----> 1 df_mhcflurry = mhcflurry.predict_to_dataframe(alleles=_hla_list, peptides=_peptides_list)

/conda/envs/mhcflurry-env/lib/python3.6/site-packages/mhcflurry/class1_affinity_predictor.py in predict_to_dataframe(self, peptides, alleles, allele, throw, include_individual_model_predictions, include_percentile_ranks, include_confidence_intervals, centrality_measure, model_kwargs)
   1227     df.prediction,
   1228     alleles=df.normalized_allele.values,
-> 1229     throw=throw)
   1230 else:
   1231     warnings.warn("No percentile rank information available.")

/conda/envs/mhcflurry-env/lib/python3.6/site-packages/mhcflurry/class1_affinity_predictor.py in percentile_ranks(self, affinities, allele, alleles, throw)
    912 for (allele, sub_df) in df.groupby("allele"):
    913     df.loc[sub_df.index, "result"] = self.percentile_ranks(
--> 914         sub_df.affinity, allele=allele, throw=throw)
    915 return df.result.values
    916

/conda/envs/mhcflurry-env/lib/python3.6/site-packages/mhcflurry/class1_affinity_predictor.py in percentile_ranks(self, affinities, allele, alleles, throw)
    900 msg = "Allele %s has no percentile rank information" % allele
    901 if throw:
--> 902     raise ValueError(msg)
    903 warnings.warn(msg)
    904 return numpy.ones(len(affinities)) * numpy.nan # Return NaNs

ValueError: Allele HLA-A*24:07 has no percentile rank information

timodonnell commented 4 years ago

Hi @JaredJGartner, this is currently a known limitation. We only have percentile rank distributions for alleles with training data. I'm running something now that should fix this, though. If all goes well, I should be able to post new models with percent rank distributions for all alleles sometime later this week or next. I'll update with how that goes.

In the meantime, one workaround would be to set throw=False when you call predict_to_dataframe. Then, if you need percentile ranks, you can call predictor.allele_to_percent_rank_transform[similar_allele].transform(affinities), where predictor is your Class1AffinityPredictor instance and similar_allele is an allele similar to HLA-A*24:07 that we have training data for (e.g. HLA-A*24:03).
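
For anyone applying this workaround, here is a minimal sketch of how it might be wrapped up. The two helper names (alleles_without_percent_ranks and percentile_ranks_with_fallback) are hypothetical and not part of MHCflurry. The only MHCflurry pieces used are the allele_to_percent_rank_transform mapping and its transform(affinities) call mentioned above. It also assumes the allele names you pass already match the keys of that mapping (the traceback shows the library looking them up by normalized names such as HLA-A*24:07).

```python
import warnings


def alleles_without_percent_ranks(predictor, alleles):
    """List the alleles (de-duplicated, in input order) that have no
    percent rank transform on the given predictor.

    Hypothetical helper: it only reads predictor.allele_to_percent_rank_transform.
    """
    transforms = predictor.allele_to_percent_rank_transform
    missing = []
    for allele in alleles:
        if allele not in transforms and allele not in missing:
            missing.append(allele)
    return missing


def percentile_ranks_with_fallback(predictor, affinities, allele, similar_allele):
    """Compute percentile ranks for `allele`, borrowing the transform of
    `similar_allele` when `allele` has none.

    Hypothetical helper: the choice of similar_allele is up to the caller,
    and the borrowed ranks are only an approximation.
    """
    transforms = predictor.allele_to_percent_rank_transform
    try:
        transform = transforms[allele]
    except KeyError:
        warnings.warn(
            "Allele %s has no percentile rank information; using %s instead"
            % (allele, similar_allele)
        )
        # Raises KeyError if the fallback allele has no transform either.
        transform = transforms[similar_allele]
    return transform.transform(affinities)


# Intended use, following the workaround described above:
#   df = predictor.predict_to_dataframe(
#       peptides=peptides, alleles=alleles, throw=False)
#   ranks = percentile_ranks_with_fallback(
#       predictor, affinities, "HLA-A*24:07", "HLA-A*24:03")
```

Running alleles_without_percent_ranks on your allele list before predicting shows up front which alleles will need a stand-in. The warning in the fallback path keeps it visible that the ranks for those alleles come from a different allele's distribution.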

timodonnell commented 4 years ago

The '20191111' branch ( https://github.com/openvax/mhcflurry/tree/20191111 ) now has models that fix this issue (percent ranks are supported for all alleles). I'll update here when this gets merged into master and released (likely version 1.6.0), hopefully in the next week or two.

JaredJGartner commented 4 years ago

Thank you very much.

timodonnell commented 4 years ago

The fix mentioned above has now been merged into master and released as MHCflurry 1.6.0.