openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

Parsing error when using Class1PresentationPredictor #203

Closed liliblu closed 11 months ago

liliblu commented 2 years ago

Hello,

I'm getting a confusing error that probably comes from my configuration, since what I'm running is so basic and I couldn't find any similar earlier issues. I was hoping you could point me in the right direction.

When I pass alleles to the Class1PresentationPredictor.predict() method, the HLA names are not parsed correctly. For example:

from mhcflurry import Class1PresentationPredictor

predictor = Class1PresentationPredictor.load()

predictions = predictor.predict(
    peptides=['LAMDEFIERY'],
    alleles={'samp_0': 'HLA-A01:01'},
    sample_names=['samp_0'],
)

Here is the traceback I get:

  0%|          | 0/1 [00:00<?, ?it/s]
Predicting processing.
/data/miniconda3/envs/python_3.8.x/lib/python3.8/site-packages/keras/engine/training_v1.py:2079: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  updates=self.state_updates,
100%|██████████| 1/1 [00:10<00:00, 10.39s/it]
  0%|          | 0/1 [00:00<?, ?it/s]WARNING:root:No sequences for allele(s): HLA-H.
Supported alleles: Atbe-B*01:01 Atbe-E*03:01 Atbe-G*03:01 Atbe-G*03:02 Atbe-G*06:01 Atfu-B*01:01 Atfu-B*01:02 Atfu-B*01:03 Atfu-B*02:01 Atfu-B*02:02 Atfu-B*02:03 Atfu-B*02:04 Atfu-B*03:01 Atfu-B*03:02 Atfu-B*04:01 Atfu-B*04:02 Atfu-B*05:01 Atfu-E*03:01 Atfu-E*03:02 Atfu-G*06:01 Atfu-G*06:02 Atfu-G*06:03 Atfu-G*22:01 Atfu-G*23:01 Atfu-G*23:02 Atfu-G*23:03 Atfu-G*24:01 Atfu-G*25:01 BoLA-1*07:01 BoLA-1*09:01 BoLA-1*09:02 BoLA-1*19:01 BoLA-1*20:01 BoLA-1*21:01 BoLA-1*23:01 BoLA-1*28:01 BoLA-1*29:01 BoLA-1*31:01 BoLA-1*31:02 BoLA-1*42:01 BoLA-1*49:01 BoLA-1*61:01 BoLA-1*67:01 BoLA-1*74:01 BoLA-1*75:01 BoLA-2*05:01 BoLA-2*06:01 BoLA-2*06:02 BoLA-2*08:01 BoLA-2*08:02 BoLA-2*12:01 BoLA-2*16:01 BoLA-2*16:02 BoLA-2*16:03 BoLA-2*18:01 BoLA-2*18:02 BoLA-2*22:01 BoLA-2*25:01 BoLA-2*26:01 BoLA-2*26:02 BoLA-2*26:03 BoLA-2*26:04 BoLA-2*30:01 BoLA-2*32:01 BoLA-2*32:02 BoLA-2*43:01 BoLA-2*44:01 BoLA-2*44:02 BoLA-2*45:01 BoLA-2*45:02 BoLA-2*46:01 BoLA-2*47:01 BoLA-2*48:01 BoLA-2*54:01 BoLA-2*55:01 BoLA-2*56:01 BoLA-2*57:01 BoLA-2*60:01 BoLA-2*60:02 BoLA-2*62:01 BoLA-2*69:01 BoLA-2*70:01 BoLA-2*71:01 BoLA-2*71:02 BoLA-2*75:01 BoLA-2*76:01 BoLA-2*77:01 BoLA-2*78:01 BoLA-2*79:01 BoLA-3*01:01 BoLA-3*01:02 BoLA-3*01:03 BoLA-3*02:01 BoLA-3*04:01 BoLA-3*04:02 BoLA-3*04:03 BoLA-3*04:04 BoLA-3*10:01 BoLA-3*11:01 BoLA-3*17:01 + 14640 more alleles
  0%|          | 0/1 [00:00<?, ?it/s]
Predicting affinities.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [4], in <cell line: 5>()
      1 from mhcflurry import Class1PresentationPredictor
      3 predictor = Class1PresentationPredictor.load()
----> 5 predictions = predictor.predict(
      6     peptides=['LAMDEFIERY'],
      7     alleles={'samp_0': 'HLA-A01:01'},
      8     sample_names=['samp_0'],
      9 )

File ~/.local/lib/python3.8/site-packages/mhcflurry/class1_presentation_predictor.py:543, in Class1PresentationPredictor.predict(self, peptides, alleles, sample_names, n_flanks, c_flanks, include_affinity_percentile, verbose, throw)
    540     processing_scores = None
    542 if alleles:
--> 543     df = self.predict_affinity(
    544         peptides=peptides,
    545         alleles=alleles,
    546         sample_names=sample_names,  # might be None
    547         include_affinity_percentile=include_affinity_percentile,
    548         verbose=verbose,
    549         throw=throw)
    551     df["affinity_score"] = from_ic50(df.affinity)
    552 else:
    553     # Processing prediction only.

File ~/.local/lib/python3.8/site-packages/mhcflurry/class1_presentation_predictor.py:238, in Class1PresentationPredictor.predict_affinity(self, peptides, alleles, sample_names, include_affinity_percentile, verbose, throw)
    236 sample_peptides = EncodableSequences.create(sub_df.peptide.values)
    237 for allele in alleles[sample]:
--> 238     predictions_df[allele] = self.affinity_predictor.predict(
    239         peptides=sample_peptides,
    240         allele=allele,
    241         model_kwargs={'batch_size': PREDICT_BATCH_SIZE},
    242         throw=throw)
    243 df.loc[
    244     sub_df.index, "affinity"
    245 ] = predictions_df.min(1).values
    246 df.loc[
    247     sub_df.index, "best_allele"
    248 ] = predictions_df.idxmin(1).values

File ~/.local/lib/python3.8/site-packages/mhcflurry/class1_affinity_predictor.py:1081, in Class1AffinityPredictor.predict(self, peptides, alleles, allele, throw, centrality_measure, model_kwargs)
   1043 def predict(
   1044         self,
   1045         peptides,
   (...)
   1049         centrality_measure=DEFAULT_CENTRALITY_MEASURE,
   1050         model_kwargs={}):
   1051     """
   1052     Predict nM binding affinities.
   1053     
   (...)
   1079     numpy.array of predictions
   1080     """
-> 1081     df = self.predict_to_dataframe(
   1082         peptides=peptides,
   1083         alleles=alleles,
   1084         allele=allele,
   1085         throw=throw,
   1086         include_percentile_ranks=False,
   1087         include_confidence_intervals=False,
   1088         centrality_measure=centrality_measure,
   1089         model_kwargs=model_kwargs
   1090     )
   1091     return df.prediction.values

File ~/.local/lib/python3.8/site-packages/mhcflurry/class1_affinity_predictor.py:1256, in Class1AffinityPredictor.predict_to_dataframe(self, peptides, alleles, allele, throw, include_individual_model_predictions, include_percentile_ranks, include_confidence_intervals, centrality_measure, model_kwargs)
   1254     logging.warning(msg)
   1255     if throw:
-> 1256         raise ValueError(msg)
   1257 mask = df.supported_peptide & (
   1258     ~df.normalized_allele.isin(unsupported_alleles))
   1260 row_slice = None

ValueError: No sequences for allele(s): HLA-H.
Supported alleles: Atbe-B*01:01 Atbe-E*03:01 Atbe-G*03:01 Atbe-G*03:02 Atbe-G*06:01 Atfu-B*01:01 Atfu-B*01:02 Atfu-B*01:03 Atfu-B*02:01 Atfu-B*02:02 Atfu-B*02:03 Atfu-B*02:04 Atfu-B*03:01 Atfu-B*03:02 Atfu-B*04:01 Atfu-B*04:02 Atfu-B*05:01 Atfu-E*03:01 Atfu-E*03:02 Atfu-G*06:01 Atfu-G*06:02 Atfu-G*06:03 Atfu-G*22:01 Atfu-G*23:01 Atfu-G*23:02 Atfu-G*23:03 Atfu-G*24:01 Atfu-G*25:01 BoLA-1*07:01 BoLA-1*09:01 BoLA-1*09:02 BoLA-1*19:01 BoLA-1*20:01 BoLA-1*21:01 BoLA-1*23:01 BoLA-1*28:01 BoLA-1*29:01 BoLA-1*31:01 BoLA-1*31:02 BoLA-1*42:01 BoLA-1*49:01 BoLA-1*61:01 BoLA-1*67:01 BoLA-1*74:01 BoLA-1*75:01 BoLA-2*05:01 BoLA-2*06:01 BoLA-2*06:02 BoLA-2*08:01 BoLA-2*08:02 BoLA-2*12:01 BoLA-2*16:01 BoLA-2*16:02 BoLA-2*16:03 BoLA-2*18:01 BoLA-2*18:02 BoLA-2*22:01 BoLA-2*25:01 BoLA-2*26:01 BoLA-2*26:02 BoLA-2*26:03 BoLA-2*26:04 BoLA-2*30:01 BoLA-2*32:01 BoLA-2*32:02 BoLA-2*43:01 BoLA-2*44:01 BoLA-2*44:02 BoLA-2*45:01 BoLA-2*45:02 BoLA-2*46:01 BoLA-2*47:01 BoLA-2*48:01 BoLA-2*54:01 BoLA-2*55:01 BoLA-2*56:01 BoLA-2*57:01 BoLA-2*60:01 BoLA-2*60:02 BoLA-2*62:01 BoLA-2*69:01 BoLA-2*70:01 BoLA-2*71:01 BoLA-2*71:02 BoLA-2*75:01 BoLA-2*76:01 BoLA-2*77:01 BoLA-2*78:01 BoLA-2*79:01 BoLA-3*01:01 BoLA-3*01:02 BoLA-3*01:03 BoLA-3*02:01 BoLA-3*04:01 BoLA-3*04:02 BoLA-3*04:03 BoLA-3*04:04 BoLA-3*10:01 BoLA-3*11:01 BoLA-3*17:01 + 14640 more alleles

However, when I just directly try to parse the exact same string with mhcgnomes, it works fine:

from mhcgnomes import parse

parse('HLA-A01:01')

Result:

Allele(gene=Gene(species=Species(name='Homo sapiens', mhc_prefix='HLA'), name='A'), allele_fields=('01', '01'), annotations=(), mutations=())

Any guidance or help you could provide would be much appreciated, thank you!

timodonnell commented 2 years ago

The issue is that when the 'alleles' parameter is a dict, its values must be lists of alleles, not strings. So this should work:

predictions = predictor.predict(
    peptides=['LAMDEFIERY'],
    alleles={'samp_0': ['HLA-A01:01']},
    sample_names=['samp_0'],
)

or, simpler:

predictions = predictor.predict(
    peptides=['LAMDEFIERY'],
    alleles=['HLA-A01:01'],
)
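
The `HLA-H` in the error message is probably explained by the loop visible in the traceback, `for allele in alleles[sample]`: iterating over a string yields single characters, so the first "allele" the predictor sees is `'H'`. A minimal plain-Python illustration follows (no mhcflurry needed). That `'H'` is then normalized to `HLA-H` is an inference from the error message, not something checked against the library source.

```python
# Iterating over a str yields characters; iterating over a list yields whole items.
as_string = {'samp_0': 'HLA-A01:01'}    # what the original call passed
as_list = {'samp_0': ['HLA-A01:01']}    # what predict() expects for a dict

# Same loop shape as `for allele in alleles[sample]` in the traceback.
chars = [allele for allele in as_string['samp_0']]
items = [allele for allele in as_list['samp_0']]

print(chars[:3])  # ['H', 'L', 'A']
print(items)      # ['HLA-A01:01']
```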

This is a very confusing error, though, and we should fix it to raise a better error message.
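
Until such a fix exists, a caller-side check can catch the mistake before calling predict(). The sketch below is only an illustration: `validate_alleles_dict` is a hypothetical helper, not part of mhcflurry, and it only covers the case where 'alleles' is a dict.

```python
def validate_alleles_dict(alleles):
    """Hypothetical helper (not an mhcflurry function).

    Checks that every value in a {sample_name: alleles} dict is a
    collection of allele names rather than a bare string, and raises
    a TypeError naming the offending sample otherwise.
    """
    for sample, sample_alleles in alleles.items():
        if isinstance(sample_alleles, str):
            raise TypeError(
                f"alleles[{sample!r}] must be a list of allele names, "
                f"got the string {sample_alleles!r}; "
                f"did you mean [{sample_alleles!r}]?"
            )
    return alleles


# Passes through unchanged when the values are lists.
checked = validate_alleles_dict({'samp_0': ['HLA-A01:01']})

# Raises a readable error for the call from the original report.
try:
    validate_alleles_dict({'samp_0': 'HLA-A01:01'})
except TypeError as err:
    message = str(err)
    print(message)
```

The returned dict can then be passed as the 'alleles' argument exactly as in the working call above.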

liliblu commented 2 years ago

That explains it! Thank you so much!