openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

number of alleles #185

Closed kevinkovalchik closed 3 years ago

kevinkovalchik commented 3 years ago

Hello!

Is it possible to tell MhcFlurry that I want to make predictions for more than 6 alleles? I am aware 6 is the biologically relevant number for human samples, but I'm doing some exploration in data scoring and am interested in including alleles not expected to bind (e.g. survey all supertypes or something like that). I'm sure I could edit the limit in the code, but is there an argument somewhere I am missing that I can pass to specify the maximum number of alleles?

Thanks! Kevin

timodonnell commented 3 years ago

Hi Kevin,

Can you give an example of what you are trying to do that is hitting a limit? How to get around this will depend on what command or function you're using.

Tim

kevinkovalchik commented 3 years ago

Hi Tim,

I am using Class1PresentationPredictor.predict. Here's an example:

>>> CLASS1_PRESENTATION_PREDICTOR.predict(peptides=['AAAAAAAAAAA'], alleles=SUPERTYPES)
Traceback (most recent call last):
  File "", line 1, in
  File "/Data/Development/PercMhc/venv/lib/python3.8/site-packages/mhcflurry/class1_presentation_predictor.py", line 516, in predict
    raise ValueError(
ValueError: When alleles is a list, it must have at most 6 elements. These alleles are taken to be a genotype for an individual, and the strongest prediction across alleles will be taken for each peptide. Note that this differs from Class1AffinityPredictor.predict(), where alleles is expected to be the same length as peptides.

SUPERTYPES is a list of 12 alleles.

Now that I look at this, though, I realize I probably want something that the predict function doesn't give me. I'm interested in getting the predictions for all 12 alleles, but this predict function reports only the best allele and score for each peptide, right?

For now I am calling it once per allele in a loop. This is fine; I was just wondering if there was already a built-in way to directly output a dataframe with predictions for multiple alleles, preferably without a limit on the number of alleles.

Kevin

timodonnell commented 3 years ago

Hi Kevin,

There actually is a way to do this although it's a bit confusing. You can pass alleles as a dict mapping arbitrary keys (the "sample names") to a list of alleles (the "sample genotypes"). For your purposes you can just give a single allele as the "sample genotype" and name the sample by the allele name. For example you could pass this as the alleles:

{
"A*02:01": ["A*02:01"],
"A*03:01": ["A*03:01"],
}

This should give results for all alleles for all peptides.
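To make the dict approach concrete, here is a minimal sketch. The helper names `one_sample_per_allele` and `predict_each_allele` are hypothetical (not part of mhcflurry), and the two-allele list stands in for the full 12-allele SUPERTYPES list from earlier in the thread. The prediction call assumes mhcflurry is installed and its models have been downloaded.

```python
def one_sample_per_allele(alleles):
    """Build the dict form of `alleles`: each allele becomes its own
    one-allele "sample", named after the allele itself."""
    return {allele: [allele] for allele in alleles}


def predict_each_allele(peptides, alleles):
    """Hypothetical wrapper: one set of predictions per (peptide, allele).

    mhcflurry is imported lazily so that the dict-building helper above
    can be used and tested without mhcflurry installed.
    """
    from mhcflurry import Class1PresentationPredictor

    predictor = Class1PresentationPredictor.load()
    return predictor.predict(
        peptides=peptides,
        alleles=one_sample_per_allele(alleles),
        verbose=0,
    )


# Illustrative subset only; substitute your own allele list.
EXAMPLE_ALLELES = ["A*02:01", "A*03:01"]
alleles_by_sample = one_sample_per_allele(EXAMPLE_ALLELES)
print(alleles_by_sample)
```

Calling `predict_each_allele(['AAAAAAAAAAA'], EXAMPLE_ALLELES)` should then return one row per peptide and sample, and because each sample holds a single allele, the 6-allele genotype limit should not be hit.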

There is some more info in the docstring of the method.

Hope this helps.

Tim

kevinkovalchik commented 3 years ago

Nice! That looks like it will do just what I want, thanks.

Kevin

marshelma commented 1 week ago

Somehow the dictionary didn't work for me; it gives this error:

Traceback (most recent call last):
  File "/home/xma/.local/bin/mhcflurry-predict-scan", line 8, in
    sys.exit(run())
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/predict_scan_command.py", line 313, in run
    result_df = predictor.predict_sequences(
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/class1_presentation_predictor.py", line 802, in predict_sequences
    result_df = self.predict(
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/class1_presentation_predictor.py", line 543, in predict
    df = self.predict_affinity(
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/class1_presentation_predictor.py", line 202, in predict_affinity
    predictions_df[allele] = self.affinity_predictor.predict(
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/class1_affinity_predictor.py", line 1085, in predict
    df = self.predict_to_dataframe(
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/class1_affinity_predictor.py", line 1162, in predict_to_dataframe
    normalized_allele = normalize_allele_name(allele)
  File "/home/xma/.local/lib/python3.10/site-packages/mhcflurry/common.py", line 62, in normalize_allele_name
    raise ValueError("Invalid MHC allele name: %s" % raw_name)
ValueError: Invalid MHC allele name: A02:01:[A02:01]

timodonnell commented 1 week ago

@marshelma it looks like you are using the command-line interface, not the Python API. The recommendation I gave here is for the Python API. You can pass as many alleles as you want to the mhcflurry-predict-scan command by giving a space-separated list of alleles to the "--alleles" argument; see mhcflurry-predict-scan -h.
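As a rough illustration of the command-line suggestion above, the sketch below builds such a command in Python. Only the command name and the "--alleles" argument come from this thread. The input file `proteins.fasta` is a hypothetical placeholder, and passing it as a positional argument is an assumption to check against `mhcflurry-predict-scan -h`. `shlex.join` is used because allele names contain `*`, which a shell may otherwise try to expand as a wildcard.

```python
import shlex

# Illustrative alleles only; any number can be listed.
alleles = ["HLA-A*02:01", "HLA-A*03:01"]

# Assumption: the input file is given positionally. "proteins.fasta"
# is a hypothetical file name, not something from this thread.
argv = ["mhcflurry-predict-scan", "proteins.fasta", "--alleles", *alleles]

# shlex.join quotes each allele so the shell passes '*' through literally.
command = shlex.join(argv)
print(command)
```

The same `argv` list could be handed to `subprocess.run(argv, check=True)` directly, in which case no shell quoting is involved at all.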

marshelma commented 1 week ago

Thank you!