openvax / topiary

Predict mutated T-cell epitopes from sequencing data
Apache License 2.0

Error: missing binding predictions #74

Open kevin199011 opened 7 years ago

kevin199011 commented 7 years ago

Hi, I was running several VCF files, generated by GATK HaplotypeCaller from multiple samples, through Topiary. Only one of the VCF files produced the following error. Could you please tell me what's wrong with this file? Thank you!

Traceback (most recent call last):
  File "/Users/kevinma/anaconda/bin/topiary", line 58, in <module>
    main(args)
  File "/Users/kevinma/anaconda/bin/topiary", line 43, in main
    epitopes = predict_epitopes_from_args(args)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/topiary/predict_epitopes.py", line 294, in predict_epitopes_from_args
    raise_on_variant_effect_error=not args.skip_variant_errors)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/topiary/predict_epitopes.py", line 261, in predict_epitopes_from_variants
    wildtype_ligandome_dict=wildtype_ligandome_dict)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/topiary/predict_epitopes.py", line 157, in predict_epitopes_from_mutation_effects
    binding_predictions = mhc_model.predict(protein_subsequences)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/mhctools/base_predictor.py", line 225, in predict
    return self.predict_subsequences(sequence_dict, peptide_lengths=None)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/mhctools/base_predictor.py", line 207, in predict_subsequences
    binding_predictions = self.predict_peptides(peptide_list)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/mhctools/base_commandline_predictor.py", line 332, in predict_peptides
    alleles=self.alleles)
  File "/Users/kevinma/anaconda/lib/python3.5/site-packages/mhctools/base_predictor.py", line 143, in _check_results
    len(missing), example_peptide, example_allele))
ValueError: Missing 32 binding predictions, example peptide='UGTTVRDCTQM' allele='HLA-A*02:06'

Kevin

iskandr commented 7 years ago

Hi Kevin, I think the problem is that we're hitting stretches of the reference proteome that contain non-canonical amino acids. In the error you pasted, the example peptide starts with "U", which is selenocysteine. We should handle this case more gracefully, but I'm not actually sure what the best behavior would be (since none of the underlying predictors support non-canonical amino acids).

@tavinathanson @julia326 @timodonnell -- opinions?

iskandr commented 6 years ago

@timodonnell @tavinathanson @julia326 I'm thinking about having Topiary filter out peptides containing non-canonical amino acids before calling MHCtools, and warning the user when that happens. Opinions?
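
A rough sketch of what that filtering step could look like, assuming peptides are plain one-letter amino acid strings. This is not Topiary's or MHCtools' actual code; `CANONICAL_AMINO_ACIDS` and `split_canonical_peptides` are hypothetical names used only for illustration.

```python
import warnings

# One-letter codes of the 20 canonical amino acids. Anything else
# (e.g. "U" for selenocysteine, "X" for unknown) counts as non-canonical.
CANONICAL_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def split_canonical_peptides(peptides):
    """Hypothetical helper: separate peptides a predictor can score
    from those containing non-canonical residues.

    Returns a (kept, dropped) pair of lists and emits a warning
    if anything was dropped.
    """
    kept, dropped = [], []
    for peptide in peptides:
        if set(peptide.upper()) <= CANONICAL_AMINO_ACIDS:
            kept.append(peptide)
        else:
            dropped.append(peptide)
    if dropped:
        warnings.warn(
            "Skipping %d peptide(s) with non-canonical amino acids, "
            "e.g. %r" % (len(dropped), dropped[0])
        )
    return kept, dropped


# The peptide from the error above is dropped; the other one is kept.
kept, dropped = split_canonical_peptides(["UGTTVRDCTQM", "SIINFEKL"])
```

Only the `kept` list would then be passed on to the binding predictor, so a single selenocysteine-containing peptide no longer aborts the whole run.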

timodonnell commented 6 years ago

That seems reasonable to me @iskandr

Apb58 commented 5 years ago

Hey Alex, I am running into the same problem (non-canonical amino acids) with some samples I'm processing now, and it's causing the sample to fail. Is there a way to avoid these? I have the --skip-variant-errors flag set, but the pipeline exits with no output as soon as it runs into this case.