griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

`ValueError: substring not found` when running proteome similarity search with pvacseq #1150

Open lukaas33 opened 2 months ago

lukaas33 commented 2 months ago

Installation Type

Docker

pVACtools Version / Docker Image

latest

Python Version

No response

Operating System

Ubuntu

Describe the bug

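When running pvacseq with the reference proteome similarity option (`--run-reference-proteome-similarity`), the run fails at the "Calculating Reference Proteome Similarity" step with `ValueError: substring not found`. The exact output is shown below.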

How to reproduce this bug

`pvacseq run /shared_dir/neoantigen.stringtie.vcf tumor_sample HLA-A*02:01,HLA-A*23:01,HLA-B*27:05,HLA-B*42:01,HLA-C*01:02,HLA-C*17:01,DPB1*01:01,DPB1*02:01,DQB1*03:01,DQB1*03:01,DRB1*01:03,DRB1*11:02 MHCflurry BigMHC_IM BigMHC_EL NetMHCIIpan NetMHCIIpanEL /shared_dir/pvacseq/ -e1 10 -e2 15 --iedb-install-directory /opt/iedb -t 60 --run-reference-proteome-similarity`

Input files

neoantigen.stringtie (2).zip

Log output

Combining Parsed Prediction Files
Completed
Creating aggregated report
Tumor clonal VAF estimated as 0.5 (estimated from Tumor DNA VAF data). Assuming variants with VAF < 0.25 are subclonal
Completed
Calculating Manufacturability Metrics
Completed
Running Binding Filters
Completed
Running Coverage Filters
Completed
Running Transcript Support Level Filter
Complete
Running Top Score Filter
Completed
Calculating Reference Proteome Similarity
Traceback (most recent call last):
  File "/usr/local/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/usr/local/lib/python3.7/site-packages/pvactools/tools/pvacseq/run.py", line 142, in main
    pipeline.execute()
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/pipeline.py", line 484, in execute
    PostProcessor(**post_processing_params).execute()
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/post_processor.py", line 65, in execute
    self.calculate_reference_proteome_similarity()
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/post_processor.py", line 247, in calculate_reference_proteome_similarity
    aggregate_metrics_file=aggregate_metrics_file,
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 595, in execute
    unique_peptides = pymp.shared.list(self._get_unique_peptides(mt_records_dict, wt_records_dict))
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 575, in _get_unique_peptides
    peptide, full_peptide = self._get_peptide(line, mt_records_dict, wt_records_dict)
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 314, in _get_peptide
    subpeptide_position = full_peptide.index(epitope)
ValueError: substring not found
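
For context, the last frame of the traceback calls `full_peptide.index(epitope)`. Python's `str.index` raises `ValueError: substring not found` when the substring is absent, so the error suggests that an epitope was not contained in the full peptide it was looked up in. The snippet below is only a minimal sketch of that behaviour, not pVACtools code: `locate_epitope` is a hypothetical helper and the sequences are made up for illustration.

```python
from typing import Optional


def locate_epitope(full_peptide: str, epitope: str) -> Optional[int]:
    """Return the 0-based start of `epitope` in `full_peptide`, or None if absent.

    Hypothetical helper for illustration; not part of pVACtools.
    """
    # str.find returns -1 when the substring is missing instead of raising.
    position = full_peptide.find(epitope)
    return None if position == -1 else position


# Made-up sequences, not taken from the attached input files.
full_peptide = "MKTAYIAKQRQISFVKSHFSRQ"

print(locate_epitope(full_peptide, "AKQRQISFV"))  # 6
print(locate_epitope(full_peptide, "AKQRQISFX"))  # None

try:
    # str.index raises on a missing substring, as in the traceback above.
    full_peptide.index("AKQRQISFX")
except ValueError as err:
    print(f"ValueError: {err}")
```

Checking the failing epitope and its full peptide this way, using the values from all_epitopes.tsv and the .fasta file, may help narrow down which record triggers the error.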

Output files

No response

susannasiebert commented 2 months ago

Thank you for this bug report. Did your run produce any of the main output files (all_epitopes.tsv or aggregated.tsv)? If so, could you please attach those to this ticket, along with the .fasta file from your run? It would speed up debugging a lot not to have to redo the predictions.

lukaas33 commented 2 months ago

tumor_sample.all_epitopes.aggregated.zip tumor_sample.zip

This is the MHC class I output for the run described above.