griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

ERROR: output_parser.py(195)match_wildtype_and_mutant_entry_for_missense() -> wt_epitope_seq = wt_result['wt_epitope_seq'] -- (Pdb) #1039

Closed ZoeChao2001 closed 8 months ago

ZoeChao2001 commented 9 months ago

Installation Type

Standalone

pVACtools Version / Docker Image

4.0.5

Python Version

3.7.16

Operating System

UBUNTU

Describe the bug

The message 'Parsed Output File for Allele ' was printed successfully, and then the following issue occurred:

Parsing prediction file for Allele HLA-C*07:02 and Epitope Length 8 - Entries 3801-4000
> /home/zzd/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py(195)match_wildtype_and_mutant_entry_for_missense()
-> wt_epitope_seq = wt_result['wt_epitope_seq']
(Pdb)  
(Pdb)

A (Pdb) prompt appeared and pVACseq could not continue running. When I entered "q", the following traceback was returned:

(Pdb) q
Traceback (most recent call last):
  File "/home/z/miniconda3/envs/pvactools/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/pipeline.py", line 451, in execute
    split_parsed_output_files = self.parse_outputs(chunks)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/pipeline.py", line 412, in parse_outputs
    parser.execute()
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 629, in execute
    iedb_results = self.process_input_iedb_file(tsv_entries)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 514, in process_input_iedb_file
    iedb_results = self.parse_iedb_file(tsv_entries)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 800, in parse_iedb_file
    return self.match_wildtype_and_mutant_entries(iedb_results, wt_iedb_results)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 401, in match_wildtype_and_mutant_entries
    self.match_wildtype_and_mutant_entry_for_missense(result, mt_position, wt_results, previous_result)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 195, in match_wildtype_and_mutant_entry_for_missense
    wt_epitope_seq = wt_result['wt_epitope_seq']
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 195, in match_wildtype_and_mutant_entry_for_missense
    wt_epitope_seq = wt_result['wt_epitope_seq']
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

How to reproduce this bug

pvacseq run ./sample.vep37.vcf.gz Pt1 HLA-A*01:01,HLA-A*02:01,HLA-B*07:02,HLA-B*08:01,HLA-C*07:01,HLA-C*07:02 all output -e1 8,9,10,11

Input files

No response

Log output

Please see description.

Output files

No response

susannasiebert commented 9 months ago

Thank you for this bug report. Would it be possible for you to share your input VCF with us as well as the contents of the output directory? We will need those files for further debugging.

ZoeChao2001 commented 9 months ago

Hello @susannasiebert, this is the input file, which I created by extracting variant information and annotating it with VEP: Gt1.vep37.vcf.gz

Initially, I ran pVACseq using a multi-sample VCF and encountered the issue described earlier. I then ran it again using a single-sample VCF file (the attached input file), but encountered the same problem with the sample. [screenshot: bug]

The code I used is provided below.

vep -i Gt1.vcf.gz -o Gt1.vep37.vcf --cache --dir_cache /home/zzd/miniconda3/envs/pvactools --assembly GRCh37 --offline --fasta /home/zzd/miniconda3/envs/pvactools/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.chromosome.1.fa --plugin Frameshift --plugin Wildtype --vcf --tsl
gzip Gt1.vep37.vcf
pvacseq run -e1 8,9,10,11 --run-reference-proteome-similarity -c 1 -t 10 ./Gt1.vep37.vcf.gz Pt1 HLA-A*01:01,HLA-A*02:01,HLA-B*07:02,HLA-B*08:01,HLA-C*07:01,HLA-C*07:02 all /home/zzd/work/pVACtools/run/lili/output/Gt1_output

I have attached screenshots of some of the output directories due to the sheer size of the complete output. If you require specific files, please let me know, and I'll promptly provide them. [screenshots: MHC_Class_I, tmp]

I would greatly appreciate your assistance with debugging. Thank you for your time and support.

susannasiebert commented 9 months ago

Unfortunately, without the output it will be difficult for me to replicate this issue. I suspect that there might be a problem with one of the prediction files from IEDB. We've noticed some problems with that when using the IEDB API vs the standalone IEDB software where sporadically the results don't match the input epitopes. We've been in touch with IEDB for this but haven't come to a resolution.
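A mismatch of this kind can be checked for by comparing the peptides in a prediction file against the epitopes that were submitted. The sketch below is illustrative only: it assumes a tab-separated prediction file with a `peptide` column, and `unexpected_peptides` is a hypothetical helper that is not part of pVACtools or IEDB.

```python
import csv
import io

def unexpected_peptides(tsv_handle, submitted_epitopes, column="peptide"):
    """Return peptides found in the prediction TSV that were never submitted.

    The column name is an assumption; adjust it to the actual file header.
    """
    submitted = set(submitted_epitopes)
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return sorted({row[column] for row in reader if row[column] not in submitted})

# Made-up example data standing in for a prediction file.
example = io.StringIO(
    "allele\tpeptide\tic50\n"
    "HLA-A*01:01\tAAAAAAAA\t10\n"
    "HLA-A*01:01\tCCCCCCCC\t20\n"
)
print(unexpected_peptides(example, ["AAAAAAAA"]))
```

A non-empty result would suggest that the file contains predictions for sequences other than the ones that were sent.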

My suggestion would be to delete from the tmp folder all files that end in tsv_6401-6600 since that seems to be the chunk that is causing problems. That should regenerate those prediction files on your next run and, with luck, you won't have this same problem with the regenerated data. You might continue running into this error on a different chunk of data in which case you would need to delete the tmp prediction files for any chunk that encounters this problem.
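The cleanup suggested above could be scripted roughly as follows. This is a minimal sketch, not a pVACtools utility: `delete_chunk_files` is a hypothetical helper, and the tmp folder path in the usage note is an assumption about where the output directory keeps its intermediate files. It defaults to a dry run so the matches can be reviewed before anything is removed.

```python
from pathlib import Path

def delete_chunk_files(tmp_dir, chunk_suffix, dry_run=True):
    """Find files under tmp_dir whose names end with chunk_suffix.

    With dry_run=True the files are only listed; with dry_run=False
    they are deleted. The list of matching paths is returned either way.
    """
    matches = sorted(
        p for p in Path(tmp_dir).rglob("*")
        if p.is_file() and p.name.endswith(chunk_suffix)
    )
    for path in matches:
        if dry_run:
            print("would delete", path)
        else:
            path.unlink()
    return matches
```

For example, `delete_chunk_files("output/MHC_Class_I/tmp", "tsv_6401-6600")` would list the candidates (the path is hypothetical), and passing `dry_run=False` would remove them so that they are regenerated on the next run.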

Alternatively, we highly recommend using the standalone IEDB software, if you are able to install it locally. You can also use our Docker containers, which come with the IEDB software preinstalled. This should also speed things up a bit since you seem to be processing quite a large set of data.

ZoeChao2001 commented 9 months ago

Thank you for the comprehensive guidance. I removed all files ending in tsv_6401-6600 from the tmp folder, but the issue persisted. However, I have successfully installed the standalone IEDB software locally. When I reran the command, the previous error did not recur. Instead, a new message appeared: "Unable to find full_peptide for variant" (see figure 1). Notably, the program did not terminate. Inspecting the output directory, I saw that more than a day had passed since the last file was generated, and the "filtered.tsv" file has not been written (see figures 2 and 3). I am not sure whether this is expected or indicates a new problem. [three screenshots: figures 1 to 3] Thank you for your time and support.

susannasiebert commented 9 months ago

I'm sorry to hear that deleting those intermediate files didn't solve your issue, but glad to hear that you got it working with standalone IEDB.

The long runtime is expected for this step if you run it with the defaults. By default, the Calculate Reference Proteome Similarity step uses the BLAST API, which is very slow, especially since you have a lot of variants in your VCF. I see that you marked your comment as resolved, so I assume that the command eventually finished. For future reference, we recommend running this step with a reference proteome FASTA by using the --peptide-fasta option. This option is an alternative to BLAST altogether and searches the reference proteome FASTA for matches directly. From the description of your problem it's not clear to me which options you chose for the reference proteome similarity step, but using the --peptide-fasta option in the future should speed things up significantly.
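To illustrate why a direct search of a proteome FASTA can be much faster than remote BLAST queries, here is a minimal sketch of exact substring matching against FASTA records. It is not the pVACtools implementation; `read_fasta` and `find_exact_matches` are hypothetical names, and the sequences are made up.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def find_exact_matches(peptide, proteome):
    """Return headers of proteins that contain peptide as an exact substring."""
    return [header for header, seq in proteome if peptide in seq]

# Made-up two-protein "proteome"; a real run would open a FASTA file instead.
example = io.StringIO(">protA\nMKTAYIAK\nQRQISFVK\n>protB\nGGGSSS\n")
proteome = list(read_fasta(example))
print(find_exact_matches("AKQR", proteome))
```

Because the whole search runs locally over sequences already in memory, there is no per-peptide network round trip, which is where most of the time goes when a remote API is used.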

susannasiebert commented 8 months ago

Closing this issue due to inactivity.