Closed ZoeChao2001 closed 8 months ago
Thank you for this bug report. Would it be possible for you to share your input VCF with us as well as the contents of the output directory? We will need those files for further debugging.
Hello @susannasiebert, this is the input file I created by extracting variant information and annotating it with VEP: Gt1.vep37.vcf.gz
Initially, I ran pVACseq on a multi-sample VCF and encountered the issue described earlier. I then ran it again on a single-sample VCF (the attached input file) but encountered the same problem.
The code I used is provided below.
vep -i Gt1.vcf.gz -o Gt1.vep37.vcf --cache --dir_cache /home/zzd/miniconda3/envs/pvactools --assembly GRCh37 --offline --fasta /home/zzd/miniconda3/envs/pvactools/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.chromosome.1.fa --plugin Frameshift --plugin Wildtype --vcf --tsl
gzip Gt1.vep37.vcf
pvacseq run -e1 8,9,10,11 --run-reference-proteome-similarity -c 1 -t 10 ./Gt1.vep37.vcf.gz Pt1 HLA-A*01:01,HLA-A*02:01,HLA-B*07:02,HLA-B*08:01,HLA-C*07:01,HLA-C*07:02 all /home/zzd/work/pVACtools/run/lili/output/Gt1_output
I have attached screenshots of part of the output directory, because the complete output is very large. If you require specific files, please let me know, and I'll promptly provide them.
I would greatly appreciate your assistance with debugging. Thank you for your time and support.
Unfortunately, without the output it will be difficult for me to replicate this issue. I suspect that there might be a problem with one of the prediction files from IEDB. We've noticed some problems with that when using the IEDB API vs the standalone IEDB software where sporadically the results don't match the input epitopes. We've been in touch with IEDB for this but haven't come to a resolution.
My suggestion would be to delete from the tmp folder all files that end in tsv_6401-6600 since that seems to be the chunk that is causing problems. That should regenerate those prediction files on your next run and, with luck, you won't have this same problem with the regenerated data. You might continue running into this error on a different chunk of data in which case you would need to delete the tmp prediction files for any chunk that encounters this problem.
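For anyone who wants to script that cleanup, here is a minimal sketch, not part of pVACtools. The helper name `remove_chunk_files` and its parameters are hypothetical, and the location of the tmp folder inside your output directory is something you need to fill in yourself. It lists files whose names end in a given chunk suffix and only deletes them when `dry_run` is turned off.

```python
from pathlib import Path


def remove_chunk_files(tmp_dir, chunk_suffix="tsv_6401-6600", dry_run=True):
    """Find files in tmp_dir whose names end with chunk_suffix.

    Hypothetical helper for illustration only. With dry_run=True the
    matching files are only returned; with dry_run=False they are deleted.
    """
    matches = sorted(
        path
        for path in Path(tmp_dir).iterdir()
        if path.is_file() and path.name.endswith(chunk_suffix)
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


# Example usage (the tmp path is a placeholder, adjust it to your run):
#   for path in remove_chunk_files("path/to/output/tmp"):
#       print(path)
#   remove_chunk_files("path/to/output/tmp", dry_run=False)
```

Running it once with the default `dry_run=True` lets you check the list before anything is removed. If the error moves to a different chunk, pass that chunk's suffix as `chunk_suffix`.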
Alternatively, we highly recommend using the standalone IEDB software, if you are able to install it locally. You can also use our Docker containers, which come with the IEDB software preinstalled. This should also speed things up a bit since you seem to be processing quite a large set of data.
Thank you for the comprehensive guidance. I removed all files ending in tsv_6401-6600 from the tmp folder, but the issue persisted. However, I have successfully installed the standalone IEDB software locally. When I reran the command, the previous error did not recur. Instead, a new message appeared: "Unable to find full_peptide for variant" (see figure 1). Notably, the program did not terminate. Inspecting the output directory, I see that more than a day has passed since the last file was generated, and the "filtered.tsv" file has not been written yet (see figures 2 and 3). I am not sure whether this is normal or indicates a new problem. Thank you for your time and support.
I'm sorry to hear that deleting those intermediate files didn't solve your issue, but I'm glad to hear that you got it working with standalone IEDB.
The long runtime is expected for this step if you run it with the default settings. By default, the Calculate Reference Proteome Similarity step uses the BLAST API, which is very slow, especially since you have a lot of variants in your VCF. I see that you marked your comment as resolved, so I assume that the command eventually finished. For future reference, we recommend running this step with a reference proteome FASTA, supplied via the --peptide-fasta option. This option replaces BLAST altogether and searches the reference proteome FASTA for matches directly. From the description of your problem, it's not clear to me which options you chose for the reference proteome similarity step, but using the --peptide-fasta option in the future should speed things up significantly.
Closing this issue due to inactivity.
Installation Type
Standalone
pVACtools Version / Docker Image
4.0.5
Python Version
3.7.16
Operating System
UBUNTU
Describe the bug
pVACseq printed 'Parsed Output File for Allele ' successfully, and then a problem occurred.
A (Pdb) prompt appeared and pVACseq could not continue. When I entered "q", the following traceback was returned:
(Pdb) q
Traceback (most recent call last):
  File "/home/z/miniconda3/envs/pvactools/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/pipeline.py", line 451, in execute
    split_parsed_output_files = self.parse_outputs(chunks)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/pipeline.py", line 412, in parse_outputs
    parser.execute()
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 629, in execute
    iedb_results = self.process_input_iedb_file(tsv_entries)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 514, in process_input_iedb_file
    iedb_results = self.parse_iedb_file(tsv_entries)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 800, in parse_iedb_file
    return self.match_wildtype_and_mutant_entries(iedb_results, wt_iedb_results)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 401, in match_wildtype_and_mutant_entries
    self.match_wildtype_and_mutant_entry_for_missense(result, mt_position, wt_results, previous_result)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 195, in match_wildtype_and_mutant_entry_for_missense
    wt_epitope_seq = wt_result['wt_epitope_seq']
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/site-packages/pvactools/lib/output_parser.py", line 195, in match_wildtype_and_mutant_entry_for_missense
    wt_epitope_seq = wt_result['wt_epitope_seq']
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/home/z/miniconda3/envs/pvactools/lib/python3.7/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit
How to reproduce this bug
Input files
No response
Log output
Please see description.
Output files
No response