griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

if protein_identifiers_from_label[protein_label] is not None: KeyError: 92 #1099

Closed brycemash closed 2 months ago

brycemash commented 2 months ago

Installation Type

Standalone

pVACtools Version / Docker Image

3.1.3

Python Version

python=3.6

Operating System

No response

Describe the bug

I keep getting this error, even after deleting the output folder and restarting the run.

How to reproduce this bug

conda activate /broad/dunnlab/BLM/conda_libraries/pvactools_conda

vep \
--input_file /broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount.vcf.gz \
--output_file /broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount.vep.vcf \
--format vcf --vcf --symbol --terms SO --tsl \
--hgvs --fasta /broad/dunnlab/BLM/pvac/usftp21.novogene.com/mm10.fa \
--offline --cache \
--dir_cache /broad/dunnlab/BLM/pvac \
--plugin Frameshift --plugin Wildtype \
--dir_plugins /broad/dunnlab/BLM/pvac/VEP_plugins \
--species mus_musculus --cache_version 102 --force_overwrite

gzip /broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount.vep.vcf

apptainer shell -B /broad/dunnlab/ /broad/dunnlab/Docker_Container_Location/vatools_latest.simg

ref-transcript-mismatch-reporter -f hard \
-o /broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount_mismatch.vep.vcf \
/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount.vep.vcf.gz

input_vcf='/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount_mismatch.vep.vcf'
output_vcf='/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount_mismatch_exp.vep.vcf'
expression_file='/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_rnaseq/gene_abundance.tsv'
vcf-expression-annotator -e "abundance" -i "gene" -o $output_vcf --ignore-ensembl-id-version -s "CT2A" $input_vcf $expression_file custom gene
# 797 of 4091 genes did not have an expression entry for their gene id.

# /broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/pindel_filtered_mismatch.vep.vcf | head
# bcftools query -l /broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/pindel_filtered_mismatch.vep.vcf

input_vcf='/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount_mismatch_exp.vep.vcf'
output_vcf='/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount_mismatch_t+exp.vep.vcf'
expression_file='/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_rnaseq/abundance.tsv'
vcf-expression-annotator -e "tpm" -i "target_id" -o $output_vcf --ignore-ensembl-id-version -s "CT2A" $input_vcf $expression_file kallisto transcript

conda activate /broad/dunnlab/BLM/conda_libraries/pvactools_conda

pvacseq run \
/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/ct2a_wes/annotated.bam_readcount_mismatch_t+exp.vep.vcf \
CT2A \
H-2-Kb,H-2-Db \
MHCflurry MHCnuggetsI MHCnuggetsII NNalign NetMHC PickPocket SMM SMMPMBEC SMMalign \
/broad/dunnlab/BLM/pvac/mao_pvac_run/CT2A/output \
-e1 8,9,10 -e2 15 \
--n-threads 3

Input files

No response

Log output

Parsing prediction file for Allele H-2-Kb and Epitope Length 8 - Entries 401-600 - Completed
Parsing binding predictions for Allele H-2-Kb and Epitope Length 9 - Entries 401-600
Parsing prediction file for Allele H-2-Kb and Epitope Length 9 - Entries 401-600
Traceback (most recent call last):
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/tools/pvacseq/main.py", line 116, in main
    args[0].func.main(args[1])
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/tools/pvacseq/run.py", line 133, in main
    pipeline.execute()
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/lib/pipeline.py", line 434, in execute
    split_parsed_output_files = self.parse_outputs(chunks)
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/lib/pipeline.py", line 395, in parse_outputs
    parser.execute()
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/lib/output_parser.py", line 438, in execute
    iedb_results = self.process_input_iedb_file(tsv_entries)
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/lib/output_parser.py", line 362, in process_input_iedb_file
    iedb_results = self.parse_iedb_file(tsv_entries)
  File "/broad/dunnlab/BLM/conda_libraries/pvactools_conda/lib/python3.6/site-packages/pvactools/lib/output_parser.py", line 584, in parse_iedb_file
    if protein_identifiers_from_label[protein_label] is not None:
KeyError: 92
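
For anyone skimming the traceback: the last frame is a plain dictionary lookup, so `KeyError: 92` means the label 92 read from a prediction file has no entry in `protein_identifiers_from_label`. The sketch below is only a simplified illustration of that mismatch, not pVACtools code. The helper `find_unknown_labels`, the `seq_num` field name, and the sample data are assumptions made up for the example.

```python
# Hypothetical, simplified illustration of the failing lookup; not pVACtools code.

def find_unknown_labels(known_labels, response_rows):
    """Return the labels present in the response rows but absent from the key mapping."""
    seen = {row["seq_num"] for row in response_rows}  # "seq_num" is an assumed field name
    return sorted(seen - set(known_labels))


# Assumed key mapping for the sequences that were actually submitted.
protein_identifiers_from_label = {1: ["peptide_a"], 2: ["peptide_b"]}

# Assumed prediction rows; the second row refers to a label that was never submitted.
rows = [{"seq_num": 1}, {"seq_num": 92}]

unknown = find_unknown_labels(protein_identifiers_from_label, rows)
print(unknown)  # [92]
```

Looking up any label reported here directly in the mapping would raise the same `KeyError` as in the log.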

Output files

No response

susannasiebert commented 2 months ago

I think you might be running into a problem with the IEDB API where it intermittently doesn't return results matching the query. Can you please try installing IEDB standalone and using the --iedb-install-directory parameter? IEDB also comes pre-installed in our Docker container if that is more convenient.

Alternatively, recent pVACtools versions retry IEDB API requests in cases like this. You could try a newer version of pVACtools (4.1.1 is the current version) to see if that fixes things. In either case you would need to delete the output directory before retrying.
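
To make the retry idea concrete, here is a minimal generic sketch of "retry until the response matches the query". It is not the pVACtools implementation. The function `fetch_with_retries`, its parameters, and the fake replies are all hypothetical names chosen for illustration.

```python
import time


def fetch_with_retries(fetch, is_valid, max_attempts=3, delay_seconds=0.0):
    """Call fetch() until is_valid(result) is true or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_valid(result):
            return result
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before asking again
    raise RuntimeError(f"no valid response after {max_attempts} attempts")


# Stand-in for a flaky API: the first reply contains a sequence number (92)
# that was never submitted, the second reply matches the query.
submitted = {1, 2}
replies = iter([[1, 92], [1, 2]])

result = fetch_with_retries(
    fetch=lambda: next(replies),
    is_valid=lambda seq_nums: set(seq_nums) <= submitted,
)
print(result)  # [1, 2]
```

Validating the reply against what was submitted before parsing it turns an intermittent mismatch into a retry instead of a crash further down the pipeline.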

brycemash commented 2 months ago

This fixed it, thank you!