griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Error in Calculating Reference Proteome Similarity: Exception: Unable to find full_peptide for variant (#1006)

Closed WLYYYYY closed 1 year ago

WLYYYYY commented 1 year ago

Installation Type

Docker

pVACtools Version / Docker Image

griffithlab/pvactools:4.0.1

Python Version

No response

Operating System

No response

Describe the bug

Hello,

I would like to report an error I encountered with the latest version of pVACtools (v4.0.1).

I was using pVACseq to predict MHC class I and class II neoantigens for five samples. Analysis of the other four samples finished successfully, but one sample failed with an error in the "Calculating Reference Proteome Similarity" step. This appeared to happen after the MHC class I binding prediction, because no error message was emitted during that step and an aggregated report was generated under the "MHC_class_I" folder.

Can anybody help me to solve this? Many thanks!

How to reproduce this bug

docker run \
            -v $somatic_vcf_dir:/somatic_vcf \
            -v $phased_vcf_dir:/phased_vcf \
            -v $output_dir:/outputs \
            -v $pep_ref_dir:/pep_ref \
            --name $container_name \
            -dt --rm griffithlab/pvactools:4.0.1

sample_hla=$( cat "${output_dir}/${tumor_name}_hla_alleles.txt" )

docker exec $container_name pvacseq run \
                /somatic_vcf/$somatic_vcf \
                $tumor_name \
                $sample_hla \
                all \
                /outputs/$tumor_name \
                --normal-sample-name $normal_name \
                --iedb-install-directory /opt/iedb \
                --n-threads $nThread \
                --run-reference-proteome-similarity \
                --peptide-fasta /pep_ref/$pep_ref \
                --phased-proximal-variants-vcf /phased_vcf/$phased_vcf

Input files

The input VCF files and the HLA typing result for the problematic sample have been uploaded to my Google Drive. All the VCF files were annotated with Ensembl VEP v109.

The reference peptide fasta I used was "Homo_sapiens.GRCh38.pep.all.fa.gz", which was downloaded from the Ensembl FTP site.

Log output

Combining Parsed Prediction Files
Completed
Creating aggregated report
Tumor clonal VAF estimated as 0.5 (estimated from Tumor DNA VAF data). Assuming variants with VAF < 0.25 are subclonal
Completed
Calculating Manufacturability Metrics
Completed
Running Binding Filters
Completed
Running Coverage Filters
Completed
Running Transcript Support Level Filter
Complete
Running Top Score Filter
Completed
Calculating Reference Proteome Similarity
Traceback (most recent call last):
  File "/usr/local/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/usr/local/lib/python3.7/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/pipeline.py", line 484, in execute
    PostProcessor(**post_processing_params).execute()
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/post_processor.py", line 65, in execute
    self.calculate_reference_proteome_similarity()
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/post_processor.py", line 247, in calculate_reference_proteome_similarity
    aggregate_metrics_file=aggregate_metrics_file,
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 597, in execute
    unique_peptides = pymp.shared.list(self._get_unique_peptides(mt_records_dict, wt_records_dict))
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 577, in _get_unique_peptides
    peptide, full_peptide = self._get_peptide(line, mt_records_dict, wt_records_dict)
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 295, in _get_peptide
    (full_peptide, wt_peptide, variant_type, mt_amino_acids, wt_amino_acids) = self._get_full_peptide(line, mt_records_dict, wt_records_dict)
  File "/usr/local/lib/python3.7/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 282, in _get_full_peptide
    raise Exception("Unable to find full_peptide for variant {}".format(line['ID']))
Exception: Unable to find full_peptide for variant chr14-22518496-22518497-T-C
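For anyone triaging a similar failure, the variant named in the exception can be pulled out of the log and split into its parts, which makes it easier to find the matching record in the input VCF. The snippet below is only a rough sketch and is not part of pVACtools: `VariantKey` and `parse_variant_id` are made-up helper names, and the chrom-start-stop-ref-alt layout is an assumption read off the single ID in the log above. It also assumes the alleles themselves contain no hyphen.

```python
import re
from typing import NamedTuple, Optional


class VariantKey(NamedTuple):
    """Hypothetical container for the pieces of a variant ID from the log."""
    chrom: str
    start: int
    stop: int
    ref: str
    alt: str


# Matches the message raised in the traceback above.
_ERROR_RE = re.compile(r"Unable to find full_peptide for variant (\S+)")


def parse_variant_id(message: str) -> Optional[VariantKey]:
    """Return the variant named in the error message, or None if absent.

    Assumes the ID looks like chrom-start-stop-ref-alt (an inference from
    the one example in this thread, not a documented format).
    """
    match = _ERROR_RE.search(message)
    if match is None:
        return None
    # Split from the right so a chromosome name containing "-" stays intact.
    parts = match.group(1).rsplit("-", 4)
    if len(parts) != 5:
        return None
    chrom, start, stop, ref, alt = parts
    try:
        return VariantKey(chrom, int(start), int(stop), ref, alt)
    except ValueError:
        # Coordinates were not integers, so the assumed layout does not hold.
        return None


if __name__ == "__main__":
    line = ("Exception: Unable to find full_peptide for variant "
            "chr14-22518496-22518497-T-C")
    print(parse_variant_id(line))
```

The log does not say whether the start coordinate is 0-based or 1-based relative to the VCF POS column, so it is worth checking both neighbouring positions when searching the annotated VCF.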

Output files

No response

susannasiebert commented 1 year ago

Hi @WLYYYYY, thank you for your interest in pVACtools and I apologize for the problem you're running into. I'm at a conference this week but will look into this issue when I'm back at work next week.

susannasiebert commented 1 year ago

Unfortunately, I'm running into problems with the input files provided:

Can you please upload the VEP-annotated files?

WLYYYYY commented 1 year ago

Sorry, my bad, I provided the wrong files... Please use the updated files instead: somatic.annotated.vcf.gz, somatic.annotated.vcf.gz.tbi, phased.annotated.vcf.gz, phased.annotated.vcf.gz.tbi

Thank you very much!

susannasiebert commented 1 year ago

This error should be fixed in version 4.0.2. I'm closing this issue but please feel free to reopen it, should you still run into problems.
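Since the fix is tied to a release, a wrapper script that launches the container could check the image tag before starting a run. The following is a minimal sketch under that assumption: `parse_version` and `has_fix` are illustrative names, not pVACtools functions, and only plain numeric tags such as `4.0.1` are handled (a tag like `latest`, or an image name with no tag, raises `ValueError`).

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a dotted numeric tag such as '4.0.2' into a comparable tuple."""
    return tuple(int(part) for part in tag.split("."))


# Release named in this thread as containing the fix.
FIXED_IN = parse_version("4.0.2")


def has_fix(image: str) -> bool:
    """Return True if the image tag is at or above the fixed release.

    `image` is expected to look like 'griffithlab/pvactools:4.0.1'.
    """
    _, _, tag = image.rpartition(":")
    return parse_version(tag) >= FIXED_IN


if __name__ == "__main__":
    for image in ("griffithlab/pvactools:4.0.1", "griffithlab/pvactools:4.0.2"):
        print(image, has_fix(image))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "4.0.10" sorting before "4.0.2".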