griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

pVACFuse: KeyError: '25.GCN1-MSI1.ENST00000300648.7-ENST00000257552.7.inframe_fusion.32' #1125

Closed. MaxMichaeler closed this issue 3 months ago

MaxMichaeler commented 4 months ago

Installation Type

Standalone

pVACtools Version / Docker Image

4.2.1

Python Version

3.9.18

Operating System

Linux

Describe the bug

I'm running pVACfuse in a Nextflow pipeline using a Singularity image, and every run fails with the following error:

Traceback (most recent call last):
  File "/opt/conda/bin/pvacfuse", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/pvactools/tools/pvacfuse/main.py", line 108, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.9/site-packages/pvactools/tools/pvacfuse/run.py", line 245, in main
    create_net_class_report(output_files, all_epitopes_file, filtered_file, args, run_arguments)
  File "/opt/conda/lib/python3.9/site-packages/pvactools/tools/pvacfuse/run.py", line 42, in create_net_class_report
    PostProcessor(**post_processing_params).execute()
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/post_processor.py", line 65, in execute
    self.calculate_reference_proteome_similarity()
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/post_processor.py", line 252, in calculate_reference_proteome_similarity
    CalculateReferenceProteomeSimilarity(
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 595, in execute
    unique_peptides = pymp.shared.list(self._get_unique_peptides(mt_records_dict, wt_records_dict))
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 575, in _get_unique_peptides
    peptide, full_peptide = self._get_peptide(line, mt_records_dict, wt_records_dict)
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 283, in _get_peptide
    peptide = mt_records_dict[line['ID']]
KeyError: '25.GCN1-MSI1.ENST00000300648.7-ENST00000257552.7.inframe_fusion.32'

I've looked through similar issues, but they all seem to relate to bugs. My input is an Arriba .tsv file.
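
The last frame of the traceback is a plain dictionary lookup, `mt_records_dict[line['ID']]`, so the error means a report row carries an ID with no matching peptide record. The sketch below illustrates that failure mode and a simple check for orphaned IDs. The function name `find_missing_ids` and the sample data are hypothetical and not part of pVACtools; this does not identify the actual cause of the bug.

```python
def find_missing_ids(report_rows, records):
    """Return report IDs that have no entry in the peptide record dict."""
    return [row["ID"] for row in report_rows if row["ID"] not in records]


# Hypothetical data: one report row refers to an ID that was never written
# to the peptide records, mirroring the lookup that fails in the traceback.
records = {"1.GENE_A-GENE_B.inframe_fusion.10": "PEPTIDESEQ"}
report_rows = [
    {"ID": "1.GENE_A-GENE_B.inframe_fusion.10"},
    {"ID": "25.GCN1-MSI1.ENST00000300648.7-ENST00000257552.7.inframe_fusion.32"},
]

missing = find_missing_ids(report_rows, records)
print(missing)
```

Running a check like this against the intermediate files would show whether one ID or many are affected.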

How to reproduce this bug

export BLASTDB=/opt/resourceDir/blast_db
export MHCFLURRY_DATA_DIR=/opt/pvacfuse/mhcflurry_data
blastp_path=`which blastp`

pvacfuse run \
    -e1 8,9,10,11 \
    -e2 15,16,17,18,19,20,21,22,23,24,25 \
    --iedb-install-directory /opt/pvacfuse/iedb_data \
    --run-reference-proteome-similarity \
    -t 6 \
    --blastp-path $blastp_path \
    --blastp-db refseq_select_prot \
    -s 100 \
    -d 200 \
     \
    SRR1107833.fusions.tsv \
    SRR1107833 \
    DQB1*06:02 \
    DeepImmuno MHCflurry MHCflurryEL MHCnuggetsI MHCnuggetsII NNalign NetMHC NetMHCIIpan NetMHCIIpanEL NetMHCpan SMM SMMPMBEC \
    ./
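
One detail of the command above: the allele `DQB1*06:02` contains a `*`, which an unquoted shell word treats as a glob pattern and passes through unchanged only when no file name matches. A minimal sketch, assuming the command is assembled in Python rather than in the pipeline's shell template, shows one way to keep the allele literal. The helper `build_pvacfuse_command` is hypothetical, and the option list is shortened for illustration.

```python
import shlex


def build_pvacfuse_command(input_file, sample, allele, algorithms, output_dir, options=()):
    """Assemble a pvacfuse run command as an argument list (hypothetical helper)."""
    return ["pvacfuse", "run", *options, input_file, sample, allele, *algorithms, output_dir]


cmd = build_pvacfuse_command(
    "SRR1107833.fusions.tsv",
    "SRR1107833",
    "DQB1*06:02",
    ["MHCnuggetsII"],
    "./",
    options=["-e2", "15", "-t", "6"],
)

# shlex.join quotes any argument containing shell metacharacters such as '*'.
print(shlex.join(cmd))
```

Passing `cmd` as a list to `subprocess.run` would avoid shell parsing altogether; the quoted string is only needed when the command is pasted into a shell script.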

Input files

GitHub doesn't allow .tsv attachments, so the file is uploaded as .txt. It should be renamed to .tsv before running.

SRR1107833.fusions.txt

Log output

All prerequisites found!
Copying the standalone-specific netMHCcons template into place
IEDB MHC class I binding prediction tools successfully installed!
Use the command 'python src/predict_binding.py' to get started
/data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5
No MHC class I alleles chosen. Skipping MHC class II predictions.
Executing MHC Class II predictions
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-97
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 15 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/15/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.15.tsv_1-97
Making binding predictions on Allele DQB1*06:02 and Epitope Length 15 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/15/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.15.tsv_1-97 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 15 - Entries 1-97
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 15 - Entries 1-97 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-97
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 16 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/16/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.16.tsv_1-97
Making binding predictions on Allele DQB1*06:02 and Epitope Length 16 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/16/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.16.tsv_1-97 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 16 - Entries 1-97
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 16 - Entries 1-97 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-97
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 17 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/17/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.17.tsv_1-97
Making binding predictions on Allele DQB1*06:02 and Epitope Length 17 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/17/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.17.tsv_1-97 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 17 - Entries 1-97
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 17 - Entries 1-97 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-97
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 18 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/18/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.18.tsv_1-97
Making binding predictions on Allele DQB1*06:02 and Epitope Length 18 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/18/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.18.tsv_1-97 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 18 - Entries 1-97
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 18 - Entries 1-97 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-98
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 19 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/19/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.19.tsv_1-98
Making binding predictions on Allele DQB1*06:02 and Epitope Length 19 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/19/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.19.tsv_1-98 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 19 - Entries 1-98
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 19 - Entries 1-98 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-98
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 20 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/20/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.20.tsv_1-98
Making binding predictions on Allele DQB1*06:02 and Epitope Length 20 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/20/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.20.tsv_1-98 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 20 - Entries 1-98
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 20 - Entries 1-98 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-98
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 21 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/21/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.21.tsv_1-98
Making binding predictions on Allele DQB1*06:02 and Epitope Length 21 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/21/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.21.tsv_1-98 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 21 - Entries 1-98
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 21 - Entries 1-98 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-98
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 22 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/22/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.22.tsv_1-98
Making binding predictions on Allele DQB1*06:02 and Epitope Length 22 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/22/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.22.tsv_1-98 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 22 - Entries 1-98
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 22 - Entries 1-98 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-98
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 23 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/23/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.23.tsv_1-98
Making binding predictions on Allele DQB1*06:02 and Epitope Length 23 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/23/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.23.tsv_1-98 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 23 - Entries 1-98
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 23 - Entries 1-98 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-97
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 24 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/24/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.24.tsv_1-97
Making binding predictions on Allele DQB1*06:02 and Epitope Length 24 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/24/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.24.tsv_1-97 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 24 - Entries 1-97
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 24 - Entries 1-97 - Completed
Combining Parsed Prediction Files
Completed
Converting Fusion file to TSV
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-97
Completed
Allele DQB1*06:02 not valid for Method NNalign. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping.
Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping.
Making binding predictions on Allele DQB1*06:02 and Epitope Length 25 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/25/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.25.tsv_1-97
Making binding predictions on Allele DQB1*06:02 and Epitope Length 25 with Method MHCnuggetsII - File /data/scratch/work/8c/7131152b3ed15e50a0d492e43f39b5/MHC_Class_II/25/tmp/SRR1107833.MHCnuggetsII.DQB1*06:02.25.tsv_1-97 - Completed
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 25 - Entries 1-97
Parsing prediction file for Allele DQB1*06:02 and Epitope Length 25 - Entries 1-97 - Completed
Combining Parsed Prediction Files
Completed
Creating aggregated report
Completed
Calculating Manufacturability Metrics
Completed
Running Binding Filters
Completed
Running Coverage Filters
Completed
Running Top Score Filter
Completed
Calculating Reference Proteome Similarity
Traceback (most recent call last):
  File "/opt/conda/bin/pvacfuse", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/pvactools/tools/pvacfuse/main.py", line 108, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.9/site-packages/pvactools/tools/pvacfuse/run.py", line 245, in main
    create_net_class_report(output_files, all_epitopes_file, filtered_file, args, run_arguments)
  File "/opt/conda/lib/python3.9/site-packages/pvactools/tools/pvacfuse/run.py", line 42, in create_net_class_report
    PostProcessor(**post_processing_params).execute()
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/post_processor.py", line 65, in execute
    self.calculate_reference_proteome_similarity()
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/post_processor.py", line 252, in calculate_reference_proteome_similarity
    CalculateReferenceProteomeSimilarity(
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 595, in execute
    unique_peptides = pymp.shared.list(self._get_unique_peptides(mt_records_dict, wt_records_dict))
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 575, in _get_unique_peptides
    peptide, full_peptide = self._get_peptide(line, mt_records_dict, wt_records_dict)
  File "/opt/conda/lib/python3.9/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 283, in _get_peptide
    peptide = mt_records_dict[line['ID']]
KeyError: '25.GCN1-MSI1.ENST00000300648.7-ENST00000257552.7.inframe_fusion.32'
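
The log shows that three of the requested class II methods skipped the allele and only MHCnuggetsII produced predictions. In a log this long that is easy to miss, so a small parser helps. This is a minimal sketch, assuming the "not valid for Method" message format seen above; the function `skipped_methods` is hypothetical, and the sample text reuses three messages from the log.

```python
import re
from collections import defaultdict

# Matches messages such as:
#   Allele DQB1*06:02 not valid for Method NNalign. Skipping.
SKIP_PATTERN = re.compile(r"Allele (\S+) not valid for Method (\w+)\.\s+Skipping\.")


def skipped_methods(log_text):
    """Map each allele to the sorted list of methods that skipped it."""
    skipped = defaultdict(set)
    for allele, method in SKIP_PATTERN.findall(log_text):
        skipped[allele].add(method)
    return {allele: sorted(methods) for allele, methods in skipped.items()}


sample_log = (
    "Allele DQB1*06:02 not valid for Method NNalign. Skipping. "
    "Allele DQB1*06:02 not valid for Method NetMHCIIpan. Skipping. "
    "Allele DQB1*06:02 not valid for Method NetMHCIIpanEL. Skipping."
)
print(skipped_methods(sample_log))
```

Repeated messages for each epitope length collapse into one entry per method, because the methods are collected in a set.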

Output files

No response

susannasiebert commented 3 months ago

This issue should be fixed in version 4.3.0. Please give it a try and reopen this issue if you're still getting this error in version 4.3.0. You will need to run from scratch in order for the fix to take effect.
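
Before rerunning, it may help to confirm that the environment (or container image) actually provides 4.3.0 or later. A minimal sketch, assuming the package is installed under the distribution name `pvactools` and uses plain numeric version strings; the helpers `parse_version` and `has_fix` are hypothetical, and pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (4, 3, 0)  # release named in the maintainer's reply


def parse_version(text):
    """Turn a version string such as '4.2.1' into a 3-tuple of ints."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad short versions like '4.3'
    return tuple(numbers)


def has_fix(installed):
    """Return True if the given version is at least the fixed release."""
    return parse_version(installed) >= FIXED_IN


try:
    installed = version("pvactools")
    print(installed, "includes the fix" if has_fix(installed) else "predates the fix")
except PackageNotFoundError:
    print("pvactools is not installed in this environment")
```

For the version reported in this issue, `has_fix("4.2.1")` returns `False`.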