Arriba outputs are allowed to contain `?` in the `peptide_sequence` and `fusion_transcript` columns. This triggers an error in the manufacturability calculations because `?` is not a key in the amino acid hydropathy dict. See the error message in the log output below.
How to reproduce this bug
I've provided an example. I modified the content a bit, but the issue is the same: sequences containing `?` will trigger an error.
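To check whether a given Arriba fusions file has affected rows, the two columns can be scanned for `?` before running pVACfuse. This is only a rough sketch and not part of pVACtools: the helper name `rows_with_unknown_residues` and the in-memory sample rows are made up for illustration, and only the two column names come from the description above.

```python
import csv
import io

# Column names taken from the report; everything else here is illustrative.
COLUMNS = ("peptide_sequence", "fusion_transcript")


def rows_with_unknown_residues(handle, columns=COLUMNS):
    """Return (row_number, column) pairs for every cell that contains '?'."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row_number, row in enumerate(reader, start=1):
        for column in columns:
            if "?" in (row.get(column) or ""):
                hits.append((row_number, column))
    return hits


# Made-up sample data standing in for a real Arriba fusions file.
sample = io.StringIO(
    "#gene1\tgene2\tfusion_transcript\tpeptide_sequence\n"
    "GENEA\tGENEB\tacgt|acgt\tMKV|LQ\n"
    "GENEC\tGENED\tac?t|acgt\tMK?|LQ\n"
)
hits = rows_with_unknown_residues(sample)
print(hits)
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` sample.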
Calculating Manufacturability Metrics
Traceback (most recent call last):
  File "/Users/boyangzhao/anaconda/envs/bio38/bin/pvacfuse", line 8, in <module>
    sys.exit(main())
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/pvactools/tools/pvacfuse/main.py", line 108, in main
    args[0].func.main(args[1])
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/pvactools/tools/pvacfuse/run.py", line 212, in main
    (input_file, per_epitope_output_dir) = generate_fasta(args, output_dir, epitope_length)
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/pvactools/tools/pvacfuse/run.py", line 82, in generate_fasta
    pvactools.tools.pvacfuse.generate_protein_fasta.main(params, save_tsv_file=True, starfusion_file=args.starfusion_file)
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/pvactools/tools/pvacfuse/generate_protein_fasta.py", line 132, in main
    CalculateManufacturability(args.output_file, manufacturability_file, 'fasta').execute()
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/pvactools/lib/calculate_manufacturability.py", line 47, in execute
    scores = ManufacturabilityScores.from_amino_acids(sequence)
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 148, in from_amino_acids
    return cls(*[fn(amino_acids) for fn in scoring_functions])
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 148, in <listcomp>
    return cls(*[fn(amino_acids) for fn in scoring_functions])
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 73, in max_7mer_gravy_score
    return max_kmer_gravy_score(amino_acids, 7)
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 67, in max_kmer_gravy_score
    return max(
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 68, in <genexpr>
    gravy_score(amino_acids[i:i + k])
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 56, in gravy_score
    total = sum(
  File "/Users/boyangzhao/anaconda/envs/bio38/lib/python3.8/site-packages/vaxrank/manufacturability.py", line 57, in <genexpr>
    hydropathy_dict[amino_acid] for amino_acid in amino_acids)
KeyError: '?'
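The last frames of the traceback suggest a per-residue dictionary lookup, so the failure can probably be reproduced in isolation. The sketch below is a stand-in, not the vaxrank or pVACtools code: `HYDROPATHY` uses the standard Kyte-Doolittle values, and `unsupported_characters` and `safe_gravy_score` are hypothetical helpers showing one possible guard (skipping sequences that cannot be scored).

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Stand-in for the library's hydropathy_dict seen in the traceback.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_score(amino_acids):
    """Mean hydropathy; raises KeyError for any character not in the table."""
    return sum(HYDROPATHY[aa] for aa in amino_acids) / len(amino_acids)


def unsupported_characters(sequence):
    """Hypothetical helper: characters in the sequence that have no hydropathy value."""
    return sorted(set(sequence) - set(HYDROPATHY))


def safe_gravy_score(sequence):
    """Hypothetical guard: return None instead of raising for unscorable input."""
    if not sequence or unsupported_characters(sequence):
        return None
    return gravy_score(sequence)


# Reproduce the failure with a made-up sequence containing '?'.
try:
    gravy_score("MKV?LQ")
    failing_key = None
except KeyError as error:
    failing_key = error.args[0]

print(failing_key, unsupported_characters("MKV?LQ"), safe_gravy_score("MKV?LQ"))
```

Returning `None` is just one option; trimming the sequence at the first `?` or dropping the record entirely are alternatives, and which one is appropriate depends on how the downstream steps use the scores.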
Thank you for reporting this error. It should be fixed in version 4.0.2. I'm closing this issue but please feel free to reopen it, should you still run into problems.
Installation Type
Standalone
pVACtools Version / Docker Image
4.0.1
Python Version
3.8
Operating System
No response
Input files
Log output
Output files
No response