openvax / vaxrank

Ranked vaccine peptides for personalized cancer immunotherapy
Apache License 2.0

AssertionError deep in the stack #131

Closed julia326 closed 7 years ago

julia326 commented 7 years ago

Possibly relevant logs (maybe problematic variant?):

2017-06-02 18:06:56,992 - vaxrank.core_logic:55 - INFO - Mutant protein fragment for Variant(contig='12', start=88271648, ref='T', alt='C', reference_name='GRCm38'): MutantProteinFragment(variant=Variant(contig='12', start=88271648, ref='T', alt='C', reference_name='GRCm38'), gene_name=u'Gm5662', amino_acids='', mutant_amino_acid_start_offset=15, mutant_amino_acid_end_offset=16, supporting_reference_transcripts=[Transcript(transcript_id=ENSMUST00000101168, name=Gm5662-001, gene_id=ENSMUSG00000079029, gene_name=Gm5662, biotype=protein_coding, location=12:88270640-88274500)], n_overlapping_reads=3, n_alt_reads=3, n_ref_reads=0, n_alt_reads_supporting_protein_sequence=2)
Traceback (most recent call last):
  File "/home/julia/envs/vaxrank/bin/vaxrank", line 11, in <module>
    sys.exit(main())
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/vaxrank/cli.py", line 285, in main
    data = ranked_variant_list_with_metadata(args)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/vaxrank/cli.py", line 227, in ranked_variant_list_with_metadata
    variant_sequence_assembly=args.variant_sequence_assembly)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/vaxrank/core_logic.py", line 191, in ranked_vaccine_peptides
    min_epitope_score=min_epitope_score)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/vaxrank/core_logic.py", line 164, in generate_vaccine_peptides
    min_epitope_score=min_epitope_score)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/vaxrank/core_logic.py", line 61, in vaccine_peptides_for_variant
    genome=variant.ensembl).values()
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/vaxrank/epitope_prediction.py", line 168, in predict_epitopes
    {protein_fragment.gene_name: protein_fragment.amino_acids})
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/mhctools/base_predictor.py", line 208, in predict_subsequences
    binding_predictions = self.predict_peptides(peptide_list)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/mhctools/base_commandline_predictor.py", line 316, in predict_peptides
    temp_dir_list=dirs)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/mhctools/base_commandline_predictor.py", line 262, in _run_commands_and_collect_predictions
    process_limit=self.process_limit)
  File "/home/julia/envs/vaxrank/local/lib/python2.7/site-packages/mhctools/process_helpers.py", line 114, in run_multiple_commands_redirect_stdout
    assert len(multiple_args_dict) > 0
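
The assertion at the bottom of this trace is consistent with the `amino_acids=''` in the log line above: an empty sequence has no subsequences, so presumably no prediction commands get built. Below is a minimal sketch of that failure mode and one possible guard. `kmers` and `predict_if_nonempty` are hypothetical helpers written for illustration, not vaxrank or mhctools functions, and the peptide lengths are an assumption.

```python
def kmers(sequence, lengths=(8, 9, 10, 11)):
    """Enumerate all substrings of the given lengths (hypothetical helper).

    The lengths are an assumed default, not taken from vaxrank's config.
    """
    return [
        sequence[i:i + k]
        for k in lengths
        for i in range(len(sequence) - k + 1)
    ]


def predict_if_nonempty(amino_acids, predict):
    """Call `predict` only when there is at least one peptide to score.

    `predict` stands in for whatever predictor is in use; it receives
    the list of peptides and returns its predictions.
    """
    peptides = kmers(amino_acids)
    if not peptides:
        # Nothing to predict, e.g. translation stopped at the first codon.
        return []
    return predict(peptides)
```

With a guard like this, an empty protein fragment would yield an empty prediction list instead of reaching the assertion. Whether the variant should be skipped silently or logged is a separate design question.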

julia326 commented 7 years ago

Variant sequence:

VariantSequence(prefix='TCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', alt='C', suffix='TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG', sequence='TCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG', reads=frozenset([AlleleRead(prefix='CTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', allele='C', suffix='TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTT', name='D00543:132:CA2WYANXX:7:2301:17630:72894', sequence='CTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTT'), AlleleRead(prefix='CTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', allele='C', suffix='TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG', name='D00543:132:CA2WYANXX:7:2301:17630:72894', sequence='CTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG'), AlleleRead(prefix='TTGGAATATAGCATGATTAGATTATGTTAGATATCATCAATGTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', allele='C', suffix='TGGC', name='D00543:132:CA2WYANXX:7:1210:19312:79146', sequence='TTGGAATATAGCATGATTAGATTATGTTAGATATCATCAATGTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGC')]))

Reference context:

ReferenceContext(strand='-', sequence_before_variant_locus='ATCCAGATGAAGCAAGAAGTCTGAAGGCCTATGGAGAACTTCCAGAACATGCCA', sequence_at_variant_locus='A', sequence_after_variant_locus='AATCAATGAAATGGACACATTTGGTGCTGGGGATGATGATGAAATCGTGTTTGA', offset_to_first_complete_codon=2, overlaps_start_codon=False, contains_start_codon=False, contains_five_prime_utr=False, amino_acids_before_variant='PDEARSLKAYGELPEHA', variant=Variant(contig='12', start=88271648, ref='T', alt='C', reference_name='GRCm38'), transcripts=(Transcript(transcript_id=ENSMUST00000101168, name=Gm5662-001, gene_id=ENSMUSG00000079029, gene_name=Gm5662, biotype=protein_coding, location=12:88270640-88274500),))

That variant sequence results in a cdna_prefix of 'CTAAGCAAGAAGTCTGAAGGCCTATGGAGAACTTCCAGAACATGCCA' and is matched to a reading frame with offset 1, with 2 mismatches before the variant (within the threshold). The first codon is then TAA (a stop codon), so we get an empty amino acid sequence. We need to figure out the right way to deal with unexpected stop codons that appear when a variant sequence is translated in the matched reading frame. @iskandr thoughts?
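
The numbers above can be reproduced from the two dumps with a short standalone sketch. It does not use isovar's API: `reverse_complement` and `codons_until_stop` are hypothetical helpers, and the way the frame offset is derived from `offset_to_first_complete_codon` is an assumption that happens to agree with the offset of 1 reported here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (hypothetical helper)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def codons_until_stop(cdna, frame_offset):
    """Split cdna into codons from frame_offset, stopping at a stop codon."""
    codons = []
    for i in range(frame_offset, len(cdna) - 2, 3):
        codon = cdna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


# Values copied from the VariantSequence and ReferenceContext dumps.
variant_suffix = "TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG"
reference_before = "ATCCAGATGAAGCAAGAAGTCTGAAGGCCTATGGAGAACTTCCAGAACATGCCA"
offset_to_first_complete_codon = 2

# The transcript is on the '-' strand, so the read suffix becomes the
# cDNA prefix after reverse complementing.
cdna_prefix = reverse_complement(variant_suffix)

# Compare against the equally long tail of the reference prefix.
reference_tail = reference_before[-len(cdna_prefix):]
n_mismatches = sum(a != b for a, b in zip(cdna_prefix, reference_tail))

# Assumption: trimming n bases from the front shifts the frame by n.
n_trimmed = len(reference_before) - len(cdna_prefix)
frame_offset = (offset_to_first_complete_codon - n_trimmed) % 3

print(cdna_prefix)
print(n_mismatches, frame_offset)
print(codons_until_stop(cdna_prefix, frame_offset))
```

Both mismatches fall in the first two bases of the prefix. In the reference the first in-frame codon is GAA; the second mismatch turns it into TAA, so translation ends before producing any amino acid.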

julia326 commented 7 years ago

Issue moved to hammerlab/isovar #85 via ZenHub