psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

template not in codon_info error? #256

Closed (scharch closed 6 years ago)

scharch commented 6 years ago
schrammca$ partis annotate --infname output/sequences/nucleotide/BFI-0000243_goodVJ.fa --outfname output/partis.yaml --parameter-dir partis --n-procs 2
  parameter dir 'partis' does not exist, so caching a new set of parameters before running action 'annotate'
caching parameters
  vsearch: 18669 / 18669 v annotations (0 failed) with 140 v genes in 4.3 sec
    keeping 36 / 204 v genes
smith-waterman  (new-allele fitting)
  vsearch: 18669 / 18669 v annotations (0 failed) with 36 v genes in 4.7 sec
    running 2 procs for 18669 seqs
    running 4 procs for 124 seqs
      info for 18669 / 18669 = 1.000   (0 failed)
      kept 147 (0.008) unproductive
      removed 3446 / 18669 = 0.18 duplicate sequences after trimming framework insertions (leaving 15223)
    water time: 89.5  (ig-sw 1.2  processing 0.5)
    adding new allele to glfo: 
      template CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG   hv1-801
           new CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG   hv1-801+A120C
    adding new allele to glfo: 
      template CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGCAATAAATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA   hv3-30-301
           new CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA   hv3-3003
    removing template gene hv3-30-301
Traceback (most recent call last):
  File "/home/schrammca/bin/partis", line 403, in <module>
    args.func(args)
  File "/home/schrammca/bin/partis", line 192, in run_partitiondriver
    parter.run(actions)
  File "/home/schrammca/partis/partis/python/partitiondriver.py", line 104, in run
    self.action_fcns[tmpaction]()
  File "/home/schrammca/partis/partis/python/partitiondriver.py", line 249, in cache_parameters
    glutils.add_new_alleles(self.glfo, new_allele_info, debug=True, simglfo=self.simglfo)  # <remove_template_genes> stuff is handled in <new_allele_info>
  File "/home/schrammca/partis/partis/python/glutils.py", line 813, in add_new_alleles
    add_new_allele(glfo, newfo, remove_template_genes=remove_template_genes, use_template_for_codon_info=use_template_for_codon_info, simglfo=simglfo, debug=debug)
  File "/home/schrammca/partis/partis/python/glutils.py", line 845, in add_new_allele
    raise Exception('template gene %s not found in codon info' % newfo['template-gene'])
Exception: template gene IGHV3-30-3*01 not found in codon info

psathyrella commented 6 years ago

arg, thanks.

This rings a bell; I think it happened once before. But then I'm not sure why it can still happen, since I would've fixed it. The deal is that it usually gets the codon position for the new allele from its template gene, but for homozygous novel alleles the template gene gets removed, so I should be storing the codon position directly in the new allele's info before removing the template. But the crash apparently doesn't happen every time we remove the template, since, well, it works fine most of the time.

In any case, I think this should fix it. I don't have an easy way to test it, since it's an uncommon situation, but if it's easy for you to switch to dev for a moment and check, that'd be great.
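
For readers following along, the idea of storing the codon position on the new allele before its template goes away can be sketched roughly as below. This is a toy illustration, not the partis code: the function `codon_position_for`, the `cpos` key, the plain `codon_info` dict, the allele name `novel-A`, and the position value 285 are all assumptions made for the example. Only the `template-gene` key and the error message are taken from the traceback above.

```python
def codon_position_for(newfo, codon_info):
    """Return the conserved-codon position for a new allele.

    newfo      -- dict with 'template-gene' and, optionally, 'cpos'
                  ('cpos' is a hypothetical key holding a cached position)
    codon_info -- dict mapping gene name -> codon position (toy structure)
    """
    # Prefer a position cached on the allele itself, if there is one.
    if newfo.get('cpos') is not None:
        return newfo['cpos']
    # Otherwise fall back to the template gene, which must still be present.
    template = newfo['template-gene']
    if template not in codon_info:
        raise KeyError('template gene %s not found in codon info' % template)
    return codon_info[template]


# Illustrative value only, not read from a real germline set.
codon_info = {'IGHV3-30-3*01': 285}
newfo = {'gene': 'novel-A', 'template-gene': 'IGHV3-30-3*01'}

# Cache the position on the allele *before* the template is removed ...
newfo['cpos'] = codon_position_for(newfo, codon_info)
del codon_info['IGHV3-30-3*01']

# ... so a later lookup no longer depends on the template being there.
cached = codon_position_for(newfo, codon_info)
print(cached)
```

Copying the template's position unchanged only makes sense for SNP-only alleles like the ones in the log; an allele with indels would need its position adjusted, which this sketch does not attempt.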

psathyrella commented 6 years ago

Oh, I remember. It happens only when there are two novel alleles from the same template gene and the first of those decides that the template should be removed; it then crashes when it tries to handle the second one, since the template is gone. That ^ commit should indeed fix that.
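
The ordering problem described here can be reproduced with a small toy model. Everything in it is hypothetical: the `codon-positions` key, the `remove-template-gene` flag, the allele names, and the position value are invented for illustration, and the two functions are simplified stand-ins that only borrow their names from the traceback. The second half shows one possible way around the problem, deferring template removal until every allele has been added; that is an assumption about how such a bug could be avoided, not a description of what the actual commit does.

```python
def add_new_allele(glfo, newfo, remove_template=False):
    """Toy version: copy the template's codon position, then optionally drop the template."""
    template = newfo['template-gene']
    if template not in glfo['codon-positions']:
        raise Exception('template gene %s not found in codon info' % template)
    glfo['codon-positions'][newfo['gene']] = glfo['codon-positions'][template]
    if remove_template:
        del glfo['codon-positions'][template]


def add_new_alleles(glfo, new_allele_info):
    """Add every allele first; remove templates only after all alleles are handled."""
    to_remove = set()
    for newfo in new_allele_info:
        add_new_allele(glfo, newfo, remove_template=False)
        if newfo.get('remove-template-gene'):
            to_remove.add(newfo['template-gene'])
    for template in to_remove:
        del glfo['codon-positions'][template]


def make_inputs():
    # Two novel alleles that share one template; only the first asks for removal.
    glfo = {'codon-positions': {'IGHV3-30-3*01': 285}}  # illustrative position
    new_allele_info = [
        {'gene': 'novel-A', 'template-gene': 'IGHV3-30-3*01', 'remove-template-gene': True},
        {'gene': 'novel-B', 'template-gene': 'IGHV3-30-3*01', 'remove-template-gene': False},
    ]
    return glfo, new_allele_info


# Naive order: the first allele removes the template, so the second one fails.
glfo, infos = make_inputs()
naive_failed = False
try:
    for newfo in infos:
        add_new_allele(glfo, newfo, remove_template=newfo['remove-template-gene'])
except Exception as err:
    naive_failed = True
    print('naive order failed:', err)

# Deferred removal: both alleles are added, then the template is dropped.
glfo, infos = make_inputs()
add_new_alleles(glfo, infos)
remaining = sorted(glfo['codon-positions'])
print(remaining)
```

With a single novel allele per template the naive order works, which matches the observation that the crash is rare.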

scharch commented 6 years ago

Yep, that worked, thanks.