psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Mismatch between default and IMGT-aligned databases for macaques #318

Closed: scharch closed this issue 2 years ago

scharch commented 2 years ago

Not every gene in partis/data/germlines/macaque/ig?/*fasta is also in partis/data/germlines/macaque/imgt-alignments/ig?.fa. Running annotate with --aligned-germline-fname then fails with errors like the following:

$> partis annotate --extra-annotation-columns cdr3_seqs:invalid:in_frames:stops --infname seq.fa --outfname rearrangements.tsv --presto-output --aligned-germline-fname /nethome/schrammca/programs/partis/data/germlines/macaque/imgt-alignments/igk.fa --species macaque --locus igk

Traceback (most recent call last):
  File "/nethome/schrammca/programs/partis/bin/partis", line 1509, in <module>
    args.func(args)
  File "/nethome/schrammca/programs/partis/bin/partis", line 315, in run_partitiondriver
    parter.run(actions)
  File "/nethome/schrammca/programs/partis/python/partitiondriver.py", line 127, in run
    self.action_fcns[tmpaction]()
  File "/nethome/schrammca/programs/partis/python/partitiondriver.py", line 308, in annotate
    self.run_waterer(look_for_cachefile=not self.args.write_sw_cachefile, write_cachefile=self.args.write_sw_cachefile, count_parameters=self.args.count_parameters)
  File "/nethome/schrammca/programs/partis/python/partitiondriver.py", line 217, in run_waterer
    waterer.read_cachefile(cachefname)
  File "/nethome/schrammca/programs/partis/python/waterer.py", line 162, in read_cachefile
    utils.add_implicit_info(self.glfo, line, aligned_gl_seqs=self.aligned_gl_seqs)
  File "/nethome/schrammca/programs/partis/python/utils.py", line 2993, in add_implicit_info
    add_alignments(glfo, aligned_gl_seqs, line)
  File "/nethome/schrammca/programs/partis/python/utils.py", line 5475, in add_alignments
    add_regional_alignments(glfo, aligned_gl_seqs, line, region, debug)
  File "/nethome/schrammca/programs/partis/python/utils.py", line 5404, in add_regional_alignments
    aligned_gl_seq = aligned_gl_seqs[region][line[region + '_gene']]
KeyError: u'IGKV3-ADN*01'

...with similar KeyErrors for IgH and IgL, as well.
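
To see which genes are affected for a given locus, a quick check along the following lines may help. This is only a sketch and not part of partis: the helper names `fasta_names` and `missing_from_aligned` are made up for illustration, and it assumes the gene name is the first whitespace-delimited token of each FASTA header in both the default and the IMGT-aligned files. If the aligned file uses a different header layout, the parsing would need to be adjusted.

```python
import glob
import os


def fasta_names(path):
    """Return the set of sequence names in a FASTA file.

    Assumption: the name is the first whitespace-delimited token
    after '>' on each header line.
    """
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def missing_from_aligned(germline_dir, aligned_fname):
    """List genes found in germline_dir/*fasta but absent from aligned_fname."""
    default_names = set()
    # Same glob pattern as in the report above: every file ending in "fasta".
    for fname in glob.glob(os.path.join(germline_dir, "*fasta")):
        default_names |= fasta_names(fname)
    return sorted(default_names - fasta_names(aligned_fname))
```

For the case above, something like `missing_from_aligned("partis/data/germlines/macaque/igk", "partis/data/germlines/macaque/imgt-alignments/igk.fa")` should list the genes that would trigger the KeyError, and the same call with the igh and igl paths would cover the other two loci.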