psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0
54 stars 34 forks source link

qr_gap_seq length not the same as qrbound length in v region indelfo #262

Closed darth-donut closed 6 years ago

darth-donut commented 6 years ago

Hi, I'm running partis on a sample size of approx 18k sequences, and I've encountered this exception:

Traceback (most recent call last):
  File "/home/jiahong/Documents/pipelines/partis/bin/partis", line 403, in <module>
    args.func(args)
  File "/home/jiahong/Documents/pipelines/partis/bin/partis", line 192, in run_partitiondriver
    parter.run(actions)
  File "/home/jiahong/Documents/pipelines/partis/python/partitiondriver.py", line 104, in run
    self.action_fcns[tmpaction]()
  File "/home/jiahong/Documents/pipelines/partis/python/partitiondriver.py", line 242, in cache_parameters
    self.run_waterer(dbg_str='new-allele fitting')
  File "/home/jiahong/Documents/pipelines/partis/python/partitiondriver.py", line 190, in run_waterer
    waterer.run(cachefname if write_cachefile else None)
  File "/home/jiahong/Documents/pipelines/partis/python/waterer.py", line 108, in run
    self.read_output(base_outfname, len(mismatches))
  File "/home/jiahong/Documents/pipelines/partis/python/waterer.py", line 482, in read_output
    self.summarize_query(qinfo, queries_to_rerun)  # returns before adding to <self.info> if it thinks we should rerun the query
  File "/home/jiahong/Documents/pipelines/partis/python/waterer.py", line 911, in summarize_query
    indelfo = self.combine_indels(qinfo, best)  # the next time through, when we're writing ig-sw input, we look to see if each query is in <self.info['indels']>, and if it is we pass ig-sw the indel-reversed sequence, rather than the <input_info> sequence
  File "/home/jiahong/Documents/pipelines/partis/python/waterer.py", line 1428, in combine_indels
    return indelutils.combine_indels(regional_indelfos, full_qrseq, qrbounds, uid=qinfo['name'])
  File "/home/jiahong/Documents/pipelines/partis/python/indelutils.py", line 618, in combine_indels
    raise Exception('%sqr_gap_seq non-gap length %d not the same as qrbound length %d in %s region indelfo' % ('%s: ' % uid if uid is not None else '', utils.non_gap_len(rfo['qr_gap_seq']), qrbounds[region][1] - qrbounds[region][0], region))
Exception: read210308: qr_gap_seq non-gap length 293 not the same as qrbound length 290 in v region indelfo

I have a similar issue when running this again on a different dataset, with a different read.

Here's the read that caused the exception:

>read210308
ATGGACTGGACCTGGAGGTTCCTCTTTGTGGTGGCAGCAGCTACAGGTAAGGGGCTTCCT
AGTCCTAAGGTGAGGAAGGGATCCTTGTTTAGTTAAAGAGGATTTTATTCACCCCTGTGT
CCTCTCCACAGGTGTCCAGTCCCAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAA
GCCTGGGTCCTCGGTGAAGGTCTCCTGCGAGGCTTCTGGAGGCACCTTCAGCAGCTATGC
TATCAGCTGGGTGCGACAGGCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATTATCCCT
ATCTTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGA
CAAATCCACGAGCACAGCCTGCATGGAGCTGAGCATCCTGAGATCTGAGGACACGGCCGT
GTATTACTGTGCGAGGAACGAGCTGAAGGAGATTGACTACGGGCCAGGAACCCTGGTCAC
CGTCTCCTCACTTCCACCAAGGGCCCATCGGTCTTCCCCC

And here's the command I used to run partis:

partis annotate --infname PCR2.fasta --outfname PCR2.yaml --extra-annotation-columns invalid:cdr3_seqs --n-procs 10

If it helps, the other read was:

Exception: read234866: qr_gap_seq non-gap length 296 not the same as qrbound length 294 in v region indelfo
>read234866
ATGGACACACTTTGCTCCACGCTCCTGCTGCTGACCATCCCTTCTGTGAGTGCTGTGGTC
AGGGACTCCTTCACGGGTGAAACATCAGTTTTCTTGTTTGTGGGCTTCATCTTTCTTATG
CTTTCTCCACAGGGGTCTTGTCCCAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGA
AACCCACACAGACCCTCACGCTGACTTGCACCTTCTCTGGGTTCTCACTCGGCACTAGTG
GAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCA
TTTATTGGAATGATTATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCA
AGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCTGTGGACACAGC
CACATATTACTGTGCACACGAAGACTATGATAGTAGTGGTATACTACTCTACTACATGGA
CGTCTGGGGCAAAGGACCACGATCATCGTCTCCTCACCTTCACACAGAGCCCATCCGTCT
TCCCCT

(also sample size of ~18k)

I'm running partis version 0.14.0 and mafft v7.402.

psathyrella commented 6 years ago

arg, sorry about that. That's happened a couple of times, and has been hard to track down since it usually depends on the whole sample. But in this case, it seems I finally stamped the bug out a week or two ago, since I get that ^ error running on your sequence on master, but not on dev. So I'll merge dev into master now, and you can either wait for that or just pull from dev now.

psathyrella commented 6 years ago

ok, pushed https://github.com/psathyrella/partis/commits/master

darth-donut commented 6 years ago

That solved it, thanks!