psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0
54 stars, 36 forks

Crash running partis partition #302

Closed: m-vieira closed this issue 4 years ago

m-vieira commented 4 years ago

I'm running partis partition on several mouse heavy chain BCR datasets. For most datasets it runs successfully (which suggests this is not a problem with my local installation on a cluster), but for some I get the error message below. Any ideas would be appreciated.

Cheers,

Marcos

/project2/cobey/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/project2/cobey/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/mvieira/.local/lib/python2.7/site-packages/colored_traceback/colored_traceback.py", line 27, in colorize_traceback
    tb_colored = pygments.highlight(tb_text, lexer, self.formatter)
  File "/home/mvieira/.local/lib/python2.7/site-packages/colored_traceback/colored_traceback.py", line 32, in formatter
    colors = _get_term_color_support()
  File "/home/mvieira/.local/lib/python2.7/site-packages/colored_traceback/colored_traceback.py", line 66, in _get_term_color_support
    curses.setupterm()
_curses.error: setupterm: could not find terminfo database

Original exception was:
Traceback (most recent call last):
  File "/project2/cobey/partis/bin/partis", line 472, in <module>
    args.func(args)
  File "/project2/cobey/partis/bin/partis", line 217, in run_partitiondriver
    parter.run(actions)
  File "/project2/cobey/partis/python/partitiondriver.py", line 120, in run
    self.action_fcns[tmpaction]()
  File "/project2/cobey/partis/python/partitiondriver.py", line 261, in cache_parameters
    alcluster_alleles = alclusterer.get_alleles(self.sw_info, debug=self.args.debug_allele_finding, plotdir=None if self.args.plotdir is None else self.args.plotdir + '/sw/alcluster')
  File "/project2/cobey/partis/python/alleleclusterer.py", line 318, in get_alleles
    clusterfos, msa_info = self.vsearch_cluster_v_seqs(qr_seqs, threshold, debug=debug)
  File "/project2/cobey/partis/python/alleleclusterer.py", line 121, in vsearch_cluster_v_seqs
    print '   vsearch clustering %d %s segments with threshold %.2f (*300 = %d)' % (len(qr_seqs), self.region, threshold, int(threshold * 300))
ValueError: cannot convert float NaN to integer

psathyrella commented 4 years ago

Hmm, yeah, you're right: that looks like a problem on our end. It's trying to find the mean J SHM frequency for new-allele clustering, but ends up taking the mean of an empty list. I'm not sure, though, whether it should be possible to get a zero-length list there.
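The failure described above can be reproduced in a few lines. This is a minimal sketch, not partis code: it assumes (as the traceback suggests) that the vsearch threshold is derived from a mean J mutation frequency, and for simplicity it uses that mean directly as the threshold. The names `reproduce_crash`, `mean_or_none` and `clustering_threshold_or_skip` are hypothetical. `numpy.mean` of an empty list emits the same "Mean of empty slice." warning seen in the stderr and returns NaN, and `int()` of NaN raises the `ValueError` at the bottom of the traceback; checking for an empty list first avoids both.

```python
import warnings

import numpy as np


def reproduce_crash():
    """Mimic the failing path: mean of an empty list, then int() of the result."""
    j_mut_freqs = []  # no sequences survived the filters
    with warnings.catch_warnings():
        # silence numpy's "Mean of empty slice." and invalid-value warnings
        warnings.simplefilter('ignore', RuntimeWarning)
        threshold = np.mean(j_mut_freqs)  # NaN
    return int(threshold * 300)  # ValueError: cannot convert float NaN to integer


def mean_or_none(values):
    """Return the mean of `values` as a float, or None if `values` is empty."""
    if len(values) == 0:
        return None
    return float(np.mean(values))


def clustering_threshold_or_skip(j_mut_freqs):
    """Hypothetical guard: skip new-allele clustering instead of crashing."""
    threshold = mean_or_none(j_mut_freqs)  # simplification: threshold = mean frequency
    if threshold is None:
        print('   no sequences left for new-allele clustering, skipping')
        return None
    print('   vsearch clustering with threshold %.2f (*300 = %d)' % (threshold, int(threshold * 300)))
    return threshold
```

Here `clustering_threshold_or_skip([])` returns `None` rather than raising, while `reproduce_crash()` raises the same `ValueError` as in the report. The thread doesn't say whether the actual fix skips clustering or handles the empty case some other way.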

This looks like just stderr; could you also paste stdout?

As to the curses error, I don't think it's actually what's causing the crash, but I'm also not sure what causes it. It sounds familiar, but I don't think I've ever run into it myself. What OS are you on?

m-vieira commented 4 years ago

I'm on a linux cluster. Here's stdout, thanks!

non-human species 'mouse', turning on allele clustering
parameter dir '_output/.._data_sequence_data_8-5' does not exist, so caching a new set of parameters before running action 'partition'
caching parameters
vsearch: 69154 / 69198 v annotations (44 failed) with 140 v genes in 5.5 sec
keeping 59 / 261 v genes
smith-waterman (new-allele clustering)
vsearch: 69153 / 69198 v annotations (45 failed) with 59 v genes in 10.6 sec
running 16 procs for 69198 seqs
running 18 procs for 354 seqs
info for 68910 / 69198 = 0.996 (288 failed)
kept 2702 (0.039) unproductive
removed 38944 / 68910 = 0.57 duplicate sequences after trimming framework insertions (leaving 29966)
water time: 118.1 (ig-sw 22.5 processing 1.1)
clustering for new alleles
removing 29965/29966 sequences with v_5p or j_3p deletions
collapsed 1 input sequences into 0 representatives from 1 clones (removed 1 clones with >= 8 j mutations)
mutation among all cluster representatives: v / j = nan / nan = nan

psathyrella commented 4 years ago
removing 29965/29966 sequences with v_5p or j_3p deletions
collapsed 1 input sequences into 0 representatives from 1 clones (removed 1 clones with >= 8 j mutations)
mutation among all cluster representatives: v / j = nan / nan = nan

OK, so the clustering-style allele inference only looks at full-length sequences, i.e. those that include all of V and all of J (most reads these days are full length anyway, and handling partial reads would make things way more complicated). On this sample, that requirement removes all but one of the 30k input sequences from consideration for allele inference, and that one is then removed by another requirement (clones with >= 8 J mutations are discarded). If you expect your sequences not to be full length, this just means the inferred germline will be less accurate, which is all expected (except for the crash, which I've fixed here; note that it usually takes an hour or so for Docker Hub to update).
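The filtering described above can be sketched as follows. This is a hypothetical illustration, not the partis implementation: each annotation is assumed to be a dict with `v_5p_del` and `j_3p_del` deletion lengths (named after the log line) and a hypothetical `n_j_mutations` count, and the >= 8 J-mutation cutoff from the log is applied per sequence rather than per clone for simplicity. The point is that an empty result has to be handled explicitly before any mean is taken.

```python
def full_length_only(annotations):
    """Keep sequences containing all of V and all of J (no 5' V or 3' J deletion)."""
    return [a for a in annotations if a['v_5p_del'] == 0 and a['j_3p_del'] == 0]


def drop_highly_mutated_j(annotations, max_j_mutations=8):
    """Drop sequences with >= max_j_mutations J mutations (cutoff taken from the log)."""
    return [a for a in annotations if a['n_j_mutations'] < max_j_mutations]


def candidates_for_allele_clustering(annotations):
    """Apply both filters and report when nothing is left to cluster."""
    kept = full_length_only(annotations)
    print('removing %d/%d sequences with v_5p or j_3p deletions'
          % (len(annotations) - len(kept), len(annotations)))
    kept = drop_highly_mutated_j(kept)
    if not kept:
        # callers must skip clustering here rather than average an empty list
        print('no sequences left for new-allele clustering')
    return kept
```

With three toy annotations, one full length with 2 J mutations, one with a 3-base V 5' deletion, and one full length with 9 J mutations, only the first survives. Dropping the first leaves an empty list, which is the situation this sample hit.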

m-vieira commented 4 years ago

Great. Thank you so much!