psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction

Exception: couldn't find calcd line.... #187

Closed krdav closed 8 years ago

krdav commented 8 years ago

So I was running some VH sequences to test both the annotation and the partitioning. With 500 sequences, both annotation and partitioning work just fine. With 5000 sequences the annotation still works fine, but the partitioning crashes with the following error:

[ root@1fe9ccb7e30e:/partis {master *} ]$ ./bin/partis partition --seqfile /host/Users/krdav/Dropbox/immuno/data/289_290_rep/some_seqs2.fa --outfname /host/Users/krdav/Dropbox/immuno/data/289_290_rep/some_seqs2-viterbi.csv
  note:! running on 5000 sequences spread over 1 processes. This will be kinda slow, so it might be a good idea to set --n-procs N to the number of processors on your local machine, or look into non-local parallelization with --slurm.

  note: using default --parameter-dir '_output/some_seqs2'
partitioning
smith-waterman
    rewriting germlines from /partis/data/imgt to /tmp/root/hmms/622518/germline-sets (using 72 genes) 
        processed       remaining      new-indels          rerun: unproductive      no-match      weird-annot.      nonsense-bounds      invalid-codon
          5000             286              0                         228              58               0               0               0             increasing mismatch score (1 --> 2) and rerunning them
           286             223            223                           0               0               0               0               0             rerunning for indels
           223         all done
      info for 5000 
        water time: 81.3
hmm
    writing input
    running 1 procs
        cachefile d.n.e.
      caching all naive sequences
        calcd:   vtb 5000  fwd 0     hfrac 0           merged:  hfrac 0    lratio 0   
        time: bcrham 929.8

      time waiting for bcrham: 932.1
      hmm step time: 933.1
--> 5000 clusters with 1 procs
hmm
    writing input
       naive hfrac bounds: 0.015 0.074   (0.037 mutation in _output/some_seqs2/hmm)
    running 1 procs
Traceback (most recent call last):
  File "./bin/partis", line 342, in 
    args.func(args)
  File "./bin/partis", line 137, in run_partitiondriver
    parter.partition()
  File "/partis/python/partitiondriver.py", line 171, in partition
    cpath = self.run_hmm('forward', self.args.parameter_dir, n_procs=n_procs, cpath=cpath)
  File "/partis/python/partitiondriver.py", line 549, in run_hmm
    self.execute(cmd_str, n_procs)
  File "/partis/python/partitiondriver.py", line 519, in execute
    utils.finish_process(iproc, procs, n_tries, self.subworkdir(iproc, n_procs), get_outfname(iproc), get_cmd_str(iproc), self.bcrham_proc_info[iproc])
  File "/partis/python/utils.py", line 1532, in finish_process
    process_out_err('', '', extra_str='' if len(procs) == 1 else str(iproc), info=info, subworkdir=workdir)
  File "/partis/python/utils.py", line 1576, in process_out_err
    raise Exception('couldn\'t find %s line in:\nout:\n%s\nerr:\n%s' % (header, out, err))
Exception: couldn't find calcd line in:
out:
        read 0 cached logprobs and 5000 naive seqs

err:
Killed

The file I ran this on is attached: some_seqs2.fa.zip
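
(For the record, the parallelized rerun suggested by the note in the output above would look something like this; the core count here is illustrative, and the paths are just the ones from the command above:)

  ./bin/partis partition \
      --seqfile /host/Users/krdav/Dropbox/immuno/data/289_290_rep/some_seqs2.fa \
      --outfname /host/Users/krdav/Dropbox/immuno/data/289_290_rep/some_seqs2-viterbi.csv \
      --n-procs 8   # set to the number of cores on the local machine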

krdav commented 8 years ago

Funny. The vsearch partitioning seems to be working fine, though I find the use of "err:" in the screen output slightly confusing when there does not seem to be any error.

  out:
    vsearch v1.1.3_linux_x86_64, 2.0GB RAM, 1 cores
    https://github.com/torognes/vsearch

  err:
    Reading file /tmp/root/hmms/550327/simu.fasta 0% ....
    ....
            .... Writing clusters 100%
    Clusters: 294 Size min 1, max 605, avg 17.0
    Singletons: 90, 1.8% of seqs, 30.6% of clusters

      vsearch/swarm time: 121.7
      total time: 1061.1
psathyrella commented 8 years ago

ah, great, thanks for submitting.

So what the exception above is saying is that it couldn't parse the info it needs from one of the jobs, so it prints all of the stdout and stderr -- which say the job was killed. Without knowing what kind of system you're on, I can't be sure what killed it, but if you're on a batch system there are a lot of possibilities. Maybe it was killed by hand? Maybe it ran out of memory?
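
(One way to check, assuming the kernel log on the docker host is readable: a bare "Killed" on stderr is the classic signature of the kernel's OOM killer sending SIGKILL, and it leaves a trace in the kernel log, e.g.:)

  # look for OOM-killer entries in the kernel log (run on the docker host)
  dmesg | grep -i -E 'out of memory|killed process'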

Hm, yeah, maybe I should change it to stderr instead of err?
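
(For what it's worth, the "err:" label just means the text came from the job's stderr stream -- vsearch, like many command-line tools, writes its progress reporting to stderr and reserves stdout for data, and partis echoes both streams back verbatim. A hypothetical illustration of the split, with some_tool as a stand-in:)

  # stdout carries results, stderr carries progress/diagnostics
  some_tool --input simu.fasta > results.txt 2> progress.log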

krdav commented 8 years ago

This was run with the docker container on my PC, and indeed I think you are right about the memory. I was simply not aware that partis uses so much memory.

Now I will try on a 32-core node with 1 TB RAM; hopefully memory will not be an issue then...

krdav commented 8 years ago

Okay, provided enough memory, partis runs to completion with no problems to report. So I guess the lesson is not to run jobs with a large number of sequences on a PC.
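
(A minimal sketch for when it does have to run on a PC, assuming a reasonably recent docker; the image name and the 16g cap are placeholders:)

  # cap the container's memory so the host stays responsive, then watch usage
  docker run -it -m 16g psathyrella/partis /bin/bash   # image name assumed
  docker stats   # in a second terminal: live per-container memory usage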