psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Error with Partis when running Sumrep #282

Closed: Pezhvuk closed this issue 5 years ago

Pezhvuk commented 5 years ago

Hi,

I've been using Sumrep with the Partis backend on three different datasets, and I have had no problems with two of them. However, every file I have tried so far from the third one (Gupta et al. 2017) returns the following error:

Traceback (most recent call last):
  File "/d/as7/s/partis/bin/partis", line 450, in <module>
    args.func(args)
  File "/d/as7/s/partis/bin/partis", line 214, in run_partitiondriver
    parter.run(actions)
  File "/d/as7/s/partis/python/partitiondriver.py", line 108, in run
    self.action_fcns[tmpaction]()
  File "/d/as7/s/partis/python/partitiondriver.py", line 272, in cache_parameters
    _, annotations, hmm_failures = self.run_hmm('viterbi', parameter_in_dir=self.sw_param_dir, parameter_out_dir=self.hmm_param_dir, count_parameters=True)
  File "/d/as7/s/partis/python/partitiondriver.py", line 1058, in run_hmm
    self.execute(cmd_str, n_procs)
  File "/d/as7/s/partis/python/partitiondriver.py", line 1029, in execute
    utils.run_cmds(cmdfos, batch_system=self.args.batch_system, batch_options=self.args.batch_options, batch_config_fname=self.args.batch_config_fname, debug='print' if self.args.debug else None)
  File "/d/as7/s/partis/python/utils.py", line 2581, in run_cmds
    finish_process(iproc, procs, n_tries, cmdfos[iproc], n_max_tries, dbgfo=cmdfos[iproc]['dbgfo'], batch_system=batch_system, batch_options=batch_options, debug=debug, ignore_stderr=ignore_stderr, clean_on_success=clean_on_success)
  File "/d/as7/s/partis/python/utils.py", line 2674, in finish_process
    raise Exception(failstr)
Exception: exceeded max number of tries for cmd
    /d/as7/s/partis/packages/ham/bcrham --algorithm viterbi --hmmdir /d/as2/u/mp002/sumrep_project/Gupta/partis/S-GMC_-1h/params/sw/hmms --datadir /tmp/mp002/hmms/303421/germline-sets --infile /tmp/mp002/hmms/303421/hmm-0/hmm_input.csv --outfile /tmp/mp002/hmms/303421/hmm-0/hmm_output.csv --locus igh --random-seed 1554425988 --only-cache-new-vals --ambig-base N
look for output in /tmp/mp002/hmms/303421/hmm-0 and /tmp/mp002/hmms/303421/hmm-0

FYI I updated my Partis last week, and Sumrep is also up to date. However, last year (around March) I ran Sumrep with Partis on all the files in the Gupta study with no problems. Has something broken in the Partis/Sumrep updates since then? Just in case, I have uploaded the fasta files.

This issue is also posted on the Sumrep page.

S-GMC_-1h.txt S-FV_-1h.txt

Cheers, Pejvak.

psathyrella commented 5 years ago

So it looks like a bcrham (C++) subprocess that was being run by the python main process crashed. The second-to-last line tells you the exact command that was run, so if you run that it will probably give the same error. You can also look at the out/err files where it says to in the last line

look for output in /tmp/mp002/hmms/303421/hmm-0 and /tmp/mp002/hmms/303421/hmm-0

and that should give us some clues what's going on.
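
For example, something like this (paths copied from the error message above; the exact names of the log files in that directory may differ, so just list it and read whatever's there):

    # list the working directory named in the last line of the error
    ls -l /tmp/mp002/hmms/303421/hmm-0/
    # read any out/err log files bcrham left behind (exact names may differ)
    cat /tmp/mp002/hmms/303421/hmm-0/*err* /tmp/mp002/hmms/303421/hmm-0/*out* 2>/dev/null
    # then re-run the exact command from the second-to-last line of the error,
    # to reproduce the crash interactively and see its stderr directly:
    /d/as7/s/partis/packages/ham/bcrham --algorithm viterbi \
        --hmmdir /d/as2/u/mp002/sumrep_project/Gupta/partis/S-GMC_-1h/params/sw/hmms \
        --datadir /tmp/mp002/hmms/303421/germline-sets \
        --infile /tmp/mp002/hmms/303421/hmm-0/hmm_input.csv \
        --outfile /tmp/mp002/hmms/303421/hmm-0/hmm_output.csv \
        --locus igh --random-seed 1554425988 --only-cache-new-vals --ambig-base N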

Pezhvuk commented 5 years ago

Well, I tried running the command, which gave me a weird outcome, and the aforementioned tmp directories didn't seem to exist either. It turned out both issues came down to the directories being in a server-specific environment rather than the shared environment. Anyway, the problem was caused by the absence of the libyaml-cpp.so.0.5 library, and was ultimately resolved by installing it.
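
In case it helps anyone else, a quick way to see this kind of missing shared library directly (just a sketch; the binary path is the one from the traceback above) is ldd:

    # print bcrham's shared-library dependencies and flag any that don't resolve
    ldd /d/as7/s/partis/packages/ham/bcrham | grep 'not found'
    # on a machine without the library this would show something like:
    #   libyaml-cpp.so.0.5 => not found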

So it may be useful if, during the Partis installation, the installation of libyaml-cpp is either enforced, or its absence throws an exception/error.

psathyrella commented 5 years ago

huh, that's weird. Usually it does fail during install if the yaml-cpp dev package is missing, for instance like this (although that case was only failing during linking; if yaml-cpp weren't installed at all, it would also break during compilation, when it couldn't find the headers).

Which leads me to believe that the issue is compiling and running on different machines, which I think is what this refers to:

the directories being in a server-specific environment

i.e. the issue is that it was compiled on a machine that had all the dependencies, but the resulting binary was on a networked file system, so it was then run on other machines that didn't have the libs? At least, this failure chain has certainly happened to me, both with partis and with other packages. If this is the case, I'm not sure there's a good way to check for all compilation dependencies at run time. I feel like that's just a general feature of compilation: you have to be careful to compile and run on very similar systems, since, at least as far as I'm aware, there isn't a really good way to check.
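
The closest thing I can think of to a run-time check, and this is just a rough sketch rather than anything partis actually does, would be to verify that bcrham's shared libraries resolve on the machine that's about to run it, e.g. with ldd:

    # rough sketch of a pre-flight check before launching bcrham on this machine
    bcrham=/d/as7/s/partis/packages/ham/bcrham
    if ldd "$bcrham" | grep -q 'not found'; then
        echo "error: $bcrham is missing shared libraries on this machine:" >&2
        ldd "$bcrham" | grep 'not found' >&2
        exit 1
    fi

But that only catches unresolved shared libraries, not e.g. glibc version mismatches, so it would be a partial check at best.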

Also, what was the weird outcome when you re-ran the command? Was it the same error? And as far as I'm aware, when the exception you're getting ^ is raised, it does not clean the working directory, so the tmp files should still be there, unless maybe you were looking on a different machine (your workdir is in /tmp, but it sounds like you're running on multiple machines), or the machine was rebooted, in which case /tmp gets cleared.