roryk / ipython-cluster-helper

Tool to easily start up an IPython cluster on different schedulers.

TERM_OWNER: job killed by owner Exited with signal termination: 14 #63

Closed simoncchu closed 5 years ago

simoncchu commented 5 years ago

I got an error like:

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/sh

#BSUB -q research-rh74
#BSUB -J bcbio-e[1-100]
#BSUB -oo bcbio-ipengine.bsub.%%J
#BSUB -n 8
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=10240]"
#BSUB -W 12:00
#BSUB -M 80G

export IPYTHONDIR=/homes/simonchu/.ipython/profile_115c2832-c5f5-11e9-a58c-af5a86390335
/homes/simonchu/anaconda2/bin/python -E -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 10240) if max_proc > 0 else 10240; resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 10240) if max_hdls > 0 else 10240; resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from ipyparallel.apps.ipengineapp import launch_new_instance; launch_new_instance()' --timeout=960 --IPEngineApp.wait_for_url_file=960 --EngineFactory.max_heartbeat_misses=120 --profile-dir="/homes/simonchu/.ipython/profile_115c2832-c5f5-11e9-a58c-af5a86390335" --cluster-id="d9965f46-5f50-4b14-8e0e-091755ed6c48"

------------------------------------------------------------

TERM_OWNER: job killed by owner. Exited with signal termination: 14.

Any possible reason for this? Thank you.
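As an aside, the inlined `python -c` in the quoted job script just raises the soft limits for processes and open file handles before launching the engine. A rough sketch of that logic (Python 3; the helper name is mine, not from the original command):

```python
import resource

# Sketch of the inlined resource-limit logic from the engine command:
# raise the soft limit toward min(hard, 10240) without ever exceeding
# the hard limit (a non-positive hard limit is treated as unlimited).
def bump_soft_limit(which, target=10240):
    soft, hard = resource.getrlimit(which)
    goal = min(hard, target) if hard > 0 else target
    resource.setrlimit(which, (max(soft, goal), hard))

bump_soft_limit(resource.RLIMIT_NOFILE)
print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```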

roryk commented 5 years ago

Hi Simon,

If this is part of a bcbio job, could you post the log in log/bcbio-nextgen-debug.log?

simoncchu commented 5 years ago

Hi @roryk, actually I got this error when using ipython-cluster-helper.

roryk commented 5 years ago

Oh! There should be some other logs; this is just the engine job. There should also be a main job that you were running and a controller job, which might have more helpful logs.

simoncchu commented 5 years ago

This is one of the error files:

Traceback (most recent call last):
  File "/homes/simonchu/simonchu_nobackup/xTEA_long_reads/l_main.py", line 181, in <module>
    lcaller.call_MEIs_for_sites(sf_bam_list, sf_merged, sf_ref, l_extd_len, sf_out_fa, sf_out_sites, l_cluster_info)
  File "/gpfs/nobackup/icortes/icortes/simonchu/xTEA_long_reads/l_MEI_caller.py", line 191, in call_MEIs_for_sites
    n_core_per_job, n_mem_GB, i_wait_mins, s_max_time, s_total_mem)
  File "/gpfs/nobackup/icortes/icortes/simonchu/xTEA_long_reads/l_MEI_caller.py", line 305, in asm_seq_for_sites_in_parallel_on_cluster
    view.map(self._asm_collected_reads_one_site, l_sites)
  File "</homes/simonchu/anaconda2/lib/python2.7/site-packages/decorator.pyc:decorator-gen-137>", line 2, in map
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/client/view.py", line 52, in sync_results
    ret = f(self, *args, **kwargs)
  File "</homes/simonchu/anaconda2/lib/python2.7/site-packages/decorator.pyc:decorator-gen-136>", line 2, in map
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/client/view.py", line 37, in save_ids
    ret = f(self, *args, **kwargs)
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/client/view.py", line 1114, in map
    return pf.map(*sequences)
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/client/remotefunction.py", line 299, in map
    return self(*sequences, __ipp_mapping=True)
  File "</homes/simonchu/anaconda2/lib/python2.7/site-packages/decorator.pyc:decorator-gen-119>", line 2, in __call__
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/client/remotefunction.py", line 80, in sync_view_results
    return f(self, *args, **kwargs)
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/client/remotefunction.py", line 253, in __call__
    pf = PrePickled(self.func)
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/serialize/serialize.py", line 44, in __init__
    self.buffers = serialize_object(obj)
  File "/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/serialize/serialize.py", line 125, in serialize_object
    buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

I didn't get any useful information from the other two files. @roryk

roryk commented 5 years ago

Thanks! That is a more helpful error. There are a couple of things that aren't easily picklable; closures are the one I know about, but there are others, and it looks like you found another one. Does this https://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-multiprocessing-pool-map help at all?
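For illustration, closures fail to pickle because pickle serializes functions by reference (module plus name), and a nested function can't be looked up that way. The `instancemethod` error in the traceback above is the Python 2 variant of the same problem. A small Python 3 demonstration:

```python
import pickle

def make_adder(n):
    def add(x):        # a closure: not importable by name, so not picklable
        return x + n
    return add

try:
    pickle.dumps(make_adder(1))
except (pickle.PicklingError, AttributeError) as e:
    print("cannot pickle closure:", e)
```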

simoncchu commented 5 years ago

Oh I get your point. Let me try another way and let you know how it goes. Thank you!

roryk commented 5 years ago

Great!

simoncchu commented 5 years ago

The original issue is that I am calling view.map(self.collected_reads_one_site, l_sites), where self.collected_reads_one_site is an instance method of a class and is not picklable. I added some code:

import copy_reg
import types

# Teach Python 2's pickle how to serialize bound/unbound methods:
# reduce a method to (getattr, (owner, method_name)) so it can be
# rebuilt by attribute lookup when unpickled.
def _reduce_method(m):
    if m.im_self is None:
        return getattr, (m.im_class, m.im_func.func_name)
    else:
        return getattr, (m.im_self, m.im_func.func_name)

copy_reg.pickle(types.MethodType, _reduce_method)
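(Worth noting: this copy_reg shim is only needed on Python 2. On Python 3, bound methods pickle out of the box, as a quick check shows; the class below is illustrative, not from the thread.)

```python
import pickle

class Caller:
    def collect(self, site):
        return site * 2

# Python 3 pickles a bound method by reference to the class attribute
# plus the pickled instance, so no copy_reg/copyreg shim is required.
obj = Caller()
restored = pickle.loads(pickle.dumps(obj.collect))
print(restored(21))  # → 42
```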

It seems that solved the problem. But another issue showed up:

ipyparallel.error.CompositeError: one or more exceptions from call to method: _collected_reads_one_site
[71:apply]: ImportError: No module named l_reads_collection

Here, l_reads_collection.py is the script where the class is defined. Any suggestions @roryk ?

Within l_reads_collection.py, I import the cluster helper with from cluster_helper.cluster import cluster_view.

More error information:

ImportErrorTraceback (most recent call last)
/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/serialize/serialize.pyc in unpack_apply_message(bufs, g, copy)
    203     args = []
    204     for i in range(info['nargs']):
--> 205         arg, arg_bufs = deserialize_object(arg_bufs, g)
    206         args.append(arg)
    207     args = tuple(args)

/homes/simonchu/anaconda2/lib/python2.7/site-packages/ipyparallel/serialize/serialize.pyc in deserialize_object(buffers, g)
    143     bufs = list(buffers)
    144     pobj = buffer_to_bytes_py2(bufs.pop(0))
--> 145     canned = pickle.loads(pobj)
    146     if istype(canned, sequence_types) and len(canned) < MAX_ITEMS:
    147         for c in canned:

ImportError: No module named l_reads_collection
2019-08-25 19:38:16.013 [IPEngineApp] CRITICAL | Engine Interrupted, shutting down...

roryk commented 5 years ago

Hi Simon,

The example.py script in the example directory might be helpful to look at. For a function that needs a particular module loaded, decorating it with @require will make sure the module is imported on the engine before the code executes.
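For context, that decorator is `from ipyparallel import require`. A minimal stdlib sketch of the idea (the names below are mine, and this is a local approximation, not ipyparallel's implementation) is just "import the named modules into the function's globals before each call":

```python
import functools
import importlib

# Minimal sketch of an @require-style decorator: ensure the named
# modules exist in the function's global namespace before the body
# runs (ipyparallel arranges the same thing on the remote engine).
def require(*module_names):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in module_names:
                func.__globals__[name] = importlib.import_module(name)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@require('math')
def site_distance(site):
    return math.hypot(*site)   # math is guaranteed to be importable here

print(site_distance((3, 4)))   # → 5.0
```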

Another way to accomplish what you are trying to do is to write a module-level function that wraps the call to your class method, and then pass the whole object in as an argument. The IPython parallel documentation might have some other suggestions for getting around these kinds of issues as well.
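A sketch of that wrapper pattern (the class and method names are stand-ins for the ones in the traceback, not the actual code):

```python
# A module-level function pickles by reference, so view.map can ship it;
# the instance then travels as an ordinary (picklable) argument.
class MEICaller:                      # stand-in for the user's class
    def __init__(self, extd_len):
        self.extd_len = extd_len

    def _asm_one_site(self, site):
        return site + self.extd_len

def call_one_site(args):
    caller, site = args
    return caller._asm_one_site(site)

caller = MEICaller(100)
work = [(caller, s) for s in [1, 2, 3]]
# on the cluster this would be: view.map(call_one_site, work)
print(list(map(call_one_site, work)))  # → [101, 102, 103]
```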