kwikteam / global_superclustering

global superclustering
GNU General Public License v2.0

Error running parallel #1

Open shabnamkadir opened 9 years ago

shabnamkadir commented 9 years ago
Traceback (most recent call last):
  File "parallel_global_script.py", line 294, in <module>
    supercluster_results = lbv.map(lambda channel: supercluster_info['kk_sub'][channel].cluster_mask_starts(),full_adjacency.keys())
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 55, in sync_results
    ret = f(self, *args, **kwargs)
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 40, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 1123, in map
    return pf.map(*sequences)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 271, in map
    ret = self(*sequences)
  File "<string>", line 2, in __call__
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 78, in sync_view_results
    return f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 254, in __call__
    return r.get()
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 104, in get
    raise self._exception
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 139, in wait
    results = error.collect_exceptions(results, self._fname)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 233, in collect_exceptions
    raise e
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 231, in collect_exceptions
    raise CompositeError(msg, elist)
IPython.parallel.error.CompositeError: one or more exceptions from call to method: <lambda>
[0:apply]: NameError: name 'supercluster_info' is not defined
rossant commented 9 years ago

You need to import your functions on all nodes first.
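
The NameError above is what happens when only the lambda travels to the engines: any name it looks up globally, such as `supercluster_info`, must already exist in each engine's namespace. Below is a minimal stdlib-only sketch that simulates this without real engines. `fake_engine_call` and the toy data are hypothetical, and passing the data as an explicit argument is just one possible way around the problem.

```python
import types

# Toy stand-in for the real dictionary; lives only on the "client".
supercluster_info = {"kk_sub": {0: "kk-object-0", 1: "kk-object-1"}}


def fake_engine_call(func, *args):
    """Simulate a remote engine: run func's code against an empty global namespace."""
    remote = types.FunctionType(func.__code__, {}, func.__name__, func.__defaults__)
    return remote(*args)


# Relies on a global that exists only on the client.
uses_global = lambda channel: supercluster_info["kk_sub"][channel]

# Receives everything it needs as arguments.
uses_args = lambda kk_sub, channel: kk_sub[channel]

try:
    fake_engine_call(uses_global, 0)
    error_message = None
except NameError as exc:
    error_message = str(exc)

result = fake_engine_call(uses_args, supercluster_info["kk_sub"], 0)
print(error_message)
print(result)
```

With real engines, the other option is to define the name on every engine before calling `map`, for example by assigning through a direct view.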

shabnamkadir commented 9 years ago

Appears to now run, but gives nonsensical results:

'''
INFO klustakwik: Number of spikes in data set: 4001
INFO klustakwik: Number of unique masks in data set: 2575
INFO klustakwik.initial_parameters: full_step_every = 1
INFO klustakwik.initial_parameters: penalty_k = 0.0
INFO klustakwik.initial_parameters: fast_split = False
INFO klustakwik.initial_parameters: split_every = 40
INFO klustakwik.initial_parameters: use_noise_cluster = True
INFO klustakwik.initial_parameters: subset_break_fraction = 0.01
INFO klustakwik.initial_parameters: mua_point = 2
INFO klustakwik.initial_parameters: max_split_iterations = None
INFO klustakwik.initial_parameters: max_possible_clusters = 1000
INFO klustakwik.initial_parameters: max_iterations = 1000
INFO klustakwik.initial_parameters: break_fraction = 0.0
INFO klustakwik.initial_parameters: prior_point = 1
INFO klustakwik.initial_parameters: num_changed_threshold = 0.05
INFO klustakwik.initial_parameters: always_split_bimodal = False
INFO klustakwik.initial_parameters: use_mua_cluster = True
INFO klustakwik.initial_parameters: split_first = 20
INFO klustakwik.initial_parameters: points_for_cluster_mask = 100
INFO klustakwik.initial_parameters: max_quick_step_candidates_fraction = 0.4
INFO klustakwik.initial_parameters: penalty_k_log_n = 1.0
INFO klustakwik.initial_parameters: max_quick_step_candidates = 100000000
INFO klustakwik.initial_parameters: noise_point = 1
INFO klustakwik.initial_parameters: num_starting_clusters = 500
INFO klustakwik.initial_parameters: consider_cluster_deletion = True
INFO klustakwik.initial_parameters: dist_thresh = 9.21034037198
INFO klustakwik.initial_parameters: use_noise_cluster = True
INFO klustakwik.initial_parameters: use_mua_cluster = True
Time taken for parallel clustering 156.44 s
[None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None,
 None, None, None, None, None, None, None, None, None, None, None, None]
'''

shabnamkadir commented 9 years ago

I'm not sure why it is returning None objects instead of KK objects.
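
One possible explanation, assuming `cluster_mask_starts()` modifies the KK object in place and has no `return` statement (the thread does not confirm this): `map` collects whatever the mapped function returns, so a method that returns nothing yields a list of `None`. The sketch below uses `ToyKK`, a hypothetical stand-in rather than the real klustakwik class, and `run_and_return`, an illustrative wrapper.

```python
class ToyKK:
    """Hypothetical stand-in for a KK object; not the real klustakwik class."""

    def __init__(self, channel):
        self.channel = channel
        self.clusters = None

    def cluster_mask_starts(self):
        # Mutates the object in place and implicitly returns None.
        self.clusters = [self.channel] * 3


def run_and_return(kk):
    """Run the clustering, then hand the updated object back to the caller."""
    kk.cluster_mask_starts()
    return kk


kk_sub = {channel: ToyKK(channel) for channel in range(4)}

# Mapping the bare method call collects its return value: None every time.
only_none = list(map(lambda ch: kk_sub[ch].cluster_mask_starts(), kk_sub))

# Mapping a wrapper that returns the object gives the objects back.
objects = list(map(lambda ch: run_and_return(kk_sub[ch]), kk_sub))

print(only_none)
print([kk.clusters for kk in objects])
```

On real engines the wrapper matters even more, because the in-place change happens to the engine's copy of the object and is lost unless something is returned.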

shabnamkadir commented 9 years ago

'''
Traceback (most recent call last):
  File "parallel_global_script.py", line 304, in <module>
    supercluster_results = lbv.map(lambda channel: run_subset_KK(supercluster_info['kk_sub'][channel]),full_adjacency.keys())
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 55, in sync_results
    ret = f(self, *args, **kwargs)
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 40, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 1123, in map
    return pf.map(*sequences)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 271, in map
    ret = self(*sequences)
  File "<string>", line 2, in __call__
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 78, in sync_view_results
    return f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 254, in __call__
    return r.get()
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 104, in get
    raise self._exception
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 139, in wait
    results = error.collect_exceptions(results, self._fname)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 233, in collect_exceptions
    raise e
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 231, in collect_exceptions
    raise CompositeError(msg, elist)
IPython.parallel.error.CompositeError: one or more exceptions from call to method: <lambda>
[5:apply]: NameError: name 'run_subset_KK' is not defined
[1:apply]: NameError: name 'run_subset_KK' is not defined
[7:apply]: NameError: name 'run_subset_KK' is not defined
[3:apply]: NameError: name 'run_subset_KK' is not defined
.... 116 more exceptions ...
'''

shabnamkadir commented 9 years ago
importing run_subset_KK from parallel_global on engine(s)
[0:apply]: 
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<string> in <module>()
/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py in remote_import(name, fromlist, level)
    439             import sys
    440             user_ns = globals()
--> 441             mod = __import__(name, fromlist=fromlist, level=level)
    442             if fromlist:
    443                 for key in fromlist:
ImportError: No module named 'parallel_global'

[1:apply]: 
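
The ImportError suggests the engines cannot find `parallel_global.py` on their import path. That would happen if, for example, they were started from a different directory than the script, although the thread does not say so. Here is a stdlib-only sketch of the mechanism: the same import fails or succeeds depending on whether the module's directory is on `sys.path`. The module name `toy_parallel_global` is hypothetical.

```python
import importlib
import os
import sys
import tempfile

module_name = "toy_parallel_global"  # hypothetical module name for the demo

with tempfile.TemporaryDirectory() as script_dir:
    # Write a tiny module into a directory that is not on sys.path.
    with open(os.path.join(script_dir, module_name + ".py"), "w") as handle:
        handle.write("def run_subset_KK(kk):\n    return kk\n")

    # 1) Directory not on sys.path, as on an engine started elsewhere.
    try:
        importlib.import_module(module_name)
        first_attempt = "imported"
    except ImportError as exc:
        first_attempt = str(exc)

    # 2) After adding the directory, the same import succeeds.
    sys.path.insert(0, script_dir)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(module_name)
        second_attempt = module.run_subset_KK("kk")
    finally:
        # Clean up so the demo leaves the interpreter as it found it.
        sys.path.remove(script_dir)
        sys.modules.pop(module_name, None)

print(first_attempt)
print(second_attempt)
```

On real engines the equivalent step would be to extend `sys.path` (or change the working directory) on every engine before the remote import runs.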
shabnamkadir commented 9 years ago

This bug is now a Heisenbug. It sometimes parallelises fine.

About to parallelize
Time taken for parallel clustering 151.85 s
shabnamkadir commented 9 years ago

It always fails the first time it is launched, but if you keep the same engines running and don't restart, and run the script again - it works! The second time, the clustering happens fine...
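
Failing on a fresh start and succeeding on a rerun is consistent with engine namespaces outliving the script: a name that reaches an engine during one run, even a failed one, is still there for the next. Nothing in this thread confirms that this is the cause. The toy simulation below uses a hypothetical `ToyEngine` class and involves no real engines.

```python
class ToyEngine:
    """Hypothetical long-lived engine: its namespace survives across script runs."""

    def __init__(self):
        self.namespace = {}

    def push(self, **names):
        self.namespace.update(names)

    def call(self, source):
        # Evaluate an expression using only names that live on the engine.
        return eval(source, {}, self.namespace)


def run_script(engine, push_first):
    """One run of a toy client script; returns the result or the error text."""
    data = {"sub_spikes": [1, 2, 3]}
    if push_first:
        engine.push(supercluster_info=data)
    try:
        outcome = engine.call("len(supercluster_info['sub_spikes'])")
    except NameError as exc:
        outcome = "NameError: " + str(exc)
    if not push_first:
        engine.push(supercluster_info=data)  # arrives too late for this run
    return outcome


engine = ToyEngine()
first_run = run_script(engine, push_first=False)   # fresh engine: name missing
second_run = run_script(engine, push_first=False)  # same engine: name left over
print(first_run)
print(second_run)
```

If this is what is happening, the second run only works by accident, and it uses whatever data the previous run left behind.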

shabnamkadir commented 9 years ago

When changing the number of points without restarting the engines (yes, I know):

About to parallelize
Time taken for parallel clustering 587.06 s
Traceback (most recent call last):
  File "parallel_global_script_40000.py", line 180, in <module>
    superclusters[supercluster_info['sub_spikes'][channel],i] = supercluster_results[i]+1
ValueError: shape mismatch: value array of shape (304,) could not be broadcast to indexing result of shape (2898,)
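
This mismatch is what one would expect if the engines still held the previous run's `supercluster_info` (304 spikes for this channel) while the client indexed with the new one (2898 spikes). That reading is an assumption. A cheap guard is to compare lengths before assigning, so the failure names its likely cause. The sketch is stdlib-only; `check_result_length`, the channel number, and the toy data are all hypothetical.

```python
def check_result_length(channel, spike_indices, labels):
    """Raise a descriptive error if an engine result does not match the client's data."""
    if len(labels) != len(spike_indices):
        raise ValueError(
            "channel %r: engine returned %d labels but the client expects %d; "
            "the engines may hold data from an earlier run"
            % (channel, len(labels), len(spike_indices))
        )
    return labels


client_sub_spikes = {7: list(range(2898))}  # current run (toy data)
stale_engine_result = [0] * 304             # result computed from an older push

try:
    check_result_length(7, client_sub_spikes[7], stale_engine_result)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

Restarting the engines, or pushing the data again at the start of every run, would avoid the stale state altogether.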
shabnamkadir commented 9 years ago

Possibly related:

Traceback (most recent call last):
  File "nickground_global_script_1280000.py", line 185, in <module>
    c[:]['supercluster_info'] = supercluster_info
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 806, in __setitem__
    self.update({key:value})
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 687, in update
    return self.push(ns, block=self.block, track=self.track)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 708, in push
    return self._really_apply(util._push, kwargs=ns, block=block, track=track, targets=targets)
  File "<string>", line 2, in _really_apply
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 55, in sync_results
    ret = f(self, *args, **kwargs)
  File "<string>", line 2, in _really_apply
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 40, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 562, in _really_apply
    ident=ident)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/client.py", line 1280, in send_apply_request
    metadata=metadata, track=track)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/kernel/zmq/session.py", line 660, in send
    tracker = stream.send_multipart(to_send, copy=False, track=True)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/zmq/sugar/socket.py", line 331, in send_multipart
    return self.send(msg_parts[-1], flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 619, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6169)
  File "zmq/backend/cython/socket.pyx", line 674, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6034)
  File "zmq/backend/cython/socket.pyx", line 169, in zmq.backend.cython.socket._send_frame (zmq/backend/cython/socket.c:2118)
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:6920)
zmq.error.Again: Resource temporarily unavailable
Bad address (stream_engine.cpp:788)
Aborted (core dumped)