Closed MSusik closed 9 years ago
Have you stumbled upon the same issue using Python 3 instead?
Not yet, but I will try running on 10 processes.
I ran the clustering a few times on Python 3 with n_jobs=-1 and didn't have any difficulties. It seems to be the way to go. All the cores were used. The issue should remain open, IMO.
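For readers unfamiliar with the n_jobs=-1 convention mentioned above, here is a minimal sketch of how a negative value is usually resolved to a worker count. It follows the rule documented by joblib (n_cpus + 1 + n_jobs for negative values), but `resolve_n_jobs` is a hypothetical helper written for illustration, not a function from joblib or beard.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper: map an n_jobs setting to a worker count.

    Mirrors the documented joblib convention: -1 means all CPUs,
    -2 means all CPUs but one, and so on.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # Never go below one worker, even for very negative values.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# On an 8-core machine, n_jobs=-1 would ask for 8 workers.
workers = resolve_n_jobs(-1, 8)
```

This is why "all the cores were used" is the expected outcome with n_jobs=-1.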
Cool, one more reason to switch to Python 3 :)
Unfortunately, I stumbled upon this today:
Process PoolWorker-8:
Traceback (most recent call last):
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/process.py", line 258, in _bootstrap
self.run()
Traceback (most recent call last):
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/site-packages/joblib-0.8.4-py3.3.egg/joblib/parallel.py", line 512, in retrieve
self._output.append(job.get())
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/pool.py", line 562, in get
self.wait(timeout)
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/pool.py", line 559, in wait
self._event.wait(timeout)
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/threading.py", line 547, in wait
signaled = self._cond.wait(timeout)
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/threading.py", line 284, in wait
waiter.acquire()
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "strategy3.py", line 794, in <module>
n_jobs=args.n_jobs).fit(X, y)
File "/home/msusik/beard/beard/clustering/blocking.py", line 220, in fit
return self._fit(X, y, blocks)
File "/home/msusik/beard/beard/clustering/blocking.py", line 185, in _fit
b, X_mask, y_mask, clusterer in self._blocks(X, y, blocks)))
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/site-packages/joblib-0.8.4-py3.3.egg/joblib/parallel.py", line 660, in __call__
self.retrieve()
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/site-packages/joblib-0.8.4-py3.3.egg/joblib/parallel.py", line 523, in retrieve
self._pool.terminate()
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/site-packages/joblib-0.8.4-py3.3.egg/joblib/pool.py", line 586, in terminate
super(MemmapingPool, self).terminate()
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/pool.py", line 465, in terminate
self._terminate()
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/util.py", line 188, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/pool.py", line 513, in _terminate_pool
p.terminate()
File "/home/msusik/anaconda/envs/py3k/lib/python3.3/multiprocessing/process.py", line 119, in terminate
self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'
The exception at the bottom was caused by a ^C from me, but the top one shows a race condition.
Fixed by #65
When ScipyHierarchicalClustering is run on many cores on a machine with Intel MKL, it sometimes hits a race condition. All the cores are idle, but the processes still occupy memory. After sending ctrl+c I receive:
The issue is also mentioned here: https://github.com/joblib/joblib/issues/138
The trick with setting environment variables is not helping much. Note that the error appears regardless of whether the machine uses Anaconda.
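The thread does not say which environment variables were tried. As a rough sketch, and assuming the trick refers to capping the native thread pools used by MKL and OpenMP, the variables would have to be set before the numerical libraries are imported. `limit_native_threads` is a hypothetical helper, and the choice of `MKL_NUM_THREADS` and `OMP_NUM_THREADS` is an assumption, not something confirmed in this issue.

```python
import os

# Assumption: these are the variables the "trick" refers to.
_THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS")


def limit_native_threads(n_threads=1, environ=os.environ):
    """Hypothetical helper: cap native thread pools via the environment.

    Must be called before numpy/scipy are imported, because the
    libraries typically read these variables when they are loaded.
    """
    for name in _THREAD_VARS:
        environ[name] = str(n_threads)
    return {name: environ[name] for name in _THREAD_VARS}


applied = limit_native_threads(1)
# Imports of numpy, scipy, or the clustering code would follow here.
```

As reported above, this kind of setting did not help much in this case, so treat it as a diagnostic step and not a fix.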