MSusik closed this 9 years ago
Besides my nitpicks, this looks good to me. Thanks! Two questions though:
Does this solve the blocking issue we had?
Yes, just one small improvement is needed, as this implementation can hang.
Have you checked this works fine both for Python 2.7 and 3.4?
Yes.
Great then, +1 for merge once my comments are fixed.
CC: @ogrisel Just to let you know, we have had hanging issues with joblib: the processes all stall, CPU usage goes down to 0, and then nothing more happens. Do you know where this could be coming from? These hangs are very difficult to reproduce and seem to appear at random... Directly using multiprocessing solves our immediate problem, but it would be nice if joblib could be used again.
Note that the approach is completely different as joblib spawns a process for every block of data, while I keep a pool of processes that run all the time.
I want to make it possible to reuse joblib pools too across several consecutive calls to Parallel.__call__. However, this is probably not the cause of the hanging.
Which implementation of BLAS do you use when you observe the hanging? anaconda's MKL? OSX's builtin accelerate?
Note that the approach is completely different as joblib spawns a process for every block of data, while I keep a pool of processes that run all the time.
Note: joblib spawns a pool with a fixed number of worker processes per call to Parallel (using a multiprocessing Pool instance and its apply_async method under the hood). But you can pass many blocks of data and the number of workers stays constant.
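To make the two models concrete, here is a minimal sketch using only the standard library. It is not joblib's actual code, only a rough analogue of what is described above; the helper names `split_into_blocks`, `process_blocks` and `parallel_call` are made up for illustration.

```python
import multiprocessing as mp


def split_into_blocks(data, block_size):
    """Cut a list into consecutive blocks of at most block_size items."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def process_blocks(pool, func, blocks):
    """Submit every block to an existing pool and collect results in order.

    The number of worker processes is fixed by the pool, not by the
    number of blocks: extra blocks simply wait in the pool's task queue.
    """
    pending = [pool.apply_async(func, (block,)) for block in blocks]
    return [result.get() for result in pending]


def parallel_call(func, blocks, n_workers=2):
    """Per-call model: create a pool, use it for this call, tear it down."""
    with mp.Pool(processes=n_workers) as pool:
        return process_blocks(pool, func, blocks)
```

Called from inside an `if __name__ == '__main__':` block, `parallel_call(sum, blocks, n_workers=2)` handles any number of blocks with two workers. The persistent model corresponds to creating one `mp.Pool` up front and passing it to `process_blocks` for several consecutive calls.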
BTW, could you also tell me if you observe the hanging behavior when enabling the forkserver start method under Python 3.4+? To try that, you need to modify the main block of the script that starts your Python program as follows:
import multiprocessing as mp
# import your modules here

if __name__ == '__main__':
    mp.set_start_method('forkserver')
    # call your code here
More details here: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
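If changing the global start method is inconvenient, the same documentation page describes context objects, which select a start method for a single pool only. A hedged sketch follows, assuming Python 3.4+; `choose_start_method` and `make_pool` are hypothetical helper names, and the fallback to the platform default is my own choice for platforms without forkserver.

```python
import multiprocessing as mp


def choose_start_method(preferred="forkserver"):
    """Return `preferred` if this platform supports it, else the default."""
    available = mp.get_all_start_methods()  # the first entry is the default
    return preferred if preferred in available else available[0]


def make_pool(n_workers, preferred="forkserver"):
    """Create a pool from a local context.

    Unlike mp.set_start_method, this leaves the global start method alone,
    so other code in the same program keeps its own default.
    """
    ctx = mp.get_context(choose_start_method(preferred))
    return ctx.Pool(processes=n_workers)
```

As with `set_start_method`, `make_pool` should still be called from within the `if __name__ == '__main__':` block, because the forkserver and spawn methods re-import the main module in the child processes.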
Which implementation of BLAS do you use when you observe the hanging? anaconda's MKL?
We observed the error both when using anaconda's MKL and when working outside of anaconda. For example, here is numpy's config for a run without anaconda:
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
That's weird. Do you use OpenMP-based libraries or compiled extensions (e.g. Cython prange constructs)?
It would be great if you could provide a standalone joblib snippet that reproduces the freeze so that I can try to debug it.
Signed-off-by: Mateusz Susik <mateusz.susik@cern.ch>