Open shoyer opened 5 years ago
Thanks for picking this up, looks like the multiprocessing interface requires an iterable of arguments, whereas the current apply_async method (which was originally for ipyparallel) expects unpacked arguments.
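To make the mismatch concrete, here's a minimal stdlib illustration (the add function is just a placeholder): multiprocessing pools take the argument tuple packed, while ipyparallel-style apply_async takes the arguments unpacked.

```python
from multiprocessing.pool import ThreadPool

def add(x, y):
    return x + y

with ThreadPool(2) as pool:
    # multiprocessing-style: arguments packed into an iterable
    result = pool.apply_async(add, (1, 2)).get()
    print(result)  # 3

# An ipyparallel view would instead be called with the arguments
# unpacked, roughly view.apply_async(add, 1, 2), so the two pool
# styles cannot be used interchangeably without a shim.
```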
I'll add an explicit check for multiprocessing.pool.Pool instances and maybe add to the docs that the pool should match the API of one of:

- concurrent.futures
- multiprocessing.pool
- ipyparallel
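A hedged sketch of what such an explicit check could look like — submit_to is a hypothetical helper for illustration, not xyzpy's actual code:

```python
import multiprocessing.pool

def submit_to(pool, fn, *args):
    """Dispatch fn(*args) on any of the three supported pool styles.

    Hypothetical helper: multiprocessing pools want the argument
    tuple packed, concurrent.futures executors expose submit, and
    ipyparallel views take unpacked args via apply_async.
    """
    if isinstance(pool, multiprocessing.pool.Pool):
        return pool.apply_async(fn, args)  # packed tuple
    if hasattr(pool, "submit"):
        return pool.submit(fn, *args)      # concurrent.futures style
    return pool.apply_async(fn, *args)     # ipyparallel style
```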
Sounds good! executor might also be a good name for this argument going forward.
You might also consider just deferring all of this to joblib under the covers. Between joblib's built-in backends (loky and threads) and plugin support (e.g., for dask and ipyparallel), I think it would cover all the bases.
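For reference, joblib's backend switching looks roughly like this (real joblib API, illustrative workload): the same Parallel call runs on loky processes by default, or on threads when wrapped in parallel_backend.

```python
from joblib import Parallel, delayed, parallel_backend

def work(x):
    return x * x  # stand-in for a real task

# default backend is loky (processes); switch to the built-in
# threading backend without touching the Parallel call itself
with parallel_backend("threading", n_jobs=2):
    results = Parallel()(delayed(work)(i) for i in range(4))

print(results)  # [0, 1, 4, 9]
```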
For the moment I will just implement the 'quick-fix' version and maybe deprecate the pool argument in favour of executor, which I agree is a bit more explicit. There definitely seem to be some advantages to joblib and loky in the long term though. As I see it:
- control of num_threads in processes, e.g. for BLAS
- cloudpickle for interactive function support
- a single xyzpy dependency

The main effort (and I haven't considered it much) would just be converting the nested submits to the flat Parallel()(delayed(fn)(*args, **kwargs) for ...) syntax and back. And possibly making sure the progress bar tracks results as they finish rather than as they are submitted.
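Concretely, the conversion might look something like this sketch (assuming joblib; simulate and combos are illustrative stand-ins for the harvested function and its argument combinations):

```python
from joblib import Parallel, delayed

def simulate(a, b):
    return a * b  # placeholder for the real combo function

combos = [(1, 2), (3, 4), (5, 6)]

# nested-submit style would submit each combo to an executor and
# collect futures; the flat joblib style expresses the whole sweep
# as one generator of delayed calls
results = Parallel(n_jobs=2)(delayed(simulate)(a, b) for a, b in combos)
print(results)  # [2, 12, 30]
```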
Also it is quite convenient to supply other executors, e.g. mpi4py.futures.MPIPoolExecutor, directly without implementing a full joblib backend.
Quick-fix in this commit by the way, but might leave this open as a reminder to consider joblib!
Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)
You could also just use loky directly instead of rewriting things to use the joblib interface -- it looks like it provides an Executor interface with most of these features: https://pypi.org/project/loky/. I don't think there are major advantages to using joblib's new interface from a library.
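Since loky mirrors the concurrent.futures.Executor interface, code written against that interface needs no changes; sketched here with the stdlib executor as a stand-in:

```python
from concurrent.futures import ThreadPoolExecutor  # loky's executor has the same interface

def crunch(x):
    return x + 10  # illustrative task

# the same submit/result pattern works for any Executor-compatible
# pool: stdlib, loky.get_reusable_executor(), mpi4py.futures, ...
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(crunch, i) for i in range(3)]
    outputs = [f.result() for f in futures]

print(outputs)  # [10, 11, 12]
```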
> Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)
That's good to hear! All very much enabled by xarray of course ;)

I will investigate changing the default executor to loky, which it seems is shipped within joblib.externals already. Unless it is drastically slower or something, I expect it will make sense to change.
With regard to joblib proper, I guess I can see how it might be nice to do something like

    with joblib.parallel_backend('dask'):
        h.harvest_combos(combos, parallel='joblib')

if this is (or becomes) a widespread pattern for specifying parallelisation.
It actually needs the concurrent.futures.Executor API. I tried to pass in a thread pool from multiprocessing and it didn't work.
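The incompatibility is easy to verify: a multiprocessing pool simply lacks the submit method that the concurrent.futures.Executor API defines.

```python
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

# multiprocessing pools expose apply_async/map_async but no submit,
# so anything coded against the Executor API rejects them
print(hasattr(ThreadPool, "submit"))          # False
print(hasattr(ThreadPoolExecutor, "submit"))  # True
```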