jcmgray / xyzpy

Efficiently generate and analyse high dimensional data.
http://xyzpy.readthedocs.io
MIT License

The "pool" argument is a little confusing #5

Open shoyer opened 5 years ago

shoyer commented 5 years ago

It actually needs the concurrent.futures.Executor API. I tried to pass in a thread pool from multiprocessing and it didn't work.
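For illustration, a minimal comparison of the two APIs (fn here is just a placeholder function):

from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool

def fn(a, b):
    return a + b

# concurrent.futures.Executor API: submit(fn, *args) -> Future, .result()
with ThreadPoolExecutor() as ex:
    print(ex.submit(fn, 1, 2).result())

# multiprocessing.pool API: apply_async(fn, args_tuple) -> AsyncResult, .get()
with ThreadPool() as p:
    print(p.apply_async(fn, (1, 2)).get())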

jcmgray commented 5 years ago

Thanks for picking this up. It looks like the multiprocessing interface requires an iterable of arguments, whereas the current apply_async method (which was originally for ipyparallel) expects unpacked arguments.

I'll add an explicit check for multiprocessing.pool.Pool instances, and maybe add to the docs that the pool should match the API of one of: concurrent.futures.Executor, multiprocessing.pool.Pool, or an ipyparallel view.
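Roughly, the check might look something like this (just a sketch; _submit is an illustrative name, not the actual xyzpy internals):

import multiprocessing.pool

def _submit(pool, fn, *args, **kwargs):
    # illustrative dispatch only, not the real xyzpy code
    if isinstance(pool, multiprocessing.pool.Pool):
        # multiprocessing wants the arguments packed into a tuple / dict
        return pool.apply_async(fn, args, kwargs)
    elif hasattr(pool, 'submit'):
        # concurrent.futures.Executor style: unpacked arguments
        return pool.submit(fn, *args, **kwargs)
    else:
        # ipyparallel-style view: apply_async with unpacked arguments
        return pool.apply_async(fn, *args, **kwargs)

The returned objects still differ (AsyncResult.get() vs Future.result()), so the result-collection side would need a similar check.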

shoyer commented 5 years ago

Sounds good!

executor might also be a good name for this argument going forward.

You might also consider just deferring all of this to joblib under the covers. Between joblib's built-in backends (loky and threads) and plugin support (e.g., for dask and ipyparallel) I think it would cover all the bases.

jcmgray commented 5 years ago

For the moment I will just implement the 'quick-fix' version, and maybe deprecate the pool argument in favour of executor, which I agree is a bit more explicit.

There definitely seem to be some advantages to joblib and loky in the long term though, as I see it.

The main effort (and I haven't considered it much) would just be converting the nested submits to the flat Parallel()(delayed(fn)(*args, **kwargs) for ...) syntax and back, and possibly making sure the progress bar tracks results as they finish rather than as they are submitted.
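Roughly, the flat version would look like this (fn and cases are placeholders, not the real call sites):

from joblib import Parallel, delayed

def fn(a, b):
    return a + b

cases = [((1, 2), {}), ((3,), {'b': 4})]

# submit everything as one flat generator of delayed calls
results = Parallel(n_jobs=2)(
    delayed(fn)(*args, **kwargs) for args, kwargs in cases
)
print(results)  # [3, 7]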

Also, it is quite convenient to be able to supply other executors, e.g. mpi4py.futures.MPIPoolExecutor, directly, without implementing a full joblib backend.
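e.g. something like this (a sketch only; I'm assuming here that the pool argument of combo_runner accepts any object exposing the Executor interface):

from mpi4py.futures import MPIPoolExecutor
import xyzpy as xyz

def fn(a, b):
    return a + b

combos = {'a': [1, 2, 3], 'b': [4, 5]}

# hand the MPI-backed executor straight in, relying only on it
# exposing submit() and Future.result()
results = xyz.combo_runner(fn, combos, pool=MPIPoolExecutor(max_workers=4))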

jcmgray commented 5 years ago

Quick-fix in this commit by the way, but I might leave this open as a reminder to consider joblib!

shoyer commented 5 years ago

Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

You could also just use loky directly instead of rewriting things to use the joblib interface -- it looks like it provides an Executor interface with most of these features: https://pypi.org/project/loky/. I don't think there are major advantages to using joblib's new interface from a library.
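For reference, using loky directly looks something like this (via get_reusable_executor, its main entry point):

from loky import get_reusable_executor

def fn(a, b):
    return a + b

# a process-pool executor with the concurrent.futures interface,
# reused between calls rather than recreated each time
executor = get_reusable_executor(max_workers=4)
print(executor.submit(fn, 1, 2).result())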

jcmgray commented 5 years ago

> Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

That's good to hear! All very much enabled by xarray of course ;)

I will investigate changing the default executor to loky, which it seems is shipped within joblib.externals already. Unless it is drastically slower or something, I expect it will make sense to change.
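i.e. something along these lines, assuming the vendored import path stays stable:

try:
    # loky as vendored inside joblib
    from joblib.externals.loky import get_reusable_executor
except ImportError:
    # fall back to a standalone loky install
    from loky import get_reusable_executor

default_executor = get_reusable_executor()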


With regard to joblib proper, I guess I can see how it might be nice to do something like

with joblib.parallel_backend('dask'):
    h.harvest_combos(combos, parallel='joblib')

if this is (or becomes) a widespread pattern for specifying parallelisation.
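For completeness, the 'dask' backend needs a dask.distributed.Client around to attach to, so the full pattern would be roughly:

import joblib
from dask.distributed import Client

client = Client()  # start/connect to a cluster; the 'dask' backend uses it

with joblib.parallel_backend('dask'):
    h.harvest_combos(combos, parallel='joblib')  # h, combos as above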