Closed dougiesquire closed 3 years ago
Note, a fix is being implemented in dask here: https://github.com/dask/dask/pull/7584
Now here https://github.com/dask/dask/pull/7594.
Just to re-emphasise, xhistogram is currently broken for multiple dask args
due to this issue. Depending on how long we anticipate dask/dask#7594 may take to be released, would it be worth reverting back to @rabernat's wrapper of np.ravel_multi_index
and releasing a temporary fix?
xhistogram is currently broken for multiple dask
args
due to this issue
Can't we fix this by pinning a compatible dask version here?
I don't think there is a dask version with a working version of ravel_multi_index
unfortunately. This was only introduced in the latest version of dask.
I don't think there is a dask version with a working version of
ravel_multi_index
unfortunately.
I'm a bit confused. Has xhistogram always been broken with multiple dask args? In my recollection, things once worked absolutely fine. This code has been in for a long time:
https://github.com/xgcm/xhistogram/blob/master/xhistogram/duck_array_ops.py#L39
So if we rewind our dask version to before 2021.03, won't it go back to working agin?
This code has been in for a long time:
https://github.com/xgcm/xhistogram/blob/master/xhistogram/duck_array_ops.py#L39
I'm not sure this is correct. I think this line was added in #31 just a few months ago. That PR switched from using a "dask-enabled" version of ravel_multi_index
that you wrote, to using the official, but broken, dask version.
AFAICT we have two options:
Yes @dougiesquire you're correct (and sorry for not following up sooner).
Which option do you think is best?
Quick update: https://github.com/dask/dask/pull/7594 was just merged and should be out in a dask
release in ~1.5 weeks (not this Friday, but next Friday)
Just checking in, https://github.com/dask/dask/pull/7594 is now out in a dask release. Is it safe to close this issue? Perhaps we should bump xhistogram
s minimum dask
version
Just checking in, dask/dask#7594 is now out in a dask release. Is it safe to close this issue? Perhaps we should bump xhistograms minimum dask version
Thanks for the reminder @jrbourbeau. xhistrogram
doesn't actually use dask.array.ravel_multi_index
since the refactor to use dask.array.blockwise
(#49), so there hopefully shouldn't be a need to pin a minimum dask version.
As of 0.1.3, xhistogram now fails when provided multiple dask
args
. This is because the dask version ofravel_multi_index
that was introduced into xhistogram in 0.1.3 does not actually replicate the API of the equivalent numpy function. Specifically, the numpy implementation can receive a tuple of integer arrays, whereas the dask version cannot.We actually don't have a test in our suite for multiple dask
args
, which is why this bug went uncaught until now.I've opened an issue with dask, but we probably need to provide a more immediate fix for xhistogram. @rabernat, what do you think about reverting back to using your wrapper of
np.ravel_multi_index
until the issue is resolved with dask?