Closed knutfrode closed 3 years ago
Noting that @aaronspring encountered what appears to be a similar problem using dask==2021.03.0 with xhistogram inside xskillscore (https://github.com/xarray-contrib/xskillscore/issues/280)
Ok, thanks for reporting this. Need to get a fix in asap. Any ideas?
Ideally we should be testing against xarray and dask master branches.
Took a quick look at one of the failing tests in xskillscore https://github.com/xarray-contrib/xskillscore/issues/280#issuecomment-798825624
Failing at dask/array/reshape.py:206
. Worth getting at with a minimal example.
failing here with dask: https://github.com/xgcm/xhistogram/blob/master/xhistogram/test/test_core.py#L177
The test: https://gist.github.com/raybellwaves/fe5c941eea0a885616c3c1cb3d8627b3 errors come in on the histogram call.
The core issue is that dask's bincount
function, which we import here
is now returning dask arrays with shape ()
. I'm looking into this. Is there a dask PR someone can point to where this behavior was changed?
maybe https://github.com/dask/dask/pull/7391 will help?
I just reviewed that PR and I'm 99% sure that it will fix the problem.
I consider this a regression in dask, so we will not be making a workaround in xhistogram. We have two choices here:
What do those affected prefer?
I'm fine with doing nothing if others are. dask releases are more or less monthly and I think we've already pinned an earlier version of dask in the last xskillscore release.
https://github.com/dask/dask/pull/7391 was just merged into Dask's main
branch and I confirmed that tests here pass with the changes in that PR
I find a followup issue with dask==2021.04
: https://github.com/xarray-contrib/xskillscore/issues/288
Its a new error message:
skillscore/tests/test_probabilistic.py:831:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
xskillscore/core/probabilistic.py:1209: in roc
dim=dim,
xskillscore/core/contingency.py:122: in __init__
self._table = self._get_contingency_table(dim)
xskillscore/core/contingency.py:167: in _get_contingency_table
bin_dim_suffix="_bin",
xskillscore/core/utils.py:166: in histogram
return xhist(*args, bins=bins, **kwargs)
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/xarray.py:146: in histogram
block_size=block_size
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/core.py:271: in histogram
block_size=block_size,
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/core.py:131: in _histogram_2d_vectorized
bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/xhistogram/duck_array_ops.py:24: in f
return getattr(module, name)(*args, **kwargs)
<__array_function__ internals>:6: in ravel_multi_index
???
/usr/share/miniconda/envs/xskillscore-minimum-tests/lib/python3.7/site-packages/dask/array/core.py:1525: in __array_function__
return da_func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
multi_index = [dask.array<digitize, shape=(1, 200), dtype=int64, chunksize=(1, 200), chunktype=numpy.ndarray>, dask.array<digitize, shape=(1, 200), dtype=int64, chunksize=(1, 200), chunktype=numpy.ndarray>]
dims = [4, 4], mode = 'raise', order = 'C'
@wraps(np.ravel_multi_index)
def ravel_multi_index(multi_index, dims, mode="raise", order="C"):
> return multi_index.map_blocks(
_ravel_multi_index_kernel,
dtype=np.intp,
chunks=(multi_index.shape[-1],),
drop_axis=0,
func_kwargs=dict(dims=dims, mode=mode, order=order),
)
E AttributeError: 'list' object has no attribute 'map_blocks'
xhist master doesnt solve it: https://github.com/xarray-contrib/xskillscore/pull/289
xgcm tests seem ok on upstream: https://github.com/xgcm/xhistogram/actions/workflows/upstream.yml Is there something in xskillscore that isn't tested in xgcm?
the xhist wrapper for datasets by @dougiesquire
I think I've found the issue here. It's an issue in dask that was introduced into xhistogram in the latest version.
The dask version of ravel_multi_index
(introduced into xhistogram in 0.1.3) does not actually replicate the API of the equivalent numpy function. Specifically, the numpy implementation can receive a tuple of integer arrays, whereas the dask version cannot (despite saying it can). This functionality is required in xhistogram when there are multiple args
.
That is, the following works:
import numpy as np
import dask.array as dsa
arr = np.array([[3,6,6],[4,5,1]]) # Example from numpy docs
da = dsa.from_array(arr)
dsa.ravel_multi_index(da, (7,7))
but the following fails with the error 'tuple' object has no attribute 'map_blocks'
:
dsa.ravel_multi_index((da, da), (7,7))
This is a bug in dask and I'll open an issue about it.
xhistogram doesn't currently have tests with multiple dask args
, which is why this didn't get flagged earlier (we should add them!), but xskillscore does try to use xhistogram in this way (and thus fails).
@raybellwaves, @aaronspring one solution for now in xskillscore is to pin xhistogram==0.1.2.
Issue opened with dask here: https://github.com/dask/dask/issues/7580
Issue opened with dask here: dask/dask#7580
A fix for this has been implemented in the latest release of dask (2021.06.1). In the meantime, however, the xhistogram refactor to use dask.array.blockwise
(#49) has removed our dependence on the dask.array functions referenced in this issue, so we actually no longer need this fix.
Closing the issue.
First of all thank you for making this useful tool. It has worked well for me until Dask version 2021.03, after which I am getting error as below when calling
histogram
. I apologize that this might be not be sufficient information to reproduce.