Closed TomNicholas closed 3 years ago
you don't even need the compute to get the warning:
In [3]: chunked.mean()
.../lib/python3.8/site-packages/dask/array/core.py:3113: UserWarning: Passing an object to dask.array.from_array which is already a Dask collection. This can lead to unexpected behavior.
warnings.warn(
Out[3]:
<xarray.DataArray ()>
dask.array<mean_agg-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>
Calling chunked.mean() is enough, and computing it returns
<xarray.DataArray ()>
<Quantity(dask.array<true_divide, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>, 'meter')>
Note that there's no units in the result of .mean(), that the return value of compute is a dask array (wrapped by pint) and that we need to compute twice to get the actual result.
In conclusion: this is a pretty serious bug (in xarray, I think?) and the warning should actually be an error in this case.
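Until that happens upstream, the warning can be escalated to an error locally with the standard library's warning filters. This is only a minimal sketch: `fake_mean` is a hypothetical stand-in that emits a UserWarning with the same text as the one above, and `call_strict` is an illustrative helper name, not part of xarray, pint or dask. In a real session one would pass something like `lambda: chunked.mean()` instead.

```python
import re
import warnings

# Start of the message quoted in the warning above.
DASK_MESSAGE = (
    "Passing an object to dask.array.from_array which is already a Dask collection"
)


def fake_mean():
    # Hypothetical stand-in for chunked.mean(): it only emits a similar warning.
    warnings.warn(
        DASK_MESSAGE + ". This can lead to unexpected behavior.", UserWarning
    )
    return 2.0


def call_strict(func):
    """Call func with the matching UserWarning turned into an exception."""
    with warnings.catch_warnings():
        # The message filter is a regex matched against the start of the text.
        warnings.filterwarnings(
            "error", message=re.escape(DASK_MESSAGE), category=UserWarning
        )
        return func()
```

With this filter in place, `call_strict(fake_mean)` raises UserWarning instead of silently returning a value, while functions that do not emit the warning are unaffected.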
Oh dear. Does the other order (da.chunk(1).pint.quantify()) behave any differently?
no, it doesn't, which is why I believe this is a bug in xarray
It would be really nice to get this to work before we publish #114 (not that there is any time limit), but I have time now and am keen to help if I can. Should I re-raise this issue on xarray?
yes, that would be good.
I didn't test xarray(pint(dask)) thoroughly, yet, so I guess we can expect more to fail. I really hope pydata/xarray#4972 would have caught something like this, which I guess means I should try to finalize that as soon as possible.
Note that there's no units in the result of .mean(), that the return value of compute is a dask array (wrapped by pint) and that we need to compute twice to get the actual result.
Are we definitely seeing the same behaviour as each other? When I do print(chunked.compute()) (after chunking in either way) I get
<xarray.DataArray (dim_0: 3)>
<Quantity([1 2 3], 'meter')>
Dimensions without coordinates: dim_0
which seems right to me?
it is correct and I get the same result (which means .pint.chunk does not have a bug), but chunked.mean() is definitely wrong (I checked both master and v0.2)
with .compute I meant that chunked.mean().compute().compute() is required to get the result for the mean
Right sorry, I had left out the call to mean.
This was fixed by https://github.com/pydata/xarray/issues/5559
In [4]: da = xr.DataArray([1,2,3], dims=['x'], attrs={'units': 'metres'})
In [5]: chunked = da.pint.quantify().pint.chunk(1)
In [6]: chunked
Out[6]:
<xarray.DataArray (x: 3)>
<Quantity(dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(1,), chunktype=numpy.ndarray>, 'meter')>
Dimensions without coordinates: x
In [7]: chunked.mean().compute()
Out[7]:
<xarray.DataArray ()>
<Quantity(2.0, 'meter')>
Everything looks fine here, excellent...
but when I go to compute then I get a UserWarning, even though it returns the correct answer.
Even if this is working fine, ideally we don't want to be giving warnings to the user.
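A regression check for this could record the warnings raised during the call and assert that none are UserWarnings. The sketch below uses only the standard library; `noisy_compute` and `quiet_compute` are hypothetical stand-ins for the real `chunked.mean().compute()` call, and `collect_user_warnings` is an illustrative helper name.

```python
import warnings


def collect_user_warnings(func):
    """Run func and return (result, list of UserWarnings it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown earlier in the session.
        warnings.simplefilter("always")
        result = func()
    user_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
    return result, user_warnings


def noisy_compute():
    # Hypothetical stand-in: right answer, but with a warning attached.
    warnings.warn("already a Dask collection", UserWarning)
    return 2.0


def quiet_compute():
    # Hypothetical stand-in for the desired behaviour: right answer, no warning.
    return 2.0
```

A test would then assert that the second element of the returned tuple is empty for the real computation.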