pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

.min() doesn't work on np.datetime64 with a chunked Dataset #5001

Open ludwigVonKoopa opened 3 years ago

ludwigVonKoopa commented 3 years ago

Hi all,

If an xr.Dataset is chunked, I cannot call ds2.time.min(); I get the error: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]'). I don't know whether this is expected. Note that ds2.time.mean() works.

Thanks

What happened:

A UFuncTypeError is raised: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

What you expected to happen:

min() and max() should compute on a chunked datetime64 xarray.DataArray.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np

obs=200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123*4, "D")

ds2 = xr.Dataset(
    {
        "time": (["obs"], np.arange(t0, tn, (tn-t0)/obs)),
    },
    coords={
        "obs": (["obs"], np.arange(obs)),
    },
).chunk({"obs": 100})

ds2.time.min()

Anything else we need to know?:

ds2.time.mean() works; max() and min() raise an exception.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-133-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: 6.2.2
IPython: 7.20.0
sphinx: 3.5.0

headtr1ck commented 2 years ago

core.duck_array_ops.mean seems to have a custom wrapper for datetime arrays. It should not be a problem to generalize this to min and max as well. Maybe a more generic wrapper would be the best solution?
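To illustrate the idea, here is a minimal sketch of what a generic wrapper could look like. It is not xarray's actual implementation: `datetime_reduce` is a hypothetical name, and the sketch works on plain NumPy arrays only. It assumes the reduction is order-preserving (min, max), so reducing the integer ticks and casting back gives the same answer as reducing the datetimes.

```python
import numpy as np


def datetime_reduce(func, values, **kwargs):
    """Hypothetical helper: run an order-preserving reduction on datetime64 data.

    datetime64 values are reduced as int64 ticks (in the array's own unit)
    and the result is cast back to the original dtype. NaT is not handled
    specially in this sketch.
    """
    values = np.asarray(values)
    if values.dtype.kind != "M":
        # Not a datetime array: apply the reduction unchanged.
        return func(values, **kwargs)
    as_int = values.astype("int64")
    result = func(as_int, **kwargs)
    return np.asarray(result).astype(values.dtype)


# Ten consecutive days, 2010-01-01 through 2010-01-10.
times = np.arange(np.datetime64("2010-01-01"), np.datetime64("2010-01-11"))
earliest = datetime_reduce(np.min, times)
latest = datetime_reduce(np.max, times)
print(earliest, latest)
```

The same integer round trip would presumably avoid the 'add' ufunc on datetime64 for chunked arrays as well, but that is an assumption that would need testing against dask.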

dcherian commented 2 years ago

Yeah, that's a good idea. We should check whether dask & numpy support this now.
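A quick way to run that check might look like the sketch below. It skips xarray and calls min directly on a NumPy array and, if dask is installed, on a chunked dask array. The variable names and chunk size are illustrative, and the dask result depends on the installed version, so no outcome is claimed here.

```python
import numpy as np

# Ten consecutive days, 2010-01-01 through 2010-01-10.
times = np.arange(np.datetime64("2010-01-01"), np.datetime64("2010-01-11"))

# Plain NumPy: min/max on datetime64 are comparisons and work directly.
numpy_min = times.min()
numpy_max = times.max()
print("numpy:", numpy_min, numpy_max)

# dask is optional for this check; skip the chunked case if it is missing.
try:
    import dask.array as da
except ImportError:
    da = None

dask_status = "dask not installed"
if da is not None:
    try:
        chunked = da.from_array(times, chunks=5)
        dask_status = f"ok: {chunked.min().compute()}"
    except Exception as err:  # the issue reports a UFuncTypeError (a TypeError subclass)
        dask_status = f"failed: {err!r}"
print("dask:", dask_status)
```

If the dask branch reports "ok", the datetime special-casing may only be needed for older dask versions.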