pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

datetime64 can no longer be serialized with numpy 2.1??? #9423

Open hmaarrfk opened 1 week ago

hmaarrfk commented 1 week ago

What happened?

It seems that numpy 2.1 broke some type checking around datetime64, so a dataset holding a datetime64 value can no longer be serialized with to_netcdf.

Oddly enough, this cannot be reproduced with numpy 2.0, only with 2.1.

$ mamba create --name xr netcdf4 xarray numpy=2.1 python=3.11 --channel conda-forge --override-channels

import numpy as np
import xarray as xr
from datetime import datetime

ds = xr.Dataset()
ds['timestamp'] = np.datetime64(datetime.utcnow())
ds.to_netcdf('test.nc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/core/dataset.py", line 2329, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/backends/api.py", line 1360, in to_netcdf
    dump_to_store(
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/backends/api.py", line 1407, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/backends/common.py", line 363, in store
    variables, attributes = self.encode(variables, attributes)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/backends/common.py", line 452, in encode
    variables, attributes = cf_encoder(variables, attributes)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/conventions.py", line 805, in cf_encoder
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/conventions.py", line 805, in <dictcomp>
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/conventions.py", line 196, in encode_cf_variable
    var = coder.encode(var, name=name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/coding/times.py", line 976, in encode
    (data, units, calendar) = encode_cf_datetime(data, units, calendar, dtype)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/coding/times.py", line 725, in encode_cf_datetime
    return _eagerly_encode_cf_datetime(dates, units, calendar, dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/coding/times.py", line 737, in _eagerly_encode_cf_datetime
    data_units = infer_datetime_units(dates)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/coding/times.py", line 443, in infer_datetime_units
    reference_date = format_cftime_datetime(reference_date)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/xarray/coding/times.py", line 453, in format_cftime_datetime
    return f"{date.year:04d}-{date.month:02d}-{date.day:02d} {date.hour:02d}:{date.minute:02d}:{date.second:02d}.{date.microsecond:06d}"
              ^^^^^^^^^
AttributeError: 'numpy.datetime64' object has no attribute 'year'
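
The last frame fails because a `numpy.datetime64` scalar does not expose calendar fields such as `.year`, unlike `datetime.datetime` or cftime objects. A minimal sketch of that behavior, independent of xarray (the variable names are illustrative only, and this does not attempt to reproduce xarray's internal code path):

```python
import numpy as np

# A datetime64 scalar with microsecond units, like the one built from a
# Python datetime in the report.
stamp = np.datetime64("2024-09-01T12:30:00", "us")

# datetime64 scalars carry no .year/.month/... attributes.
has_year = hasattr(stamp, "year")
print(has_year)

# For microsecond units, .item() converts to datetime.datetime, which does
# have calendar fields.
as_py = stamp.item()
print(type(as_py).__name__, as_py.year)

# For nanosecond units, .item() returns a plain integer (nanoseconds since
# the epoch), because datetime.datetime cannot represent nanoseconds.
as_int = np.datetime64("2024-09-01T12:30:00", "ns").item()
print(type(as_int).__name__)
```

This only illustrates why an attribute access like `date.year` cannot work on a raw `numpy.datetime64`; why that object reaches `format_cftime_datetime` under numpy 2.1 is a separate question.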

What did you expect to happen?

for it to work ;)

Minimal Complete Verifiable Example

as above

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

$ conda list
# packages in environment at /home/mark/miniforge3/envs/xr:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
blosc                     1.21.6               hef167b5_0    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.33.1               heb4867d_0    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cftime                    1.6.4           py311h18e1886_0    conda-forge
hdf4                      4.2.15               h2a13503_7    conda-forge
hdf5                      1.14.3          nompi_hdf9ad27_105    conda-forge
icu                       75.1                 he02047a_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libblas                   3.9.0           23_linux64_openblas    conda-forge
libcblas                  3.9.0           23_linux64_openblas    conda-forge
libcurl                   8.9.1                hdb1bdb2_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.1.0               h77fa898_1    conda-forge
libgcc-ng                 14.1.0               h69a702a_1    conda-forge
libgfortran               14.1.0               h69a702a_1    conda-forge
libgfortran-ng            14.1.0               h69a702a_1    conda-forge
libgfortran5              14.1.0               hc5f4f2c_1    conda-forge
libgomp                   14.1.0               h77fa898_1    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           23_linux64_openblas    conda-forge
libnetcdf                 4.9.2           nompi_h135f659_114    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.27          pthreads_hac2b453_1    conda-forge
libsqlite                 3.46.1               hadc24fc_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.1.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.1.0               h4852527_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               he7c6b58_4    conda-forge
libzip                    1.10.1               h2629f0a_3    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
netcdf4                   1.7.1           nompi_py311h25b3b55_101    conda-forge
numpy                     2.1.0           py311h71ddf71_1    conda-forge
openssl                   3.3.1                hb9d3cd8_3    conda-forge
packaging                 24.1               pyhd8ed1ab_0    conda-forge
pandas                    2.2.2           py311h14de704_1    conda-forge
pip                       24.2               pyh8b19718_1    conda-forge
python                    3.11.9          hb806964_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
python_abi                3.11                    5_cp311    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
setuptools                73.0.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.2.1                ha2e4443_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tzdata                    2024a                h8827d51_1    conda-forge
wheel                     0.44.0             pyhd8ed1ab_0    conda-forge
xarray                    2024.7.0           pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.3.1                h4ab18f5_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

Environment

```
/home/mark/miniforge3/envs/xr/lib/python3.11/site-packages/_distutils_hack/__init__.py:31: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.7.0
pandas: 2.2.2
numpy: 2.1.0
scipy: None
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
```

hmaarrfk commented 1 week ago

maybe related to https://github.com/pydata/xarray/issues/9399

keewis commented 1 week ago

Indeed, the changes in #9403 appear to fix this. However, I can only reproduce with

ds["timestamp"] = np.datetime64(datetime.now(UTC))

If instead I use

ds["timestamp"] = np.datetime64(datetime.now(UTC), "ns")

it doesn't fail. So this also depends on the units of the datetime64 object.
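
To make the unit difference concrete, here is a small sketch using only NumPy. It assumes a fixed, timezone-naive `datetime` (chosen here purely for illustration) and shows which unit NumPy picks by default and two ways of ending up with "ns":

```python
from datetime import datetime

import numpy as np

# Fixed naive timestamp, used only for illustration.
stamp = datetime(2024, 9, 1, 12, 30, 0)

# Without an explicit unit, NumPy uses microseconds for datetime objects.
default_units = np.datetime64(stamp)
print(default_units.dtype)

# Passing "ns" explicitly gives nanosecond resolution.
explicit_ns = np.datetime64(stamp, "ns")
print(explicit_ns.dtype)

# An existing value can also be cast after the fact.
cast_ns = default_units.astype("datetime64[ns]")
print(cast_ns == explicit_ns)
```

Either form should give a value with "ns" units, which is the case reported above as not failing.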

hmaarrfk commented 1 week ago

I've seen the "ns" warning for a while, but I'm having a hard time convincing myself that "ns" is the right unit.

>>> 2 ** 63
9223372036854775808

maybe i'm just crazy though:

ns_per_year = (365 * 24 * 60 * 60 * 1E9)

2 ** 63 / ns_per_year
292.471208677536

and 292 years is fine if you consider the date starts from 1974. <-- is this true?

keewis commented 1 week ago

I think the timestamps should start from 1970-01-01 00:00:00 (as they're standard unix timestamps), so you're not far off with 1974. You can verify that with np.array([0]).astype("datetime64[ns]").

There have been requests to relax the restriction to "ns" units for quite some time (following pandas), and in general this should be possible but is quite a bit of work.
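
The range discussed above can be cross-checked with the standard library alone. This is a rough sketch, assuming a signed 64-bit nanosecond count around the 1970 epoch; since `datetime` only resolves microseconds, the last three digits are dropped, and the names `EPOCH` and `INT64_MAX` are just local labels:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer, in nanoseconds here

# Convert nanoseconds to whole microseconds (truncating), because
# datetime/timedelta cannot represent nanoseconds.
max_offset = timedelta(microseconds=INT64_MAX // 1000)

# The count is signed, so the range extends in both directions from 1970.
latest = EPOCH + max_offset
earliest = EPOCH - max_offset

print(earliest)
print(latest)
```

So roughly 292 years are available on each side of 1970, from late 1677 to early 2262.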

hmaarrfk commented 1 week ago

Right, so the question is: do we care about the year 2262?

np.array([0, np.iinfo('int64').max]).astype("datetime64[ns]")
array(['1970-01-01T00:00:00.000000000', '2262-04-11T23:47:16.854775807'],
      dtype='datetime64[ns]')

So I think I'm overthinking it for 2024. In my own code base I can start to sprinkle "ns" where needed to help with compatibility while you get that PR resolved.