pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.64k stars 1.09k forks source link

cftime default's datetime breaks CFTimeIndex #4853

Open aulemahal opened 3 years ago

aulemahal commented 3 years ago

What happened: With cftime 1.2.0, one can create datetime object with cftime.datetime(*args, calendar='calendar'), instead of using one of the subclasses (ex cftime.DatetimeNoLeap(*args)). In the latest release (1.4.0, yesterday), the subclasses have been deprecated, but kept as legacy. While all xr code still works (it is using the legacy subclasses), the CFTimeIndex object relies on the type of the datetime object in order to infer the calendar. If the datetime was created outside xarray, using the now default constructor, the returned type is not understood and CFTimeIndexbreaks.

What you expected to happen: I expected CFTimeIndex to be independent of the way the datetime object is created.

Minimal Complete Verifiable Example:

import cftime
import numpy as np
import xarray as xr

# A datetime array, not created in xarray
time = cftime.num2date(np.arange(365), "days since 2000-01-01", calendar="noleap")
a = xr.DataArray(np.zeros(365), dims=('time',), coords={'time': time})

a.indexes['time']

Fails with :

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 342, in __repr__
    attrs_str = format_attrs(self)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 264, in format_attrs
    attrs["freq"] = f"'{index.freq}'" if len(index) >= 3 else None
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 692, in freq
    return infer_freq(self)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/frequencies.py", line 96, in infer_freq
    inferer = _CFTimeFrequencyInferer(index)
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/frequencies.py", line 105, in __init__
    self.values = index.asi8
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 673, in asi8
    [
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/coding/cftimeindex.py", line 674, in <listcomp>
    _total_microseconds(exact_cftime_datetime_difference(epoch, date))
  File "/home/phobos/Python/xclim/.tox/py38/lib/python3.8/site-packages/xarray/core/resample_cftime.py", line 370, in exact_cftime_datetime_difference
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
  File "src/cftime/_cftime.pyx", line 1153, in cftime._cftime.datetime.__sub__
ValueError: cannot compute the time difference between dates with different calendars

Anything else we need to know?:

Environment:

Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Jul 31 2020, 02:39:48) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.10.11-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.utf8 LOCALE: fr_CA.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.1 numpy: 1.20.0 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.4.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.01.1 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: 0.16.1 setuptools: 46.1.3 pip: 20.1 conda: None pytest: 6.2.2 IPython: 7.19.0 sphinx: None
aulemahal commented 3 years ago

Woups, the bug seems more important than I first thought. It happens when reading files: MWE:

import xarray as xr

da = xr.DataArray(
    [0, 1, 2],
    dims=('time',),
    coords={'time': xr.cftime_range('2000-01-01', freq='D', periods=3, calendar='noleap')},
    name='test'
)
da.to_netcdf('test.nc')

da2 = xr.open_dataset('test.nc')
da2.indexes['time']

Fails with an error similar to the example above. Also failing on :

da2.sel(time='2000')
spencerkclark commented 3 years ago

Yikes, indeed I was concerned about that when you first posted the issue. In hindsight it's pretty clear why this causes problems within CFTimeIndex; as you note, we rely on passing the date_type around in a variety of places, and with the new change this means the calendar type is lost. I carelessly took the xarray test suite passing as reason to believe everything would be OK.

spencerkclark commented 3 years ago

This behavior will be rolled back in the next release of cftime, which @jswhit said he would issue soon.

spencerkclark commented 3 years ago

Actually @jswhit already made the release (sorry I missed this!), so this is fixed upstream. I'll leave this issue open, however, since I think it would be good to make CFTimeIndex compatible with cftime's new universal datetime object.