pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.57k stars 1.07k forks source link

Computing averaged time produces wrong/incorrect time values #4341

Closed andersy005 closed 4 years ago

andersy005 commented 4 years ago

What happened:

While computing averaged time using time_bounds via times = bounds.mean('d2'), I get weird results (see example below). It's my understanding that this is a bug, but I don't know yet where it's coming from. I should note that in addition to getting wrong time values, the resulting time values are not monotonically increasing even though my time bounds are.

What you expected to happen:

Correct averaged time values

Minimal Complete Verifiable Example:

In [1]: import xarray as xr                                                                                                                                                                        

In [2]: import numpy as np                                                                                                                                                                         

In [3]: dates = xr.cftime_range(start='0400-01', end='2101-01', freq='120Y', calendar='noleap')                                  

In [4]: bounds = xr.DataArray(np.vstack([dates[:-1], dates[1:]]).T, dims=['time', 'd2'])                                         

In [5]: bounds                                                                                                                   
Out[5]: 
<xarray.DataArray (time: 14, d2: 2)>
array([[cftime.DatetimeNoLeap(400, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(520, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(520, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(640, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(640, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(760, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(760, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(880, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(880, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1000, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1000, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1120, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1120, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1240, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1240, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1360, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1360, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1480, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1480, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1600, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1600, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1720, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1720, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1840, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1840, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(1960, 12, 31, 0, 0, 0, 0)],
       [cftime.DatetimeNoLeap(1960, 12, 31, 0, 0, 0, 0),
        cftime.DatetimeNoLeap(2080, 12, 31, 0, 0, 0, 0)]], dtype=object)
Dimensions without coordinates: time, d2

In [6]: bounds.mean('d2')                                                                                                        
Out[6]: 
<xarray.DataArray (time: 14)>
array([cftime.DatetimeNoLeap(460, 12, 31, 0, 0, 0, 0),
       cftime.DatetimeNoLeap(580, 12, 31, 0, 0, 0, 0),
       cftime.DatetimeNoLeap(116, 1, 21, 0, 25, 26, 290448),
       cftime.DatetimeNoLeap(236, 1, 21, 0, 25, 26, 290448),
       cftime.DatetimeNoLeap(356, 1, 21, 0, 25, 26, 290448),
       cftime.DatetimeNoLeap(476, 1, 21, 0, 25, 26, 290448),
       cftime.DatetimeNoLeap(596, 1, 21, 0, 25, 26, 290448),
       cftime.DatetimeNoLeap(131, 2, 11, 0, 50, 52, 580897),
       cftime.DatetimeNoLeap(251, 2, 11, 0, 50, 52, 580897),
       cftime.DatetimeNoLeap(371, 2, 11, 0, 50, 52, 580897),
       cftime.DatetimeNoLeap(491, 2, 11, 0, 50, 52, 580897),
       cftime.DatetimeNoLeap(611, 2, 11, 0, 50, 52, 580897),
       cftime.DatetimeNoLeap(146, 3, 4, 1, 16, 18, 871345),
       cftime.DatetimeNoLeap(266, 3, 4, 1, 16, 18, 871345)], dtype=object)
Dimensions without coordinates: time

Anything else we need to know?:

Environment:

Output of xr.show_versions() ```python INSTALLED VERSIONS ------------------ commit: None python: 3.7.8 | packaged by conda-forge | (default, Jul 23 2020, 03:54:19) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1127.13.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.0 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.2.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.22.0 distributed: 2.22.0 matplotlib: 3.3.0 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None pint: None setuptools: 49.2.1.post20200802 pip: 20.2.1 conda: None pytest: None IPython: 7.17.0 sphinx: None ```
spencerkclark commented 4 years ago

Yikes, this looks bad; thanks for the report @andersy005. The logic in duck_array_ops is pretty thorny surrounding cftime dates. I'm guessing there's some sort of integer overflow issue happening here (hence the weird non-monotonic behavior). I'll have a closer look tomorrow.

spencerkclark commented 4 years ago

@andersy005 I think I tracked it down. See #4344 for a fix.

andersy005 commented 4 years ago

Many thanks for addressing this issue, @spencerkclark!