spencerahill / aospy

Python package for automated analysis and management of gridded climate data
Apache License 2.0
83 stars 12 forks source link

New failure in test_apply_time_offset #320

Closed spencerahill closed 5 years ago

spencerahill commented 5 years ago

After reverting to xarray 0.11, c.f. #319, the test suite loads without error but has one failure, which I've pasted below. @spencerkclark, since this seems cftime-related (or adjacent), I'm going to punt this one to you.

test_apply_time_offset ___________________________________________________________________

cls = <class 'pandas.core.arrays.datetimes.DatetimeArray'>
index = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-30 00:00:00']
Length: 2, dtype: datetime64[ns], freq = <MonthEnd>, kwargs = {}, inferred = None
on_freq = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-31 00:00:00']
Length: 2, dtype: datetime64[ns]

    @classmethod
    def _validate_frequency(cls, index, freq, **kwargs):
        """
        Validate that a frequency is compatible with the values of a given
        Datetime Array/Index or Timedelta Array/Index

        Parameters
        ----------
        index : DatetimeIndex or TimedeltaIndex
            The index on which to determine if the given frequency is valid
        freq : DateOffset
            The frequency to validate
        """
        if is_period_dtype(cls):
            # Frequency validation is not meaningful for Period Array/Index
            return None

        inferred = index.inferred_freq
        if index.size == 0 or inferred == freq.freqstr:
            return None

        try:
            on_freq = cls._generate_range(start=index[0], end=None,
                                          periods=len(index), freq=freq,
                                          **kwargs)
            if not np.array_equal(index.asi8, on_freq.asi8):
>               raise ValueError
E               ValueError

../../pandas/core/arrays/datetimelike.py:884: ValueError

During handling of the above exception, another exception occurred:

    def test_apply_time_offset():
        start = datetime.datetime(1900, 5, 10)
        years, months, days, hours = -2, 1, 7, 3
        # test lengths 0, 1, and >1 of input time array
        for periods in range(3):
            times = pd.date_range(start=start, freq='M', periods=periods)
            actual = apply_time_offset(xr.DataArray(times), years=years,
                                       months=months, days=days, hours=hours)
            desired = (times + pd.tseries.offsets.DateOffset(
>               years=years, months=months, days=days, hours=hours
            ))

test_utils_times.py:55:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../pandas/core/indexes/datetimelike.py:489: in __add__
    result = self._data.__add__(maybe_unwrap_index(other))
../../pandas/core/arrays/datetimelike.py:1190: in __add__
    result = self._add_offset(other)
../../pandas/core/arrays/datetimes.py:737: in _add_offset
    result = offset.apply_index(values)
pandas/_libs/tslibs/offsets.pyx:116: in pandas._libs.tslibs.offsets.apply_index_wraps.wrapper
    ???
../../pandas/tseries/offsets.py:278: in apply_index
    i = type(i)(shifted, freq=i.freq, dtype=i.dtype)
../../pandas/core/arrays/datetimes.py:351: in __init__
    type(self)._validate_frequency(self, freq)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'pandas.core.arrays.datetimes.DatetimeArray'>
index = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-30 00:00:00']
Length: 2, dtype: datetime64[ns], freq = <MonthEnd>, kwargs = {}, inferred = None
on_freq = <DatetimeArray>
['1898-06-30 00:00:00', '1898-07-31 00:00:00']
Length: 2, dtype: datetime64[ns]

    @classmethod
    def _validate_frequency(cls, index, freq, **kwargs):
        """
        Validate that a frequency is compatible with the values of a given
        Datetime Array/Index or Timedelta Array/Index

        Parameters
        ----------
        index : DatetimeIndex or TimedeltaIndex
            The index on which to determine if the given frequency is valid
        freq : DateOffset
            The frequency to validate
        """
        if is_period_dtype(cls):
            # Frequency validation is not meaningful for Period Array/Index
            return None

        inferred = index.inferred_freq
        if index.size == 0 or inferred == freq.freqstr:
            return None

        try:
            on_freq = cls._generate_range(start=index[0], end=None,
                                          periods=len(index), freq=freq,
                                          **kwargs)
            if not np.array_equal(index.asi8, on_freq.asi8):
                raise ValueError
        except ValueError as e:
            if "non-fixed" in str(e):
                # non-fixed frequencies are not meaningful for timedelta64;
                #  we retain that error message
                raise e
            # GH#11587 the main way this is reached is if the `np.array_equal`
            #  check above is False.  This can also be reached if index[0]
            #  is `NaT`, in which case the call to `cls._generate_range` will
            #  raise a ValueError, which we re-raise with a more targeted
            #  message.
            raise ValueError('Inferred frequency {infer} from passed values '
                             'does not conform to passed frequency {passed}'
>                            .format(infer=inferred, passed=freq.freqstr))
E           ValueError: Inferred frequency None from passed values does not conform to passed frequency M

../../pandas/core/arrays/datetimelike.py:897: ValueError
spencerahill commented 5 years ago

Actually there are two failing tests, both time-related; see e.g. https://travis-ci.org/spencerahill/aospy/jobs/513359222#L1440

spencerkclark commented 5 years ago

Huh...this seems to be a bug in the latest version of pandas.

In [1]: import pandas as pd

In [2]: times = pd.date_range('2000', periods=2, freq='M')

In [3]: times + pd.DateOffset(months=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    883             if not np.array_equal(index.asi8, on_freq.asi8):
--> 884                 raise ValueError
    885         except ValueError as e:

ValueError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-3-846e5412af1b> in <module>()
----> 1 times + pd.DateOffset(months=1)

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in __add__(self, other)
    487         def __add__(self, other):
    488             # dispatch to ExtensionArray implementation
--> 489             result = self._data.__add__(maybe_unwrap_index(other))
    490             return wrap_arithmetic_op(self, other, result)
    491

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py in __add__(self, other)
   1188         elif isinstance(other, DateOffset):
   1189             # specifically _not_ a Tick
-> 1190             result = self._add_offset(other)
   1191         elif isinstance(other, (datetime, np.datetime64)):
   1192             result = self._add_datetimelike_scalar(other)

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in _add_offset(self, offset)
    735             else:
    736                 values = self
--> 737             result = offset.apply_index(values)
    738             if self.tz is not None:
    739                 result = result.tz_localize(self.tz)

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.apply_index_wraps.wrapper()

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/tseries/offsets.py in apply_index(self, i)
    276             if months:
    277                 shifted = liboffsets.shift_months(i.asi8, months)
--> 278                 i = type(i)(shifted, freq=i.freq, dtype=i.dtype)
    279
    280             weeks = (kwds.get('weeks', 0)) * self.n

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in __init__(self, values, dtype, freq, copy)
    349
    350         if inferred_freq is None and freq is not None:
--> 351             type(self)._validate_frequency(self, freq)
    352
    353     @classmethod

//anaconda/envs/aospy_dev/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    895             raise ValueError('Inferred frequency {infer} from passed values '
    896                              'does not conform to passed frequency {passed}'
--> 897                              .format(infer=inferred, passed=freq.freqstr))
    898
    899     # monotonicity/uniqueness properties are called via frequencies.infer_freq,

ValueError: Inferred frequency None from passed values does not conform to passed frequency M

We're actually already working around it implicitly internally in apply_time_offset, because we automatically cast the DatetimeIndex with a specified frequency to one without a specified frequency. When I get a chance, I'll report this upstream, but I don't think it's a major concern. In the meantime I'll submit a PR to fix the failing test with a workaround.

spencerkclark commented 5 years ago

Actually there are two failing tests, both time-related; see e.g. https://travis-ci.org/spencerahill/aospy/jobs/513359222#L1440

Oddly I was not able to reproduce this other failing test, and when I re-ran the build that failure disappeared. We'll see if it comes up again.

spencerkclark commented 5 years ago

It turns out the other bug was in fact real :). I locally created a new environment based on our CI setup for the failing build and was able to reproduce the problem. It was made harder to find because we were mutating the state of the input argument, and the order of the variables in the set time_indexed_vars ended up dictating whether the function would succeed or not (so sometimes it would, and sometimes it wouldn't).

Ultimately I think this stems from the following change in behavior in xarray between 0.11.3 and 0.12.0. For a little more context on why this is relevant to this issue, see this gist.

xarray 0.11.3

In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.indexes
Out[3]: x: Int64Index([0], dtype='int64', name='x')

In [4]: da.isel(x=0).expand_dims('x').indexes
Out[4]: x: Int64Index([0], dtype='int64', name='x')

In [5]: da.to_dataset().isel(x=0).expand_dims('x').a.indexes
Out[5]: x: Int64Index([0], dtype='int64', name='x')

xarray 0.12

Note the difference in behavior between line 5 here and line 5 above.

In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.indexes
Out[3]: x: Int64Index([0], dtype='int64', name='x')

In [4]: da.isel(x=0).expand_dims('x').indexes
Out[4]: x: Int64Index([0], dtype='int64', name='x')

In [5]: da.to_dataset().isel(x=0).expand_dims('x').a.indexes
Out[5]:

I'm pretty sure this is a regression in xarray, but I'll sleep on it before reporting it.

spencerkclark commented 5 years ago

Interestingly the following simpler variant works in xarray 0.12:

In [18]: da = xr.DataArray(1, coords={'x': 0}, name='a')

In [19]: da.expand_dims('x').indexes
Out[19]: x: Int64Index([0], dtype='int64', name='x')

In [20]: da.to_dataset().expand_dims('x').a.indexes
Out[20]: x: Int64Index([0], dtype='int64', name='x')

So apparently the isel step is important.