Open TomAugspurger opened 5 years ago
Another one.
In [16]: idx = pd.date_range('2014-01-02', '2014-04-30', freq='M', tz='UTC')
In [17]: result = idx.tz_convert("US/Eastern")
In [18]: result
Out[18]:
DatetimeIndex(['2014-01-30 19:00:00-05:00', '2014-02-27 19:00:00-05:00',
'2014-03-30 20:00:00-04:00', '2014-04-29 20:00:00-04:00'],
dtype='datetime64[ns, US/Eastern]', freq='M')
In [19]: result._eadata._validate_frequency(result, result.freq)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
913 if not np.array_equal(index.asi8, on_freq.asi8):
--> 914 raise ValueError
915 except ValueError as e:
ValueError:
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-19-24fa3f452eb0> in <module>
----> 1 result._eadata._validate_frequency(result, result.freq)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
925 raise ValueError('Inferred frequency {infer} from passed values '
926 'does not conform to passed frequency {passed}'
--> 927 .format(infer=inferred, passed=freq.freqstr))
928
929 # monotonicity/uniqueness properties are called via frequencies.infer_freq,
ValueError: Inferred frequency None from passed values does not conform to passed frequency M
Though perhaps there's a bug in the freq validation around DST boundaries? Maybe not. Here's the same range generated directly in US/Eastern:
In [36]: pd.date_range('2014-01-02', '2014-04-30', freq='M', tz='US/Eastern')
Out[36]:
DatetimeIndex(['2014-01-31 00:00:00-05:00', '2014-02-28 00:00:00-05:00',
'2014-03-31 00:00:00-04:00', '2014-04-30 00:00:00-04:00'],
dtype='datetime64[ns, US/Eastern]', freq='M')
So should tz_convert invalidate the freq?
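To make the mismatch concrete without calling the private _validate_frequency, here is a minimal sketch of the same idea: rebuild a range from the first element and compare. The helper name conforms_to_freq is hypothetical (not part of pandas), and pd.offsets.MonthEnd() is used in place of the 'M' alias only so the snippet runs on versions where that alias is deprecated.

```python
import pandas as pd


def conforms_to_freq(index, freq):
    """Hypothetical helper: True if `index` equals a range rebuilt from its
    first element with the given freq and the same number of periods."""
    if len(index) == 0:
        return True
    expected = pd.date_range(start=index[0], periods=len(index), freq=freq)
    return index.equals(expected)


month_end = pd.offsets.MonthEnd()  # assumed equivalent of the 'M' alias above
idx = pd.date_range('2014-01-02', '2014-04-30', freq=month_end, tz='UTC')
result = idx.tz_convert('US/Eastern')

utc_ok = conforms_to_freq(idx, month_end)         # original UTC index
eastern_ok = conforms_to_freq(result, month_end)  # converted index
print(utc_ok, eastern_ok)
```

The converted values start on 2014-01-30 in local time, which is not a month end, so a range rebuilt from that start cannot reproduce them.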
One more. In this case we seem to generate an array from bdate_range that doesn't have a valid freq (not sure if the bug is in the generation or the freq validation, probably the validation).
START = pd.Timestamp(2009, 3, 13)
END1 = pd.Timestamp(2009, 3, 18)
END2 = pd.Timestamp(2009, 3, 19)
freq = 'CBH'
a = pd.bdate_range(START, END1, freq=freq, weekmask='Mon Wed Fri',
                   holidays=['2009-03-14'])
b = pd.bdate_range(START, END2, freq=freq, weekmask='Mon Wed Fri',
                   holidays=['2009-03-14'])
a._eadata._validate_frequency(a, a.freq)
b._eadata._validate_frequency(b, b.freq)
a validates fine, but b doesn't:
In [44]: b._eadata._validate_frequency(b, b.freq)
...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
913 if not np.array_equal(index.asi8, on_freq.asi8):
--> 914 raise ValueError
915 except ValueError as e:
ValueError:
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-44-2b6f5f040d09> in <module>
----> 1 b._eadata._validate_frequency(b, b.freq)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
925 raise ValueError('Inferred frequency {infer} from passed values '
926 'does not conform to passed frequency {passed}'
--> 927 .format(infer=inferred, passed=freq.freqstr))
928
929 # monotonicity/uniqueness properties are called via frequencies.infer_freq,
ValueError: Inferred frequency None from passed values does not conform to passed frequency CBH
In the freq validation for b we generate an on_freq with the wrong(?) number of periods:
ipdb> len(on_freq)
16
ipdb> len(index)
24
Do we have a policy on when an operation that might invalidate a freq should infer vs. just set it to None? For example, in DatetimeIndex.where we could either do _shallow_copy(freq=None) or _shallow_copy_with_infer.
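As a rough illustration of the two options using only public API (this is a sketch, not the internal _shallow_copy code path), the difference between dropping the freq and re-inferring it looks like this:

```python
import pandas as pd

idx = pd.date_range('2014-01-01', periods=5, freq='D')

# Selecting every other element with a list of positions yields values
# that no longer match the original daily freq.
subset = idx[[0, 2, 4]]

# Option 1: explicitly leave the freq unset on the result.
no_freq = pd.DatetimeIndex(subset, freq=None)

# Option 2: re-infer the freq from the resulting values.
with_infer = pd.DatetimeIndex(subset, freq='infer')

print(no_freq.freq, with_infer.freq)
```

Setting None is cheap and always safe; inferring costs a pass over the data but keeps a freq when the result happens to be regular.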
I think that a fix for these issues (invalidating in places where needed, maybe fixing some bugs in the current freq validation) and a fix for https://github.com/pandas-dev/pandas/issues/24562 will open up freq validation in DatetimeArray.__init__.
I think [the OP example, not the others] was fixed by a semi-recent PR that implemented DTI/TDI.where and always sets the resulting freq to None.
What's the expected output here?
The returned DatetimeIndex doesn't pass freq validation.
Should the freq be None?