pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Date objects cannot be compared against a DatetimeIndex #35466

Open knabben opened 3 years ago

knabben commented 3 years ago

Code Sample, a copy-pastable example

Version 1.0.5

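>>> import datetime
>>> import pandas as pd
>>> from pandas.tseries.holiday import USFederalHolidayCalendar
>>> holidays = USFederalHolidayCalendar().holidays()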
>>> datetime.datetime(2018, 7, 4) in holidays
True
>>> datetime.date(2018, 7, 4) in holidays
True
>>> pd.__version__
'1.0.5'

Version 1.1.0

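>>> import datetime
>>> import pandas as pd
>>> from pandas.tseries.holiday import USFederalHolidayCalendar
>>> holidays = USFederalHolidayCalendar().holidays()
>>> datetime.datetime(2018, 7, 4) in holidays
True
>>> datetime.date(2018, 7, 4) in holidays
False
>>> pd.__version__
'1.1.0'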

Problem description

In version 1.0.5 it was possible to test whether a datetime.date object is contained in a DatetimeIndex (via the `in` operator). This no longer works in 1.1.0. Is it expected behavior that only recognized scalar objects are allowed?

Expected Output

Date objects should be compared as if they were datetimes at 00:00:00 (midnight), as in 1.0.5.
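
Until this is settled, one way to get the expected behavior on the caller's side is to normalize date objects to midnight datetimes before the membership test. The following is only a sketch using the standard library; the helper name `as_midnight` is hypothetical and is not part of pandas.

```python
import datetime


def as_midnight(value):
    """Return a datetime at 00:00:00 for plain date objects; pass anything else through."""
    # datetime.datetime is a subclass of datetime.date, so it must be checked first.
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, datetime.date):
        return datetime.datetime.combine(value, datetime.time.min)
    return value


print(as_midnight(datetime.date(2018, 7, 4)))  # prints: 2018-07-04 00:00:00
```

The result can then be used in place of the raw date, for example `as_midnight(datetime.date(2018, 7, 4)) in holidays`.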

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : d9fff2792bf16178d4e450fe7384244e50635733
python : 3.6
OS : Darwin
pandas : 1.1.0
numpy : 1.19.1
Cython : None

simonjayhawkins commented 3 years ago

Thanks @knabben for the report

On version 1.0.5 it was possible to compare via contains a datetime.date object against a DatetimeIndex, this behavior is not true anymore for the 1.1.0, is this an expected behavior to allow only recognized scalars objects?

#31023 is the cause of this regression, so this doesn't appear to be an intentional change. cc @jbrockmendel

9b0ef5d07fb218df4e36e133d69b1ea4c6be43bd is the first bad commit
commit 9b0ef5d07fb218df4e36e133d69b1ea4c6be43bd
Author: jbrockmendel jbrockmendel@gmail.com
Date: Tue Jan 14 19:10:24 2020 -0800

    refactor DTI.get_loc (#31023)

jreback commented 3 years ago

this needs more discussion

dwardzinski-jsf commented 3 years ago

I hope this gets more attention. This also broke DataFrame/Series indexing:

>>> import datetime
>>> import pandas as pd
>>> date = datetime.date(2000, 1, 1)
>>> s = pd.Series([1], index=pd.to_datetime([date]))
>>> s.loc['2000-01-01']
1
>>> s.loc[date]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dwardzinski/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/dwardzinski/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/dwardzinski/.local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1059, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/dwardzinski/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 3482, in xs
    loc = self.index.get_loc(key)
  File "/home/dwardzinski/.local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 622, in get_loc
    raise KeyError(key)
KeyError: datetime.date(2000, 1, 1)

This broke a lot of stuff for me, since my software relies pretty heavily on indexing with date objects. Also, #35478 (@knabben's PR) doesn't appear to fix this issue unfortunately.
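
For code that indexes with date objects, a possible stopgap is to convert the key to a `pd.Timestamp` before the lookup. This is a sketch of a caller-side workaround, not a fix proposed in this thread; it assumes that treating the date as midnight of that day is what you want.

```python
import datetime

import pandas as pd

date = datetime.date(2000, 1, 1)
s = pd.Series([1], index=pd.to_datetime([date]))

# Convert the date to a Timestamp (midnight of that day) so the lookup
# does not depend on how DatetimeIndex treats datetime.date keys.
key = pd.Timestamp(date)

value = s.loc[key]
found = key in s.index
print(value, found)  # prints: 1 True
```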

jreback commented 3 years ago

there is not a PR to actually change this, nor has this been discussed, moving off 1.2

jbrockmendel commented 3 years ago

The current behavior of DatetimeIndex refusing to compare to a date object is correct. We need to change the Timestamp behavior (#36131) so that it matches both DatetimeIndex and the stdlib.
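
For reference, the stdlib behavior mentioned here can be checked with plain `datetime` objects. The snippet below is only an illustration; the variable names are arbitrary.

```python
import datetime

d = datetime.date(2018, 7, 4)
dt = datetime.datetime(2018, 7, 4)

# Equality between a date and a datetime is always False in the stdlib,
# even when the datetime falls exactly at midnight on that date.
date_equals_datetime = (d == dt)

# Consequently, a membership test against a list of datetimes also fails.
date_in_list = d in [dt]

# Ordering comparisons are refused outright with a TypeError.
try:
    d < dt
    ordering_error = None
except TypeError as exc:
    ordering_error = exc

print(date_equals_datetime, date_in_list, type(ordering_error).__name__)
# prints: False False TypeError
```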

simonjayhawkins commented 3 years ago

The current behavior of DatetimeIndex refusing to compare to a date object is correct

should this behaviour be deprecated first?

and for the indexing case, https://github.com/pandas-dev/pandas/issues/35466#issuecomment-678407125 (we may need to create a separate issue): is KeyError appropriate, or would TypeError now be more appropriate?

to be consistent with Python list indexing...

>>> [1,2,3]["a"]
<stdin>:1: SyntaxWarning: list indices must be integers or slices, not str; perhaps you missed a comma?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str
>>>

perhaps something like TypeError: DatetimeIndex indices must be a time-like string, pd.Timestamp, datetime.datetime or an array of those or slices, not datetime.date

jbrockmendel commented 3 years ago

KeyError appropriate or would TypeError now be more appropriate.

In 1.1.0 we made it so that failed label-based lookups always raise KeyError (see _whatsnew_110.notable_bug_fixes.indexing_raises_key_errors)

sam-s commented 1 month ago

So, what are the plans wrt this? This is a regression bug that seems to have been tenured into a feature.