pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: partial slicing with a Timestamp on PeriodIndex #15920

Open ocschwar opened 7 years ago

ocschwar commented 7 years ago

An example: a PeriodIndex with a freq of 300S. A lookup with a timestamp in the first second of a period works; a timestamp anywhere in the remaining portion of the interval raises a KeyError.

>>> DR = pd.period_range(datetime.datetime.now(), freq='300S',periods=22)
>>> S = pd.Series( [0.0]*22,index=DR) 
>>> now = datetime.datetime.now()
>>> S[now]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Library/Python/2.7/site-packages/pandas/tseries/period.py", line 757, in get_value
    return com._maybe_box(self, self._engine.get_value(s, key),
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8564)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8508)
KeyError: 1491422454
>>> S.index[0]
Period('2017-04-05 20:00:23', '300S')
>>> S[datetime.datetime(2017,4,5,20,0,23)]
0.0
>>> S[datetime.datetime(2017,4,5,20,0,23,999)]
0.0
>>> DR = pd.period_range(datetime.datetime.now(), freq='5T',periods=22)
>>> S = pd.Series( [0.0]*22,index=DR)
>>> now = datetime.datetime.now()
>>> S[now]
0.0
>>> S.index[0]
Period('2017-04-05 20:03', '5T')
>>> S[datetime.datetime(2017,4,5,20,3,23,999)]
0.0
>>> S[datetime.datetime(2017,4,5,20,4,23,999)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Library/Python/2.7/site-packages/pandas/tseries/period.py", line 757, in get_value
    return com._maybe_box(self, self._engine.get_value(s, key),
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8564)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8508)
KeyError: 24857044
>>> pandas.__version__
u'0.19.2'
jreback commented 7 years ago

Can you make this example not depend on datetime.datetime.now()? As written, it is not copy-pastable.

ocschwar commented 7 years ago
>>> import pandas, datetime
>>> import pandas as pd
>>> DR = pd.period_range(datetime.datetime.(2017,1,1), freq='5T',periods=22)
  File "<stdin>", line 1
    DR = pd.period_range(datetime.datetime.(2017,1,1), freq='5T',periods=22)
                                           ^
SyntaxError: invalid syntax
>>> DR = pd.period_range(datetime.datetime(2017,1,1), freq='5T',periods=22)
>>> S = pd.Series( [0.0]*22,index=DR) 
>>> S[datetime.datetime(2017,1,1,0,0)]
0.0
>>> S[datetime.datetime(2017,1,1,0,1)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Library/Python/2.7/site-packages/pandas/tseries/period.py", line 757, in get_value
    return com._maybe_box(self, self._engine.get_value(s, key),
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8564)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8508)
KeyError: 24720481
>>> S[datetime.datetime(2017,1,1,0,0,30)]
0.0
>>> 
ocschwar commented 7 years ago

Or:

import datetime
import pandas as pd
DR = pd.period_range(datetime.datetime(2017,1,1), freq='5T',periods=22)
S = pd.Series( [0.0]*22,index=DR) 
S[datetime.datetime(2017,1,1,0,0)]
S[datetime.datetime(2017,1,1,0,1)]
S[datetime.datetime(2017,1,1,0,0,30)]
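
The behaviour this script seems to expect is that any timestamp falling inside a 5-minute period resolves to that period. A minimal sketch of that expectation using only the standard library follows. It is a stand-in for illustration, not pandas code, and the names `containing_period_start`, `FIRST`, `STEP`, and `values` are hypothetical.

```python
import datetime

FIRST = datetime.datetime(2017, 1, 1)   # start of the first period in the script
STEP = datetime.timedelta(minutes=5)    # period length, matching freq='5T'


def containing_period_start(ts, first=FIRST, step=STEP):
    """Floor `ts` onto the grid of period starts anchored at `first`."""
    n = (ts - first) // step            # whole periods elapsed (floor division)
    return first + n * step


# Plain dict keyed by period start, standing in for the 22-element Series.
values = {FIRST + i * STEP: 0.0 for i in range(22)}

for ts in (
    datetime.datetime(2017, 1, 1, 0, 0),
    datetime.datetime(2017, 1, 1, 0, 1),
    datetime.datetime(2017, 1, 1, 0, 0, 30),
):
    # Every timestamp is mapped to its containing period before the lookup.
    print(ts, values[containing_period_start(ts)])
```

All three lookups land on the period starting at 00:00, which is the result the second lookup fails to produce in pandas 0.19.2.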
jreback commented 7 years ago

Yeah, looks buggy. Period partial slicing is not fully developed. Pull requests are welcome.

jreback commented 7 years ago

This would need tests with datetimes & string slicing.

jreback commented 7 years ago

xref #13429 (a different issue, though: no PeriodIndex involved).

ocschwar commented 7 years ago

If I understand the stack traces correctly, the bug is somewhere in pandas.core.common, right?

jreback commented 7 years ago

this is a little complicated, but you can have a look at: https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/period.py#L725

Step through with a successful match and then the unsuccessful one.
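
One way to compare the two cases without stepping through pandas is to look at the integers in the KeyError messages. They equal the number of whole minutes since 1970-01-01 for the failing keys. That suggests (as an assumption, not something confirmed in this thread) that the key is truncated to the base unit of the frequency and then looked up exactly, while the index holds only every fifth minute. A stdlib-only sketch, with `minute_ordinal` and `stored` as hypothetical names:

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
ONE_MINUTE = datetime.timedelta(minutes=1)


def minute_ordinal(ts):
    """Whole minutes elapsed since the Unix epoch (seconds are truncated)."""
    return (ts - EPOCH) // ONE_MINUTE


# Assumed layout: one integer per 5-minute period, 22 periods from 2017-01-01.
first = minute_ordinal(datetime.datetime(2017, 1, 1, 0, 0))
stored = {first + 5 * i for i in range(22)}

for key in (
    datetime.datetime(2017, 1, 1, 0, 0),      # works in the thread
    datetime.datetime(2017, 1, 1, 0, 0, 30),  # works: truncates to 00:00
    datetime.datetime(2017, 1, 1, 0, 1),      # fails: 24720481 is not stored
):
    ordinal = minute_ordinal(key)
    print(key, ordinal, ordinal in stored)
```

The same arithmetic reproduces the 24857044 from the original report (2017-04-05 20:04), so under this reading the exact-match lookup, rather than the conversion of the key, is where a containment check would be needed.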

ocschwar commented 7 years ago

I ran a git pull, and can confirm that the bug remains in the master branch.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/oschwarz/Desktop/git/pandas/pandas/core/series.py", line 597, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/oschwarz/Desktop/git/pandas/pandas/tseries/period.py", line 769, in get_value
    return com._maybe_box(self, self._engine.get_value(s, key),
  File "pandas/_libs/index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4363)
    cpdef get_value(self, ndarray arr, object key, object tz=None):
  File "pandas/_libs/index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4046)
    loc = self.get_loc(key)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)
    return self.mapping.get_item(val)
  File "pandas/_libs/hashtable_class_helper.pxi", line 756, in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13913)
    cpdef get_item(self, int64_t val):
  File "pandas/_libs/hashtable_class_helper.pxi", line 762, in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13857)
    raise KeyError(val)
KeyError: 24720481

And I'm not quite able to wrap my head around the steps that get skipped in the stack trace. Also, I'm wondering, what is the reason for grouping frequencies in the Period and PeriodIndex objects instead of just converting them to a time delta and storing them as such?

jreback commented 7 years ago

@ocschwar this is an open bug, not sure why you would think it's fixed.