pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

"groups" attribute has unexpected value for TimeGrouper groupby or resample #13152

Open shoyer opened 8 years ago

shoyer commented 8 years ago

The "groups" mapping has scalars as its values rather than all the corresponding labels:

In [81]: index = pd.date_range('2000-01-01', periods=5)

In [82]: s = pd.Series(np.arange(index.size), index)

In [83]: s
Out[83]:
2000-01-01    0
2000-01-02    1
2000-01-03    2
2000-01-04    3
2000-01-05    4
Freq: D, dtype: int64

In [84]: g = s.groupby(pd.Grouper(freq='2D'))

In [85]: g.groups
Out[85]:
{Timestamp('2000-01-01 00:00:00', offset='2D'): 2,
 Timestamp('2000-01-03 00:00:00', offset='2D'): 4,
 Timestamp('2000-01-05 00:00:00', offset='2D'): 5}

Per the groupby docs, I expected something like the following instead:

In [85]: g.groups
Out[85]:
{Timestamp('2000-01-01 00:00:00', offset='2D'):
 [Timestamp('2000-01-01 00:00:00', offset='D'),
  Timestamp('2000-01-02 00:00:00', offset='D')],
 Timestamp('2000-01-03 00:00:00', offset='2D'):
 [Timestamp('2000-01-03 00:00:00', offset='D'),
  Timestamp('2000-01-04 00:00:00', offset='D')],
 Timestamp('2000-01-05 00:00:00', offset='2D'):
 [Timestamp('2000-01-05 00:00:00', offset='D')]}
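
As a rough workaround sketch (not a fix for .groups itself), a mapping of that shape can be built by iterating over the groupby object, which yields (key, sub-Series) pairs. The name `expected` is only illustrative, and the exact Timestamp repr will differ between pandas versions:

```python
import numpy as np
import pandas as pd

index = pd.date_range('2000-01-01', periods=5)
s = pd.Series(np.arange(index.size), index)

# Iterating a groupby yields (group key, sub-Series) pairs, so the
# labels belonging to each group can be read off each sub-Series' index.
expected = {
    key: list(chunk.index)
    for key, chunk in s.groupby(pd.Grouper(freq='2D'))
}

for key, labels in expected.items():
    print(key, labels)
```

This sidesteps .groups entirely, so it says nothing about how the attribute should behave; it only shows the label lists the report asks for.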

We see the same behavior for resample.groups as well:

In [103]: s.resample('2D').groups
Out[103]:
{Timestamp('2000-01-01 00:00:00', offset='2D'): 2,
 Timestamp('2000-01-03 00:00:00', offset='2D'): 4,
 Timestamp('2000-01-05 00:00:00', offset='2D'): 5}

jreback commented 8 years ago

Actually, I recall this a bit; see here: https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L1974

BinGrouper (which is used in the time-based groupbys) exposes it differently.
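
To illustrate the difference, here is a minimal sketch that assumes the scalars reported above (2, 4, 5) are cumulative end positions of each bin within the index. That reading is an assumption based on the output in the report, not something confirmed from the BinGrouper source. The helper `groups_from_bins` is hypothetical and not part of pandas:

```python
import pandas as pd

def groups_from_bins(index, bin_labels, bins):
    """Hypothetical helper: expand cumulative bin end positions
    into a {bin label: [index labels]} mapping."""
    result = {}
    start = 0
    for label, stop in zip(bin_labels, bins):
        # Each bin covers the half-open position range [start, stop).
        result[label] = list(index[start:stop])
        start = stop
    return result

index = pd.date_range('2000-01-01', periods=5)
bin_labels = pd.date_range('2000-01-01', periods=3, freq='2D')

# Scalars copied from the .groups output in the report (assumed bin edges).
groups = groups_from_bins(index, bin_labels, [2, 4, 5])
```

Under that assumption, `groups` has the label-list shape that the groupby docs describe.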

I don't think this is actually used anywhere internally. Even though .groups is public, it is only exposed to the user.

OK, if you'd like to see what the problem / fix would be, that would be great.