beckernick opened 2 years ago
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Took a quick look at this issue again. I suspect this occurs because of the __getitem__ call (.x). RollingGroupby inherits from Rolling, which provides the __getitem__ implementation, and that implementation returns a standard Rolling object rather than a RollingGroupby.
So the call to mean() goes down the standard Rolling._apply_agg path rather than the RollingGroupby._apply_agg path. Unlike RollingGroupby._apply_agg, the Rolling._apply_agg path doesn't create a MultiIndex.
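A minimal pure-Python sketch of the inheritance pattern described above may help. The classes Rolling, RollingGroupby and FixedRollingGroupby below are hypothetical toys that only borrow the names; they are not cuDF's implementation, and a "column" is just a Python list. The inherited __getitem__ builds a plain Rolling, so mean() dispatches to the ungrouped _apply_agg and the result is keyed by row position alone. Overriding __getitem__ in the subclass (one possible fix, assumed here rather than taken from cuDF) keeps the (group key, row position) keys.

```python
class Rolling:
    """Hypothetical toy rolling window over a dict of column name -> list of floats."""

    def __init__(self, data, window):
        self.data = data
        self.window = window

    def __getitem__(self, column):
        # Inherited unchanged by subclasses: always builds a *plain* Rolling,
        # so any group keys held by a subclass are dropped here.
        return Rolling({column: self.data[column]}, self.window)

    def mean(self):
        # Dispatches to whichever _apply_agg the object's class provides.
        return self._apply_agg(self._window_means)

    def _window_means(self, values):
        # None until a full window of observations is available.
        return [
            None if i + 1 < self.window
            else sum(values[i + 1 - self.window : i + 1]) / self.window
            for i in range(len(values))
        ]

    def _apply_agg(self, agg):
        # Ungrouped path: result keyed by original row position only.
        [values] = self.data.values()  # toy: exactly one selected column
        return dict(enumerate(agg(values)))


class RollingGroupby(Rolling):
    """Hypothetical toy grouped variant; keys[i] is the group of row i."""

    def __init__(self, data, keys, window):
        super().__init__(data, window)
        self.keys = keys

    def _apply_agg(self, agg):
        # Grouped path: result keyed by (group key, original row position).
        [values] = self.data.values()
        result = {}
        for key in sorted(set(self.keys)):
            positions = [i for i, k in enumerate(self.keys) if k == key]
            for pos, value in zip(positions, agg([values[i] for i in positions])):
                result[(key, pos)] = value
        return result


class FixedRollingGroupby(RollingGroupby):
    def __getitem__(self, column):
        # Hypothetical fix: keep the subclass and its group keys on selection.
        return FixedRollingGroupby({column: self.data[column]}, self.keys, self.window)


data = {"x": [1.0, 2.0, 3.0, 4.0, 5.0]}
keys = ["a", "b", "a", "b", "a"]
buggy = RollingGroupby(data, keys, window=2)["x"].mean()
fixed = FixedRollingGroupby(data, keys, window=2)["x"].mean()
print(buggy)  # keys are bare row positions
print(fixed)  # keys are (group key, row position) pairs
```

In this toy the plain path also ignores group boundaries when computing the values; the issue itself only documents the index difference.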
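To make this concrete:
```python
import cudf
# Small random frame with an integer "id" column and a float "x" column.
df = cudf.datasets.randomdata()
print("getitem call on the RollingGroupby object")
# Column selected after .rolling(): goes through Rolling.__getitem__.
print(df.groupby(["id"]).rolling(window=3).x.mean().head())
print("getitem call before the RollingGroupby is created")
# Column selected before .rolling(): the result keeps the (id, row) MultiIndex.
print(df.groupby(["id"]).x.rolling(window=3).mean().head())
```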
```
getitem call on the RollingGroupby object
3    <NA>
8    <NA>
6    <NA>
7    <NA>
9    <NA>
Name: x, dtype: float64
getitem call before the RollingGroupby is created
id
918  3    <NA>
963  8    <NA>
967  6    <NA>
990  7    <NA>
     9    <NA>
Name: x, dtype: float64
```
Ran into this today while using cudf.pandas.
cc @galipremsagar @vyasr, would this be a good first issue for a new contributor?
Yes, I think so!
cuDF Python groupby rolling aggregations return a single Index corresponding to each element's original row position, whereas in pandas they return a MultiIndex that includes both the groupby key(s) and the original row position.
This is not currently blocking any behavior with Dask + cuDF, as grouped rolling operations are blocked by #10173.
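For comparison, a minimal pandas sketch shows the MultiIndex described in the issue: the first level is the groupby key and the second is the original row position. The small frame is made up for illustration. It also selects the column after .rolling(), the same pattern that loses the group keys in cuDF.

```python
import pandas as pd

# Made-up frame: "id" is the groupby key, "x" is the value column.
df = pd.DataFrame({"id": [1, 2, 1, 2, 1], "x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Select the column on the RollingGroupby object, then aggregate.
result = df.groupby("id").rolling(window=2)["x"].mean()
print(result)
print(result.index.names)  # groupby key level, then the original (unnamed) index level
```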