rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Implement center for offset based windows #15086

Open amanlai opened 7 months ago

amanlai commented 7 months ago

Suppose I have a Series as follows:

import cudf
s = cudf.Series(range(100), index=cudf.date_range('2024', periods=100, freq='D'))

If I want to compute a 3-day rolling mean, I can do:

window_size = 3
s.rolling(f'{window_size}D').mean()

This window is not centered: each label is the right edge of its window. If I want each label to sit at the center of its window instead (as pandas allows):

a = s.rolling(f'{window_size}D', center=True).mean()

then I get a NotImplementedError.
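For reference, the semantics being requested can be pinned down with a brute-force sketch. It assumes pandas' defaults for offset windows (`closed='right'`, `min_periods=1`) and uses plain day numbers in place of a DatetimeIndex; `window_mean` is a hypothetical helper, not a cuDF or pandas API. The trailing window for label t covers (t - w, t], while the centered one covers (t - w/2, t + w/2].

```python
from statistics import fmean


def window_mean(labels, values, w, center=False):
    """Brute-force O(n^2) mean over an offset window, one result per label.

    Assumes pandas-style defaults for offset windows: closed='right', min_periods=1.
    Trailing window: (t - w, t].  Centered window: (t - w/2, t + w/2].
    """
    out = []
    for t in labels:
        lo, hi = (t - w / 2, t + w / 2) if center else (t - w, t)
        # every label t lies inside its own window, so the selection is never empty
        out.append(fmean(v for x, v in zip(labels, values) if lo < x <= hi))
    return out


days = list(range(6))  # stand-in for a daily DatetimeIndex: day numbers 0..5
vals = [float(d) for d in days]
print(window_mean(days, vals, 3))               # [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
print(window_mean(days, vals, 3, center=True))  # [0.5, 1.0, 2.0, 3.0, 4.0, 4.5]
```

In the centered result, label 0 already includes day 1, and the last label averages only two rows because its window runs past the end of the data.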

I wish I could do this in cudf.

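Right now, I can work around this by computing the trailing rolling mean, shifting it back by half the window size, and filling in the NaN values left at the end with a loop over the shrinking tail windows, but it's a little ugly (and it only matches a centered time window because this index has exactly one row per day):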

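shift = -(window_size - 1) // 2  # rows to shift back: -1 for a 3-day window
b = s.rolling(f'{window_size}D').mean().shift(shift)
# The last -shift rows are NaN after shifting; refill each with the mean from its window's left edge to the end
n = len(s)
tail = s.index[n - window_size + 1 : n - window_size + 1 - shift].to_pandas()  # cudf indexes are not iterable
b.iloc[n + shift:] = [s.loc[t:].mean() for t in tail]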
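To see why the shift trick lines up with a centered window, here is the same workaround on a plain Python list. It assumes one row per day with no gaps, so a trailing 'nD' window is exactly the last n rows (fewer at the start, since `min_periods=1`). `trailing_mean` and `centered_via_shift` are hypothetical helpers for illustration only, and `None` stands in for the NaN slots that `.shift()` leaves behind.

```python
from statistics import fmean


def trailing_mean(values, w):
    """Mean of the last w rows (fewer at the start), like rolling('wD') on a gap-free daily index."""
    return [fmean(values[max(0, i - w + 1): i + 1]) for i in range(len(values))]


def centered_via_shift(values, w):
    """List version of the cudf workaround: shift a trailing mean, then refill the tail."""
    n = len(values)
    shift = -(w - 1) // 2  # unary minus binds first: -1 for w=3, -2 for w=4 and w=5
    # shift(shift) with shift <= 0 moves results up by -shift rows; None marks the NaN slots
    out = trailing_mean(values, w)[-shift:] + [None] * (-shift)
    # each NaN slot's centered window runs off the end: average from its left edge to the last row
    for k in range(-shift):
        out[n + shift + k] = fmean(values[n - w + 1 + k:])
    return out


print(centered_via_shift([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 3))  # [0.5, 1.0, 2.0, 3.0, 4.0, 4.5]
```

The head of the series needs no patching because the offset window simply holds fewer rows there. On an index with gaps, a fixed row shift no longer corresponds to a time-centered window, which is one more reason a native `center=True` would help.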

Pandas' DataFrame.rolling uses a Cython-optimized function to implement the center (and closed) parameters for variable (offset-based) windows: pandas._libs.window.indexers.calculate_variable_window_bounds, which computes the start and end row of each window. My suggestion is to implement an equivalent of this function in cudf.
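A simplified sketch of what such a bounds function computes, written as two forward-moving pointers over a sorted index so the whole pass is O(n). It is not a port of pandas' Cython code: it assumes a strictly increasing numeric index (for example int64 nanoseconds) and a positive window, supports only `closed='right'`, and `variable_window_bounds` and `rolling_mean` are hypothetical names. Each window is returned as a half-open row range `[start[i], end[i])`.

```python
from itertools import accumulate


def variable_window_bounds(index, window, center=False):
    """Row ranges [start[i], end[i]) of offset windows over a strictly increasing index.

    Assumes window > 0 and closed='right' only.
    Trailing window: (t - window, t].  Centered window: (t - window/2, t + window/2].
    """
    n = len(index)
    start, end = [0] * n, [0] * n
    lo = hi = 0  # both window edges increase with t, so neither pointer moves backward
    for i, t in enumerate(index):
        left = t - window / 2 if center else t - window
        right = t + window / 2 if center else t
        while index[lo] <= left:  # open left edge; stops by lo == i since index[i] > left
            lo += 1
        while hi < n and index[hi] <= right:  # closed right edge
            hi += 1
        start[i], end[i] = lo, hi
    return start, end


def rolling_mean(values, start, end):
    """Mean over each [start, end) range using prefix sums (O(1) per window)."""
    prefix = [0.0, *accumulate(values)]
    return [(prefix[e] - prefix[s]) / (e - s) for s, e in zip(start, end)]


days = [0, 1, 2, 4, 5, 8]  # irregular daily labels: days 3, 6 and 7 are missing
start, end = variable_window_bounds(days, 3, center=True)
print(start, end)  # [0, 0, 1, 3, 3, 5] [2, 3, 3, 5, 5, 6]
print(rolling_mean([float(d) for d in days], start, end))  # [0.5, 1.0, 1.5, 4.5, 4.5, 8.0]
```

Keeping the bounds separate from the aggregation means the same start/end arrays can drive mean, sum, min and the rest; pandas' rolling aggregations are likewise driven by start/end arrays. Prefix sums are used here only for brevity and can lose precision on values of widely varying magnitude.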

shwina commented 7 months ago

Thanks for reporting. I'm currently working on a refactor of our rolling Python implementation and I'll see if I can include this there!

wence- commented 2 months ago

Also #14334.