pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: Pandas ewm with fixed window / offset #42499

Open nrcjea001 opened 3 years ago

nrcjea001 commented 3 years ago

Problem & Proposed feature
The pandas ewm function works like the pandas expanding function in that it operates over every row up to the current one. In my case, however, I need to specify a fixed window or offset over which ewm is applied. In other words, I need an ewm with a cutoff, analogous to the 'window' parameter of the pandas rolling function.

Alternatives considered
My workaround is as follows (it is, however, extremely inefficient):

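import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(5, size=24),
                   'b': ["S", "A"] * 12,
                   'c': pd.date_range(start='1/1/2018', end='12/12/2018', freq='15D')})

# Per group: 60-day rolling window on 'c', keeping the last time-aware EWM value as a scalar.
df.groupby('b').rolling('60d', on='c')['a'].apply(
    lambda x: x.ewm(halflife='15d', times=x.index).mean().iloc[-1])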

In the example above, I am computing an exponentially weighted moving average whose decay is specified as a halflife on a datetime column. The rolling apply caps the lookback at '60d', and that cap is the feature I propose adding directly to the ewm function.

In summary, I propose adding a window parameter to the ewm function that limits the ewm calculation to a fixed window or offset.
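To make the requested behaviour concrete, here is a minimal reference sketch of a windowed EWM mean. It assumes evenly spaced observations, a count-based window, no missing values, and the adjust=True weighting. The helper name windowed_ewm_mean is hypothetical and is not part of pandas; the rolling-apply line at the end only cross-checks the numbers.

```python
import numpy as np
import pandas as pd


def windowed_ewm_mean(values, window, alpha):
    """Truncated EWM mean (adjust=True style weights) over the last `window` points.

    Hypothetical helper, not a pandas API. Assumes evenly spaced data, no NaNs.
    """
    x = np.asarray(values, dtype=float)
    out = np.full(x.shape, np.nan)
    for i in range(len(x)):
        chunk = x[max(0, i - window + 1): i + 1]
        # Newest observation gets weight 1, the one before it (1 - alpha), and so on.
        weights = (1.0 - alpha) ** np.arange(len(chunk) - 1, -1, -1)
        out[i] = np.dot(weights, chunk) / weights.sum()
    return out


s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
result = windowed_ewm_mean(s, window=3, alpha=0.5)

# The same quantity computed with the rolling-apply workaround, for comparison.
reference = s.rolling(3, min_periods=1).apply(
    lambda w: w.ewm(alpha=0.5, adjust=True).mean().iloc[-1]
)
```

This direct form costs O(window) per row, so it serves as a definition to test against rather than as an efficient implementation.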

FitzHoo commented 2 years ago

When will this feature be added to the ewm function, please?

jreback commented 2 years ago

pandas is all volunteer - community PRs are how things get added

this issue likely needs specification of an efficient way of doing this first

jmg-duarte commented 2 years ago

I've implemented a similar feature.

The use case was similar: we need to calculate the EWMA in a "forgetful" manner, where only the values in the window are taken into account.

We perform the calculations in an incremental manner and adapted the formula from adjust=True by removing the weight of the "dropped" value (i.e. the value being removed).
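The original code is not available, so the following is only a guess at the idea described above, not that implementation. It keeps a running weighted sum and a running weight total in the adjust=True form, ages both by (1 - alpha) at each step, and subtracts the contribution of the value that leaves the window. The function name sliding_ewm_mean and the assumptions of evenly spaced data, a count-based window, and no NaNs are mine.

```python
import numpy as np


def sliding_ewm_mean(values, window, alpha):
    """Incremental truncated EWM mean with O(1) work per observation.

    Hypothetical sketch. Assumes evenly spaced data and no NaNs.
    """
    x = np.asarray(values, dtype=float)
    decay = 1.0 - alpha
    # Weight an observation carries at the moment it becomes `window` steps old.
    drop_weight = decay ** window
    num = 0.0  # running weighted sum of the values inside the window
    den = 0.0  # running sum of the weights inside the window
    out = np.empty(len(x))
    for i, value in enumerate(x):
        # Age everything already in the window, then add the new value with weight 1.
        num = decay * num + value
        den = decay * den + 1.0
        if i >= window:
            # Remove the contribution of the value that just left the window.
            num -= drop_weight * x[i - window]
            den -= drop_weight
        out[i] = num / den
    return out


data = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.5]
smoothed = sliding_ewm_mean(data, window=3, alpha=0.5)
```

One caveat with this approach: the repeated subtraction can accumulate floating-point error on very long series, so a real implementation might recompute the sums from scratch periodically.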

yongqiang-zhao commented 11 months ago

@jmg-duarte Hi, could you kindly please provide details about how to use the feature? Really appreciate your help.

jmg-duarte commented 11 months ago

@yongqiang-zhao sadly I've since moved on to another company and no longer have access to that code.

Best I can do now is provide some resources:

I hope this helps!

yongqiang-zhao commented 11 months ago

@jmg-duarte Thank you very much for your kind help.