pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

exponentially weighted moving sum (ewms) #17602

Open celsius38 opened 1 month ago

celsius38 commented 1 month ago

Description

Also asked on Stack Overflow: https://stackoverflow.com/questions/78741144/is-there-a-exponentially-weighted-moving-sum-ewms-instead-of-ewma-function-in

In polars ewm_mean, the update is formulated as:

y = (1 - alpha) * y + alpha * x

I've found that it is sometimes useful to use an update that does not scale the new value by alpha, a.k.a. an exponentially weighted moving sum (ewms), instead:

y = (1 - alpha) * y + x
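Unrolling the ewms recurrence makes the weights explicit. This is a short derivation, assuming the sum starts from zero before the first observation (so y_0 = x_0):

$$
y_t = (1-\alpha)\,y_{t-1} + x_t = \sum_{i=0}^{t} (1-\alpha)^{i}\, x_{t-i}
$$

Every past value keeps its full weight (1 - alpha)^i instead of being multiplied by alpha, which is why the result is a sum rather than an average.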

For equally spaced updates, ewms is merely ewma scaled by 1 / alpha (apart from start-up effects), but the relationship is not so trivial when updates are not equally spaced.
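A quick numerical check of the equal-spacing claim, in plain Python. The function names are illustrative only, not Polars API; the ewma uses the unadjusted recurrence with y_0 = x_0, and the ewms starts from 0. The ratio ewms / ewma starts at 1 and converges to 1 / alpha once the start-up difference has decayed.

```python
def ewma_unadjusted(xs, alpha):
    """Recursive EWMA: y = (1 - alpha) * y + alpha * x, with y_0 = x_0."""
    out, y = [], None
    for x in xs:
        y = x if y is None else (1 - alpha) * y + alpha * x
        out.append(y)
    return out


def ewms(xs, alpha):
    """Exponentially weighted moving sum: y = (1 - alpha) * y + x, starting from 0."""
    out, y = [], 0.0
    for x in xs:
        y = (1 - alpha) * y + x
        out.append(y)
    return out


alpha = 0.2
xs = [1.0, 3.0, 2.0, 5.0, 4.0] * 20  # 100 equally spaced observations (made-up data)
ratios = [s / m for s, m in zip(ewms(xs, alpha), ewma_unadjusted(xs, alpha))]
# the ratio starts at 1.0 and approaches 1 / alpha = 5.0
print(ratios[0], ratios[-1], 1 / alpha)
```

The gap between ewms and ewma / alpha itself follows the same recurrence, so it shrinks by a factor (1 - alpha) per step.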

To achieve this, I implemented my own update function and call it via .map_batches(), but I think this is rather suboptimal.

Is there a better alternative?

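```python
from typing import Union

import numpy as np
import polars as pl


def ems_iter(values: np.ndarray, intervals: Union[np.ndarray, float], half_lifes: Union[np.ndarray, float]) -> list:
    """
    >>> ems_iter([1, 0, 0, 1, 0, 0], 1, 1)
    [1.0, 0.5, 0.25, 1.125, 0.5625, 0.28125]
    """
    res = []
    agg = 0.
    values = np.asarray(values, dtype=float)
    # per-step decay: the running sum halves every `half_lifes` units of elapsed time
    decays = (1/2) ** (np.asarray(intervals, dtype=float) / half_lifes)
    # broadcast a scalar interval / half-life to one decay factor per value
    decays = np.ones(values.shape) * decays
    for value, decay in zip(values, decays):
        # ewms update: decay the old sum, add the new value without scaling it by alpha
        agg = agg * decay + value
        res.append(float(agg))
    return res


def ems(val_col: str, interval_col: str, hl: float) -> pl.Expr:
    # interval_col holds the time elapsed since the previous row (a Duration);
    # hl is the half-life in seconds
    return (
        pl.struct([val_col, interval_col])
        .map_batches(
            lambda x: pl.Series(
                ems_iter(
                    x.struct.field(val_col),
                    x.struct.field(interval_col).dt.total_seconds(),
                    hl,
                )
            ),
            return_dtype=pl.Float64,
        )
    )
```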
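One possible vectorised alternative to the per-row loop (not something proposed in this thread): with half-life decay, the moving sum can be written as y_k = 0.5^(t_k) * sum over i <= k of 2^(t_i) * x_i, where t is the elapsed time measured in half-lives, so it reduces to a cumulative sum. This is a NumPy-only sketch; `ems_cumsum` and `ems_loop` are hypothetical helper names, and it assumes intervals[i] is the time since the previous row (intervals[0] is ignored, as in ems_iter). The main caveat is that 2**t overflows float64 once the series spans roughly 1000 half-lives, so long series would need to be processed in chunks or rebased.

```python
import numpy as np


def ems_cumsum(values, intervals, half_life):
    """Vectorised ewms via a rescaled cumulative sum (sketch)."""
    values = np.asarray(values, dtype=float)
    intervals = np.broadcast_to(np.asarray(intervals, dtype=float), values.shape)
    # elapsed time of each observation since the first one, in half-lives
    t = np.concatenate([[0.0], np.cumsum(intervals[1:])]) / half_life
    # y_k = sum_{i<=k} 0.5**(t_k - t_i) * x_i = 0.5**t_k * cumsum(2**t_i * x_i)
    # caveat: 2.0**t overflows once t exceeds ~1000 half-lives
    return 0.5 ** t * np.cumsum(2.0 ** t * values)


def ems_loop(values, intervals, half_life):
    """Reference loop with the same update as ems_iter: agg = agg * 0.5**(dt / hl) + x."""
    values = np.asarray(values, dtype=float)
    intervals = np.broadcast_to(np.asarray(intervals, dtype=float), values.shape)
    out, agg = [], 0.0
    for x, dt in zip(values, intervals):
        agg = agg * 0.5 ** (dt / half_life) + x
        out.append(agg)
    return np.array(out)


print(ems_cumsum([1, 0, 0, 1, 0, 0], 1, 1).tolist())  # [1.0, 0.5, 0.25, 1.125, 0.5625, 0.28125]

values = [1, 0, 0, 1, 0, 0]
intervals = [0.0, 1.0, 0.5, 2.0, 1.0, 3.0]  # irregular spacing (made-up data)
print(np.allclose(ems_cumsum(values, intervals, 1.0), ems_loop(values, intervals, 1.0)))  # True
```

The same rescaled cumulative sum could in principle be expressed with Polars expressions (e.g. `cum_sum`) instead of `map_batches`, subject to the same overflow caveat.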
MarcoGorelli commented 1 month ago

hi @celsius38

Is there a reference for this function? Is an adjust parameter necessary, like it is in ewm_mean?

celsius38 commented 1 month ago

@MarcoGorelli I couldn't find an existing implementation in any package, but it is frequently used in quantitative trading, for example in impact models.

The difference between ema and ems: ema preserves the mean of a stationary input, whereas ems scales it by roughly 1 / alpha (so the effect is bigger for a longer half-life).
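A worked check of this point, assuming a constant input x_t = c and the per-step decay (1/2)^(1/h) used in ems_iter (unit intervals with half-life h, so 1 - alpha = 2^(-1/h)). Setting y_t = y_{t-1} = y* in each recurrence gives the long-run level:

$$
\text{ewma: } y^* = (1-\alpha)\,y^* + \alpha c \;\Rightarrow\; y^* = c,
\qquad
\text{ewms: } y^* = (1-\alpha)\,y^* + c \;\Rightarrow\; y^* = \frac{c}{\alpha} = \frac{c}{1 - 2^{-1/h}} \approx \frac{h}{\ln 2}\,c \quad (h \gg 1)
$$

So the ewms level grows roughly linearly with the half-life, while the ewma level does not depend on it.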

I think it should be exactly the same as ema except that, when doing the update, you don't multiply the update by 'alpha'

MarcoGorelli commented 1 month ago

Thanks for your response.

Even without reference software, is there a textbook or paper that includes the formula? If someone asks about it in the future, it would be good to have something to point to.

> I think it should be exactly the same as ema except that, when doing the update, you don't multiply the update by 'alpha'

For the unadjusted version, right? There are currently two versions of ewm_mean: https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.ewm_mean.html

It looks like the formula you've provided is analogous to the unadjusted ewm_mean formula.
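On the adjust question, a small plain-Python check may help. It assumes the ewms recurrence starts from zero and uses the adjusted formula from the ewm_mean docs linked above (weights (1 - alpha)^i normalised by their sum); the function names are illustrative, not Polars API. Under that assumption the ewms is exactly the numerator of the adjusted mean, i.e. ewm_mean(adjust=True) multiplied by the running sum of weights, while as a recurrence it has the same shape as the unadjusted form.

```python
def ewms(xs, alpha):
    """Recurrence from the issue: y_t = (1 - alpha) * y_{t-1} + x_t, starting from 0."""
    out, y = [], 0.0
    for x in xs:
        y = (1 - alpha) * y + x
        out.append(y)
    return out


def ewm_mean_adjusted(xs, alpha):
    """adjust=True form: weighted average of past values with weights (1 - alpha)**i."""
    out = []
    for t in range(len(xs)):
        weights = [(1 - alpha) ** i for i in range(t + 1)]
        numerator = sum(w * xs[t - i] for i, w in enumerate(weights))
        out.append(numerator / sum(weights))
    return out


alpha = 0.3
xs = [2.0, -1.0, 4.0, 0.5, 3.0]  # made-up data
weight_sums = [sum((1 - alpha) ** i for i in range(t + 1)) for t in range(len(xs))]
# adjusted mean * running weight sum recovers the ewms at every step
rescaled = [m * w for m, w in zip(ewm_mean_adjusted(xs, alpha), weight_sums)]
print(all(abs(a - b) < 1e-12 for a, b in zip(rescaled, ewms(xs, alpha))))  # True
```

This suggests the sum itself may not need a separate adjust parameter, as long as it starts from zero.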