Closed mhconradt closed 2 years ago
Would be nice to have alias ewma
for ewm_mean
.
@ritchie46 this would be huge for quants, probably the biggest the project is missing in that realm at this point.
Might be better to use ew_
prefix as opposed to ewm_
.
@ritchie46 this would be huge for quants, probably the biggest the project is missing in that realm at this point.
Then let's do it! :)
I would implement this as a kernel
in https://github.com/pola-rs/polars/tree/master/polars/polars-arrow/src/kernels. And then we could dispatch an expression as:
let ca = column.f64()?.rechunk()?.downcast_iter().next().unwrap();
ca.apply_kernel(ewm_mean)?.into_series()
But given that for E[i]
we must compute all values before E[i]
, this scales quadratically, or is there a clever trick?
But given that for E[i] we must compute all values before E[i], this scales quadratically, or is there a clever trick?
I assume you only do this for the window size?
R[2] = 0.5 4 + 0.5 R[1] = 3.75
Should be R[2] = 0.5 * 3 + 0.5 * R[1] = 3.25
?
Agreed that this is one of the most important missing features for quants.
One thing I do wonder, are there no Rust implementations yet you could wrap around? Because a fast (streaming, no need for quadratic scaling) implementation that deals with nulls and fixed window sizes, possibly operate on a separate time axis, can do means, variances, covariances, correlations, etc is quite some work, and not specific to a dataframe library.
One thing I do wonder, are there no Rust implementations yet you could wrap around? Because a fast (streaming, no need for quadratic scaling) implementation that deals with nulls and fixed window sizes, possibly operate on a separate time axis, can do means, variances, covariances, correlations, etc is quite some work, and not specific to a dataframe library.
I already have something. :) (At least only for the means. :see_no_evil: )
@ritchie46 I looked at the implementation and it's pretty clever, bravo! It would be good to add min_periods
and ignore_na
if we can do that without ruining the secret scaling sauce. I estimate ew_mean
will get used 100x as much as the other ew_
functions so it's the most important to get right.
Yes, I will add those options as well. And I found that the ewm_std and ewm_var can be easily computed from the ewma, so I will also add those.
Describe your feature request
Many applications in quantitative finance use an exponentially-weighted moving average (called EWMA henceforth) as a smoothing mechanism that gives higher weight to more recent data. I propose adding a suite of functions to Polars along the lines of
rolling_
functions that supports EWMA asewm_mean
and in the future could support other metrics such asewm_std
,ewm_var
,ewm_corr
,ewm_cov
, etc. There are a few orthogonal parameters involved here:alpha
,com
(center of mass),span
,halflife
, etc.min_periods
?alpha=0.5
of values 2, 5, and 3 in that order.E[0] = 2 / (1 - 0.5) ^ 0 = 2 E[1] = 5 / (1 + (1 - 0.5) ^ 1) + 2 (1 - 0.5) ^ 1 / (1 + (1 - 0.5) ^ 1) = 4 E[2] = (3 + 5 (1 - 0.5) ^ 1 + 2 * (1 - 0.5) ^ 2) / (1 + (1 - 0.5) ^ 1 + (1 - 0.5) ^ 2) = 3.428571
R[0] = 2 R[1] = 0.5 5 + 0.5 R[0] = 3.5 R[2] = 0.5 4 + 0.5 R[1] = 3.75
Note abs(R[1] - E[1])> abs(R[2] - E[2]), the two converge over time. Good news is there's a way to compute E recursively for the EWMA that's not too bad. Bad news is it probably does not work for
ewm_std
,ewm_corr
, etc.ignore_nulls
.Relevant Pandas source code (fasten your seatbelts, it's Cython). Note it does not use a recursive definition, but rather appears to trim the window before calling the function based on the rate of decay.
(Edited example to show usage as window function (important that calculation carried out independently for each market))