Closed stevenlis closed 1 year ago
Since rolling_std() does not currently support a ddof argument, here is an alternative for short DataFrames:
rolling_apply(lambda s: s.std(ddof=0), window_size=3)
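Another possible workaround, when every window is full and contains no nulls, is to rescale the ddof=1 result instead of calling a Python lambda per window. The sketch below uses only the Python standard library to show the conversion; the helper name sample_to_population_std is hypothetical and not part of polars.

```python
import math
import statistics


def sample_to_population_std(sample_std: float, window_size: int) -> float:
    """Convert a ddof=1 standard deviation into its ddof=0 equivalent.

    var_0 = var_1 * (n - 1) / n, so std_0 = std_1 * sqrt((n - 1) / n).
    Assumes the window holds exactly `window_size` non-null values.
    """
    if window_size < 2:
        raise ValueError("window_size must be at least 2 for ddof=1 to be defined")
    return sample_std * math.sqrt((window_size - 1) / window_size)


window = [10.0, 12.0, 11.0]
std_ddof1 = statistics.stdev(window)   # sample standard deviation (ddof=1)
std_ddof0 = statistics.pstdev(window)  # population standard deviation (ddof=0)
converted = sample_to_population_std(std_ddof1, len(window))
```

Applied to a rolling_std() column, this would amount to multiplying by the constant sqrt((n - 1) / n). The factor is only correct where each window really holds n values, so partial windows (for example with a smaller min_periods) would need separate handling.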
Hi @ritchie46,
It would be very helpful to have support for ddof (= 0) in rolling_std().
Although the polars code notes that statistics mostly use ddof = 1, ddof = 0 is needed when computing technical indicators (e.g. Bollinger Bands) for financial instruments. The values from pandas' std(ddof = 0) match those of TA-Lib, tradingview.com, Webull, the AmiBroker package, and other prominent sources.
Note that Series.std(ddof = 0) works fine with rolling_apply(), but it is too slow to use.
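To make the use case concrete, here is a minimal pure-Python sketch of Bollinger Bands computed with the population standard deviation (ddof = 0). The function name, the default window of 20, and the multiplier of 2 are illustrative assumptions, not taken from any of the packages mentioned above.

```python
import statistics


def bollinger_bands(prices, window_size=20, num_std=2.0):
    """Return (middle, upper, lower) lists of the same length as `prices`.

    Entries are None until a full window is available. The band width uses
    statistics.pstdev, i.e. the population standard deviation (ddof=0).
    """
    middle, upper, lower = [], [], []
    for i in range(len(prices)):
        if i + 1 < window_size:
            # Not enough observations yet for a full window.
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        window = prices[i + 1 - window_size : i + 1]
        mean = statistics.fmean(window)
        std = statistics.pstdev(window)  # ddof=0
        middle.append(mean)
        upper.append(mean + num_std * std)
        lower.append(mean - num_std * std)
    return middle, upper, lower


mid, up, low = bollinger_bands([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3)
```

Swapping statistics.pstdev for statistics.stdev (ddof=1) would widen the bands by a factor of sqrt(n / (n - 1)), which is why the two conventions give visibly different values for short windows.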
Yeap, I think this is easy to add. Probably just need to churn through some dispatch layers
I'm taking a look at this and have some questions. I don't want to commit a style or Rust faux-pas.
To start, I'm looking at the no_nulls version of rolling_var. Depending on whether it has weights or not, it uses either rolling_apply_weights or rolling_apply_agg_window. The two functions look very similar, but apart from the weights, rolling_apply_agg_window handles the window using an object that implements the trait RollingAggWindowNoNulls, which seems to be used to avoid recomputing everything in every window.
Ideally I'd like to use the trait version, but that would require adding a ddof argument to new, which only makes sense for the variance. I'll keep looking for something sane, but if anyone has a suggestion off the top of their head that doesn't require awkward hacks or writing almost the same function again for this case, that would be helpful. Thanks.
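For readers unfamiliar with the idea of a window object that avoids recomputing everything in every window, here is a rough Python sketch of an incremental rolling variance that takes ddof at construction. It is not the polars implementation; the class name RollingVarWindow and its update method are hypothetical, and it uses the simple sum and sum-of-squares approach, which is prone to cancellation error on real data.

```python
from collections import deque


class RollingVarWindow:
    """Hypothetical sliding-window variance that keeps running sums.

    Each update costs O(1) instead of rescanning the whole window.
    """

    def __init__(self, window_size: int, ddof: int = 1):
        self.window_size = window_size
        self.ddof = ddof
        self.values = deque()
        self.total = 0.0     # running sum of values in the window
        self.total_sq = 0.0  # running sum of squared values in the window

    def update(self, x: float):
        """Push one value and return the variance, or None if the window is not full."""
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.window_size:
            # Remove the contribution of the value leaving the window.
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old
        n = len(self.values)
        if n < self.window_size or n - self.ddof <= 0:
            return None
        mean = self.total / n
        # Sum of squared deviations; clamp at zero to guard against rounding.
        sum_sq_dev = max(self.total_sq - n * mean * mean, 0.0)
        return sum_sq_dev / (n - self.ddof)


def rolling_var(values, window_size, ddof=1):
    window = RollingVarWindow(window_size, ddof)
    return [window.update(v) for v in values]


population = rolling_var([1.0, 2.0, 3.0, 4.0], window_size=3, ddof=0)
sample = rolling_var([1.0, 2.0, 3.0, 4.0], window_size=3, ddof=1)
```

In this sketch ddof only affects the final division, so the running state is the same for any ddof and only the constructor needs to know about it.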
We could maybe accept a &dyn Any in new for implementation-specific constructor arguments. The implementation then knows how to downcast that &dyn Any. For the variance that would be a u8, being the ddof.
Other implementations can ignore that extra argument. And if we need more arguments in the future we can pass a struct.
I have a draft PR here: https://github.com/pola-rs/polars/pull/8957. I ended up doing something similar, but no matter what, it seems a lot of parameters will have to be passed all over the place. There seem to be many opportunities for simplification and deduplication in the rolling code, so maybe that's something to talk about. As for these parameters, things might end up simpler if everything were kept in the original options struct and not unpacked until the actual function is called. With this setup, it might also be possible to stop special-casing quantiles.
Also, is there a reason that rolling_std doesn't just compute a rolling variance and then take the square root of the result, as opposed to having its own window type and taking a square root at every iteration like it does now?
Could someone take a look and tell me if the general structure of the changes looks OK? I know I need to fix a few things, look into numerical precision issues, and add some more tests, but before going ahead I want to make sure the way I have it passing new parameters around is acceptable.
@ritchie46 should this be closed?
Problem description
https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.rolling_std.html#polars.Expr.rolling_std https://numpy.org/doc/stable/reference/generated/numpy.std.html
It might be a good idea to add a ddof argument to rolling_std. If not, it might be a good idea to add ddof=1 in the doc: