ddof in polars.Expr.rolling_std

pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

https://docs.pola.rs

Other

29.07k stars 1.83k forks source link

ddof in polars.Expr.rolling_std #7999

Closed stevenlis closed 1 year ago

stevenlis commented 1 year ago

Problem description

https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.rolling_std.html#polars.Expr.rolling_std https://numpy.org/doc/stable/reference/generated/numpy.std.html

It might be a good idea to add a ddof in rolling_std. If not, it might be a good idea to add ddof=1 in the doc:

import numpy as np
import polars as pl

data = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]

df = pl.DataFrame({"A": data})
print(
    df.select(
        pl.col("A").rolling_std(window_size=3).round(2)
    )
)

rolling = [data[i:i + 3] for i in range(len(data) - 2)]
for r in rolling:
    print('ddof=0', np.std(r).round(2))
for r in rolling:
    print('ddof=1', np.std(r, ddof=1).round(2))

shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
│ null │
│ 1.0  │
│ 1.0  │
│ 1.53 │
│ 2.0  │
└──────┘
ddof=0 0.82
ddof=0 0.82
ddof=0 1.25
ddof=0 1.63
ddof=1 1.0
ddof=1 1.0
ddof=1 1.53
ddof=1 2.0

collinsethans commented 1 year ago

As ddof arg is not supported at the moment for rolling_std(), alternative method for short dfs:

 rolling_apply(lambda s: s.std(ddof=0), window_size=3)

collinsethans commented 1 year ago

Hi @ritchie46,

It will be very good to get support for ddof (= 0) in rolling_std().

While the polars code mentions of statistics primarily using only ddof = 1, there is need for ddof = 0 while computing technical indicators (eg bollinger bands) for financial instruments. The values of panda's std(ddof = 0) matches that of TA-Lib, tradingview.com, webull, amibroker package and other prominent sources.

Note that Series.std(ddof = 0) works fine (with rolling_apply(), but it's too slow to use).

ritchie46 commented 1 year ago

Yeap, I think this is easy to add. Probably just need to churn through some dispatch layers

magarick commented 1 year ago

I'm taking a look at this and have some questions. I don't want to commit a style or Rust faux-pas.

To start, I'm looking at the no_nulls version of rolling_var. Depending on if it has weights or not, it's using either rolling_apply_weights or rolling_apply_agg_window. The two functions look very similar, but other than the weights rolling_apply_agg_window handles the window using an object that implements the trait RollingAggWindowNoNulls, which seems to be used to avoid recomputing everything in every window.

Ideally I'd like to use the trait version but that would require adding a ddof argument to new which only makes sense for the variance. I'll keep looking for something sane but if anyone has a suggestion off the top of their head that doesn't require awkward hacks or writing almost the same function again for this case that would be helpful. Thanks.

ritchie46 commented 1 year ago

We could maybe accept a &dyn Any in the new for specific constructor arguments in new. The implementation then knows how to downcast that &dyn Any. For the variance that would be u8, being the ddof.

Other implementation can ignore that extra argument. And if we need more arguments in the future we can pass a struct.

magarick commented 1 year ago

I have a draft PR here https://github.com/pola-rs/polars/pull/8957 I ended up doing something similar but no matter what it seems like there are going to have to be a lot of parameters passed all over the place. It seems like there are a lot of opportunities for simplification and deduplication in a lot of the rolling code so maybe that's something to talk about. As for these parameters, things might end up simpler if everything were just kept in that original options struct and never unpacked until the actual function got called. With this setup, it might also be possible to not have to special-case quantiles anymore either.

Also, is there a reason that rolling_std doesn't just compute a rolling variance and then take a square root of the result as opposed to having its own window type and taking a square root at every iteration like it does now?

magarick commented 1 year ago

Could someone take a look and tell me if the general structure of the changes looks OK? I know I need to fix a few things, look into numerical precision issues, and add some more tests, but before going ahead I want to make sure the way I have it passing new parameters around is acceptable.

stevenlis commented 1 year ago

@ritchie46 should this be closed?