Open jackaixin opened 3 months ago
> a generic rolling+over syntax that allows chained expressions can be super handy.
> `DataFrame.rolling` doesn't handle business day (very commonly used in financial datasets) well so I had to manually add an index to the original dataframe, which is a bit inconvenient
Can't agree more
Hopefully the speed of kurtosis can be improved too; I'm not sure what is happening behind the scenes.
> `DataFrame.rolling` doesn't handle business day (very commonly used in financial datasets) well
@MarcoGorelli Can't we do this with the business day expressions?
I think we can do this 😎 `add_business_days` / `business_day_count` have been in for a few months without complaints (as far as I've seen), so I think it's time to bring them to some other methods which accept time offsets
I think python_holiday is not flexible or reliable enough. As far as I know, it only includes NYSE. What if people want financial holidays from other markets, such as London or JPX?
It's up to the user to provide the list of holidays; it's up to you whether you use python_holiday or whatever other calendar 📆
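To illustrate the "user provides the holidays" idea, here is a minimal pure-Python sketch. It is not polars code: `count_business_days` is a hypothetical helper, and it assumes a Monday-to-Friday working week and a half-open `[start, end)` range. The point is only that the calendar is an input, so any market's holiday list can be plugged in.

```python
from datetime import date, timedelta


def count_business_days(start, end, holidays=()):
    """Count weekdays in [start, end) that are not in `holidays`.

    Hypothetical helper for illustration; assumes a Mon-Fri working week.
    """
    holiday_set = set(holidays)
    count = 0
    day = start
    while day < end:
        # weekday() is 0 for Monday ... 6 for Sunday
        if day.weekday() < 5 and day not in holiday_set:
            count += 1
        day += timedelta(days=1)
    return count


# Monday 2024-01-01 up to (not including) Monday 2024-01-08
print(count_business_days(date(2024, 1, 1), date(2024, 1, 8)))  # 5
# Same range, with New Year's Day supplied as a holiday by the caller
print(count_business_days(date(2024, 1, 1), date(2024, 1, 8),
                          holidays=[date(2024, 1, 1)]))  # 4
```

Whether the holiday list comes from python_holiday, an exchange's published calendar, or a hand-written list makes no difference to the function.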
Description
I was testing a few rolling group_by functions and it appears that `rolling_skew` in particular is very slow compared to the implementation in pandas.

Here's the dataframe I used for testing:
As a baseline, I used pandas and checked its performance:
However, `rolling_skew` in polars is 4x slower:

Also, I see that all the other rolling functions have a `min_periods` argument in their signatures, which is not the case for `rolling_skew`. This is a bit inconsistent, and I wonder whether there's a particular reason for it.

In addition, this might be a good opportunity to pick up the stale `rolling_kurtosis` feature request: https://github.com/pola-rs/polars/issues/4235

As a side note, I see in this issue https://github.com/pola-rs/polars/issues/2974 that @ritchie46 mentioned one can use `groupby_rolling` (now `DataFrame.rolling`) in combination with the `kurtosis` expression to achieve rolling_kurtosis. I also gave that a try, but it was also slower than pandas, and `DataFrame.rolling` doesn't handle business days (very commonly used in financial datasets) well, so I had to manually add an index to the original dataframe, which is a bit inconvenient (and probably not idiomatic either)?

Something like `Expr.rolling_kurtosis`, which returns an `Expr`, is much preferred over the hacky `DataFrame.rolling` implementation. `Expr.rolling_` also supports rolling windows that depend only on the number of rows, which is similar to pandas. In fact, as this issue (https://github.com/pola-rs/polars/issues/12051) points out, a generic rolling+over syntax that allows chained expressions can be super handy.

Below are the two implementations I tried (on the same dataset above), which took a similar amount of time.
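Separately from those two implementations, and purely to pin down the semantics being requested, here is a minimal pure-Python sketch of a row-count-based rolling skew with a `min_periods` argument. The name `rolling_skew_ref`, its signature, and the choice of the biased (population) skewness are assumptions for illustration. It is neither the polars nor the pandas implementation, and it makes no attempt to be fast.

```python
def rolling_skew_ref(values, window_size, min_periods=None):
    """Reference rolling skew over the last `window_size` rows.

    Hypothetical helper. Emits None until at least `min_periods`
    observations are available (defaults to `window_size`).
    Uses the biased estimator m3 / m2**1.5.
    """
    if min_periods is None:
        min_periods = window_size
    out = []
    for i in range(len(values)):
        # Trailing window ending at row i, shorter at the start of the data
        window = values[max(0, i - window_size + 1): i + 1]
        n = len(window)
        if n < min_periods:
            out.append(None)
            continue
        mean = sum(window) / n
        m2 = sum((x - mean) ** 2 for x in window) / n  # second central moment
        m3 = sum((x - mean) ** 3 for x in window) / n  # third central moment
        # Skewness is undefined for a constant window (zero variance)
        out.append(m3 / m2 ** 1.5 if m2 > 0 else None)
    return out


data = [1.0, 2.0, 3.0, 10.0]
strict = rolling_skew_ref(data, window_size=3)                  # needs 3 rows
relaxed = rolling_skew_ref(data, window_size=3, min_periods=2)  # needs 2 rows
```

With the default, the first two results are `None`; with `min_periods=2`, only the first is. That is the behaviour the other rolling functions already expose and that is being asked for here.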