Open jjfantini opened 7 months ago
Thanks for the issue - to expedite resolution could you show an example of what you'd like to do with expected output please?
Yes I can do that :)
Personally, I am using this to add a column to a pl.DataFrame, where I have a custom function _annual_vol that needs to compute the rolling volatility for every month.
So here is a use case for an internal function, rolling_std():
import datetime as dt

import numpy as np
import polars as pl

trading_periods = 252
_column_name_returns: str = "log_returns"
dates = pl.Series(
    [
        dt.datetime(2021, 1, 29),
        dt.datetime(2021, 1, 30),
        dt.datetime(2021, 1, 31),
        dt.datetime(2021, 2, 1),
        dt.datetime(2021, 2, 2),
        dt.datetime(2021, 2, 3),
        dt.datetime(2021, 2, 4),
        dt.datetime(2021, 2, 5),
        dt.datetime(2021, 2, 8),
        dt.datetime(2021, 2, 9),
    ]
)
data = pl.DataFrame(
    {
        "log_returns": [2, 4, 6, 5, 3, 7, 2, 8, 4, 5],
        "date": dates,
    }
)
# Rolling standard deviation, annualised by sqrt(trading_periods)
vol = data.set_sorted("date").select(
    pl.col(_column_name_returns).rolling_std(
        window_size=3, min_periods=1, by="date"
    )
    * np.sqrt(trading_periods)
)
Here is a similar function, but I cannot use window_size="2d" to specify a width of 2 days; I have to use an integer. When the dataset becomes larger and I would like to use "1m", I cannot just set it to 21, because the number of rows in a month changes from month to month.
import datetime as dt

import numpy as np
import polars as pl


def annual_vol(data: pl.Series, trading_periods: int = 252) -> float:
    # Annualise the mean of the window and take the square root
    return (trading_periods * data.mean()) ** 0.5


trading_periods = 252
_column_name_returns: str = "log_returns"
dates = pl.Series(
    [
        dt.datetime(2021, 1, 29),
        dt.datetime(2021, 1, 30),
        dt.datetime(2021, 1, 31),
        dt.datetime(2021, 2, 1),
        dt.datetime(2021, 2, 2),
        dt.datetime(2021, 2, 3),
        dt.datetime(2021, 2, 4),
        dt.datetime(2021, 2, 5),
        dt.datetime(2021, 2, 8),
        dt.datetime(2021, 2, 9),
    ]
)
data = pl.DataFrame(
    {"log_returns": [2, 4, 6, 5, 3, 7, 2, 8, 4, 5], "date": dates}
)
# Desired call: rolling_map does not accept window_size="2d", only an integer
vol = data.set_sorted("date").select(
    pl.col(_column_name_returns).rolling_map(
        annual_vol, window_size="2d", min_periods=1
    )
    * np.sqrt(trading_periods)
)
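To make the request concrete, here is a minimal pure-Python sketch of what a time-based rolling_map might compute for the data above. It is not Polars code and makes no claim about how Polars would implement this: rolling_apply_by_time is a hypothetical helper, and the window is assumed to be (t - window, t], i.e. open on the left and closed on the right.

```python
import datetime as dt
from typing import Callable, Optional, Sequence


def rolling_apply_by_time(
    dates: Sequence[dt.datetime],
    values: Sequence[float],
    window: dt.timedelta,
    func: Callable[[list], float],
    min_periods: int = 1,
) -> list:
    """Apply `func` to the values whose date lies in (t - window, t].

    Hypothetical helper; assumes `dates` is sorted ascending and
    `window` is a positive duration.
    """
    out: list = []
    start = 0  # index of the first row still inside the window
    for i, t in enumerate(dates):
        # Drop rows that are `window` or more older than the current row.
        while dates[start] <= t - window:
            start += 1
        in_window = list(values[start : i + 1])
        result: Optional[float] = (
            func(in_window) if len(in_window) >= min_periods else None
        )
        out.append(result)
    return out


def annual_vol(window_values: list, trading_periods: int = 252) -> float:
    # Same formula as annual_vol above, written for a plain list.
    mean = sum(window_values) / len(window_values)
    return (trading_periods * mean) ** 0.5


dates = [
    dt.datetime(2021, 1, 29), dt.datetime(2021, 1, 30), dt.datetime(2021, 1, 31),
    dt.datetime(2021, 2, 1), dt.datetime(2021, 2, 2), dt.datetime(2021, 2, 3),
    dt.datetime(2021, 2, 4), dt.datetime(2021, 2, 5), dt.datetime(2021, 2, 8),
    dt.datetime(2021, 2, 9),
]
log_returns = [2, 4, 6, 5, 3, 7, 2, 8, 4, 5]
two_days = dt.timedelta(days=2)

# Rolling mean over a 2-day window, to show which rows each window holds
window_means = rolling_apply_by_time(
    dates, log_returns, two_days, lambda w: sum(w) / len(w)
)
# The custom function from the example, rolled over the same 2-day window
vol = rolling_apply_by_time(dates, log_returns, two_days, annual_vol)
```

The point of the sketch is only that the number of rows per window is decided by the dates, not by a fixed integer.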
You can get similar functionality in Polars using the .rolling() & .map_elements() functions:
vol = data.set_sorted("date").rolling(index_column="date", period="2d").agg(
    pl.col("log_returns").map_elements(annual_vol)
)
BUT, I think that this should be integrated into the .rolling_map function, since it seems redundant to have both available with one lacking a feature of the other?
There should be a clarification on using .rolling() that a timedelta parameter for period will only aggregate over consecutive calendar dates. If a weekend is skipped and those dates are not available in the data, then when using the rolling().agg() logic, the prior date is not included in the calculation. It should be included, or the user should be able to decide.
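The weekend gap can be seen without Polars at all. The sketch below lists which rows fall into each 2-day window for the dates used above, assuming the window is (t - period, t], and compares that with a plain 2-row window. The names time_windows and row_windows are only illustrative.

```python
import datetime as dt

dates = [
    dt.datetime(2021, 1, 29), dt.datetime(2021, 1, 30), dt.datetime(2021, 1, 31),
    dt.datetime(2021, 2, 1), dt.datetime(2021, 2, 2), dt.datetime(2021, 2, 3),
    dt.datetime(2021, 2, 4), dt.datetime(2021, 2, 5), dt.datetime(2021, 2, 8),
    dt.datetime(2021, 2, 9),
]
log_returns = [2, 4, 6, 5, 3, 7, 2, 8, 4, 5]
period = dt.timedelta(days=2)

# Time-based window: every row dated within (t - period, t]
time_windows = [
    [r for d, r in zip(dates, log_returns) if t - period < d <= t]
    for t in dates
]
# Row-based window: the current row plus the one before it
row_windows = [
    log_returns[max(0, i - 1) : i + 1] for i in range(len(log_returns))
]

# Monday 2021-02-08 follows a weekend that is missing from the data
monday = dates.index(dt.datetime(2021, 2, 8))
print(time_windows[monday])  # [4]     Friday's row is more than 2 days old
print(row_windows[monday])   # [8, 4]  Friday's row is still included
```

So on the Monday the time-based window holds a single observation, while the row-based window still reaches back to Friday.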
Basically, rolling_map() should copy the functionality of the rolling_* Polars functions and allow window_size to be a timedelta or str. :)
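As a rough illustration of what accepting int, str, or timedelta for window_size could look like, here is a small sketch. normalize_window_size is a hypothetical helper, not part of Polars, and it deliberately understands only a few fixed-length units (h, d, w); it does not reproduce the full Polars duration string language.

```python
import datetime as dt
import re
from typing import Union

# Fixed-length units only; this mapping is an assumption for the sketch
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def normalize_window_size(
    window_size: Union[int, str, dt.timedelta],
) -> Union[int, dt.timedelta]:
    """Return an int (row count) or a timedelta (time-based window)."""
    if isinstance(window_size, (int, dt.timedelta)):
        return window_size
    if isinstance(window_size, str):
        match = re.fullmatch(r"(\d+)([hdw])", window_size)
        if match is None:
            raise ValueError(f"unsupported window_size string: {window_size!r}")
        amount, unit = match.groups()
        return dt.timedelta(**{_UNITS[unit]: int(amount)})
    raise TypeError(f"unsupported window_size type: {type(window_size).__name__}")


print(normalize_window_size(3))     # 3
print(normalize_window_size("2d"))  # 2 days, 0:00:00
```

Calendar-based units such as months are left out on purpose, since they do not map to a fixed timedelta.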
Description
Improvement
The window_size parameter should accept str and timedelta types.
There are times when a user needs to apply a custom function to a rolling window. The beauty of the rolling_* functions is that you can roll the date dynamically via the specified string language in other rolling_* functions. This is useful for financial time series where you have unequal date intervals, but want to roll over "1m" of data, and using an integer value does not suffice (due to changing window sizes).
In pandas, you can use .rolling("1m").apply(<func>), which will dynamically roll the function over a shifting window. In Polars, you can use:
Design:
by=<col_name> - this ensures there is a date column that can be grouped.
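To illustrate the earlier point about changing window sizes, this small sketch counts the Monday-to-Friday days in the first three months of 2021. It uses weekdays as a stand-in for trading days and ignores exchange holidays, which is an assumption; weekdays_in_month is an illustrative helper.

```python
import calendar


def weekdays_in_month(year: int, month: int) -> int:
    """Count Monday-to-Friday days; exchange holidays are ignored."""
    _, days_in_month = calendar.monthrange(year, month)
    return sum(
        calendar.weekday(year, month, day) < 5  # 0 = Monday ... 4 = Friday
        for day in range(1, days_in_month + 1)
    )


counts = {month: weekdays_in_month(2021, month) for month in (1, 2, 3)}
print(counts)  # {1: 21, 2: 20, 3: 23}
```

A fixed integer such as 21 matches January here but not February or March, which is why a date-driven window is needed.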