tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org

step_lag(): add a datetime index parameter so that lags are not carried over gaps in non-continuous time series #860

Closed jacekkotowski closed 3 months ago

jacekkotowski commented 2 years ago

Feature

When time series data are not continuous, e.g. in the Kaggle bike sharing competition https://www.kaggle.com/c/bike-sharing-demand/, it would be useful to be able to prevent step_lag() from jumping over gaps in the time series. In that competition's data, every month is missing all observations after the 20th day.

e.g. step_lag(atemp, lag = 2, index_col = datetime), where index_col is the proposed new argument

Where there is a gap in the time series (here, after the 20th day of January), NA is returned instead of a value carried across the gap:

datetime, temp, lag_temp
2019-01-17, 12, NA
2019-01-18, 12, 12
2019-01-19, 11, 12
2019-01-20, 10, 11
2019-02-01, 11, NA
2019-02-02, 12, 11
2019-02-03, 13, 12
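The requested behaviour can be sketched in base R, outside of recipes. This is only an illustration under stated assumptions: the helper name lag_with_index() and its arguments are hypothetical and not part of the recipes API, the index is assumed to be a Date column with one row per day, and the values are the seven rows from the table above (a lag of 1 day).

```r
# Hypothetical helper (not part of recipes): for each row, return the value of
# `x` observed exactly `n * step` index units earlier, or NA if no such row
# exists. With a Date index, one unit is one day.
lag_with_index <- function(x, index, n = 1L, step = 1) {
  stopifnot(length(x) == length(index))
  idx <- as.numeric(index)            # Dates become day counts
  pos <- match(idx - n * step, idx)   # row holding the lagged timestamp, or NA
  x[pos]                              # indexing with NA yields NA
}

# In-lined data mirroring the table above: a gap between 2019-01-20 and 2019-02-01
bikes <- data.frame(
  datetime = as.Date(c("2019-01-17", "2019-01-18", "2019-01-19", "2019-01-20",
                       "2019-02-01", "2019-02-02", "2019-02-03")),
  temp = c(12, 12, 11, 10, 11, 12, 13)
)

bikes$lag_temp <- lag_with_index(bikes$temp, bikes$datetime, n = 1L)
print(bikes)
```

Looking up the lagged timestamp rather than shifting by row position is what stops the value for 2019-01-20 from being carried over to 2019-02-01. A plain row-based lag would put 10 in that row instead of NA.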

juliasilge commented 2 years ago

Can you create a reprex (a minimal reproducible example) for this feature request? The goal of a reprex is to make it easier for us to recreate your situation so that we can understand and evaluate it. Rather than downloading a large dataset from Kaggle, use some built-in or in-lined data and try to clearly outline what behavior you are seeing now vs. what you need for your use case.

If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. You may already have reprex installed (it comes with the tidyverse package), but if not you can install it with:

install.packages("reprex")

Thanks! 🙌

EmilHvitfeldt commented 3 months ago

I'm closing this for inactivity. If this functionality is still requested, please file another issue!

github-actions[bot] commented 3 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.