steve-the-bayesian / BOOM

A C++ library for Bayesian modeling, mainly through Markov chain Monte Carlo, but with a few other methods supported. BOOM = "Bayesian Object Oriented Modeling". It is also the sound your computer makes when it crashes.
GNU Lesser General Public License v2.1

Off by one error in seasonal regression models? #63

Closed steve-the-bayesian closed 2 years ago

steve-the-bayesian commented 2 years ago

I am a forecasting practitioner who has been trying to use your bsts package for some forecasting tasks. The data I am using to learn bsts is the M5 data (daily sales over 1969 days). Bizarrely, when I include a dummy variable to model the Christmas effect, the model suddenly fails to capture the weekly seasonal pattern. Specifically, the data looks like this:

read data from csv

unit_sales <- readr::read_csv('M5example.csv', col_types = readr::cols())
unit_sales$date <- as.Date(unit_sales$date)

This is how I split the dataset:

library(zoo)
h <- 56  # forecast horizon in days
len_his <- dim(unit_sales)[1] - h
data_train <- head(unit_sales, len_his)
data_test <- tail(unit_sales, h)
y_fit <- zoo(data_train$sales, order.by = data_train$date)
y_true <- zoo(data_test$sales, order.by = data_test$date)

This is how I build the first model:

library(bsts)
ss <- AddSemilocalLinearTrend(list(), y_fit)
ss <- AddSeasonal(ss, y_fit, nseasons = 7)
model <- bsts(y_fit, state.specification = ss, niter = 1000, seed = 100)
y_pred <- predict(model, horizon = h)

The model correctly captures the weekly seasonal pattern.


I then tried to build a second model that includes the Christmas regressor:

ss2 <- AddSemilocalLinearTrend(list(), data_train$sales)
ss2 <- AddSeasonal(ss2, data_train$sales, nseasons = 7)
# drop the date column (column 1) so only sales and the regressor remain
model2 <- bsts(sales ~ ., state.specification = ss2, data = data_train[, -1], niter = 1000, seed = 100)
y_pred2 <- predict(model2, newdata = data_test[, -1], horizon = h)

It seems that there is a one-day shift between the true seasonal pattern and the forecasted one.
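A shift like this can be measured rather than judged by eye. The following is a minimal sketch in base R, not part of the original report: `best_lag` is a hypothetical helper, and the series are synthetic stand-ins for the M5 actuals and forecast. It correlates the two series at several small lags and reports the lag that lines them up best.

```r
# Hypothetical helper: find the lag (in days) at which a forecast lines up
# best with the actual series. A result of 0 means no shift.
best_lag <- function(actual, forecast, max_lag = 3) {
  stopifnot(length(actual) == length(forecast))
  n <- length(actual)
  lags <- seq(-max_lag, max_lag)
  cors <- vapply(lags, function(k) {
    if (k >= 0) {
      # compare actual[t] with forecast[t + k]
      cor(actual[seq_len(n - k)], forecast[(1 + k):n])
    } else {
      # compare actual[t - k] with forecast[t]
      cor(actual[(1 - k):n], forecast[seq_len(n + k)])
    }
  }, numeric(1))
  lags[which.max(cors)]
}

# Synthetic illustration only (not the M5 data): a weekly pattern and a
# copy of it that arrives one day late.
weekly <- c(10, 12, 11, 13, 15, 25, 22)
actual <- rep(weekly, 8)                 # 56 days, matching h = 56
shifted <- c(actual[56], actual[-56])    # same pattern, delayed by one day

lag_same <- best_lag(actual, actual)
lag_shifted <- best_lag(actual, shifted)
```

With real output, `actual` would be something like `as.numeric(y_true)` and `forecast` the point forecast from `predict` (for example its `mean` element); a nonzero result would support the off-by-one reading.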


Please see the attached csv for the data and the ipynb file for reproducible code.

I find it difficult to understand why including one regressor makes the model fail to capture the seasonal pattern. Am I doing anything wrong when building the second model? If not, what causes the model to behave like this?

Thanks a lot for your time. I am looking forward to your reply.

M5example.csv

steve-the-bayesian commented 2 years ago

The version I used was 0.9.2. I've updated it to 0.9.6 and suddenly the problem is gone. Now the seasonal pattern is correctly forecasted.
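For anyone hitting the same symptom, a quick way to see which version is installed is sketched below. This assumes the version numbers above refer to the bsts package (the comment does not say so explicitly); `installed_version` is a hypothetical helper name, while `packageVersion` and `compareVersion` are standard functions in R's utils package.

```r
# Report the installed version of a package, or NA if it is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

v <- installed_version("bsts")  # assumption: the affected package is bsts
print(v)

# compareVersion returns -1 when the first version is older than the second.
older <- utils::compareVersion("0.9.2", "0.9.6")
```

If the reported version is older than 0.9.6, updating with `install.packages("bsts")` is the first thing to try.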