google / CausalImpact

An R package for causal inference in time series
Apache License 2.0
1.71k stars 254 forks source link

predict.bsts does not capture holiday information #27

Open simonwhelan opened 5 years ago

simonwhelan commented 5 years ago

Hi!

I've been working with bsts to model time-series and make forecasts, but have run into some problems with the predict function and holidays.

When I train the model with regressor holidays these regressors do not appear to be used in the predict function. For instance in a simple local level model the holiday regressor causes the response variable to go down, but the predictor of the response over the same period does not capture this behaviour.

I attach below a minimum working example based on the regression.holiday code from the documentation:

library(bsts)
library(ggplot2)
set.seed(12345)

trend <- cumsum(rnorm(1095, 0, .1))
  dates <- seq.Date(from = as.Date("2014-01-01"), length = length(trend), by = "day")
  y <- zoo(trend + rnorm(length(trend), 0, .2), dates)

AddHolidayEffect <- function(y, dates, effect) {
  ## Adds a holiday effect to simulated data.
  ## Args:
  ##   y: A zoo time series, with Dates for indices.
  ##   dates: The dates of the holidays.
  ##   effect: A vector of holiday effects of odd length.  The central effect is
  ##     the main holiday, with a symmetric influence window on either side.
  ## Returns:
  ##   y, with the holiday effects added.
  time <- dates - (length(effect) - 1) / 2
  for (i in 1:length(effect)) {
    y[time] <- y[time] + effect[i]
    time <- time + 1
  }
  return(y)
}

## Define some holidays.
memorial.day <- NamedHoliday("MemorialDay")
memorial.day.effect <- c(-.75, -2, -2)
memorial.day.dates <- as.Date(c("2014-05-26", "2015-05-25", "2016-05-25"))
y <- AddHolidayEffect(y, memorial.day.dates, memorial.day.effect)

## The holidays can be in any order.
holiday.list <- list(memorial.day)

## Let's train the model to just before MemorialDay
cut_date = as.Date("2016-05-20")
train_data <- y[time(y) < cut_date]
test_data <- y[time(y) >= cut_date]
ss <- AddLocalLevel(list(), train_data)
ss <- AddRegressionHoliday(ss, train_data, holiday.list = holiday.list)
model <- bsts(train_data, state.specification = ss, niter = 500, ping = 0)
## Now let's make a prediction covering MemorialDay
my_horizon = 15
## Note adding the time stamps here doesn't help either
pred <- predict(object = model, horizon = my_horizon)
## Make a data frame for plotting
plot_info <- data.frame(Date = time(y), 
                        value = y, 
                        predict_mean = NA,
                        predict_upper = NA,
                        predict_lower = NA
                       )
plot_info[plot_info$Date %in% time(test_data)[1:my_horizon],]$predict_mean = pred$mean
plot_info[plot_info$Date %in% time(test_data)[1:my_horizon],]$predict_lower = pred$interval[1,]
plot_info[plot_info$Date %in% time(test_data)[1:my_horizon],]$predict_upper = pred$interval[2,]
## Let's make a pretty plot to demonstrate the problem
filter(plot_info, Date > time(test_data)[1] - 25 & Date < time(test_data)[my_horizon] + 10)  %>% 
    ggplot(aes(x = Date, y = value)) +
    geom_line() +
    geom_line(aes(y = predict_mean), col = "Forest Green") + # The prediction
    geom_line(aes(y = predict_lower), col = "Forest Green", lty = 2) + # lower bound
    geom_line(aes(y = predict_upper), col = "Forest Green", lty = 2)  # upper bound