paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 184 forks source link

Time Series Forecasting #398

Closed joshualeond closed 6 years ago

joshualeond commented 6 years ago

I've been attempting to utilize brms's correlation structure to implement a time series model with bayesian estimation. I've noticed what looks like a problem with the predict() function when using it for future values. After 1-step ahead prediction the predictions seem to fall apart. I've provided a reprex below starting with the MLE fitted AR(1) model:

library(ggplot2)
library(forecast)
library(brms)

# data 
set.seed(250)
t_drift <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 50) + 0.50 * seq(1,50)
t_drift_df <- data.frame(y = as.matrix(t_drift))

#-------------------------------MLE version
mle_fit <- Arima(
  t_drift, 
  order = c(1, 0, 0), 
  include.mean = FALSE, 
  method = "ML"
  )
summary(mle_fit)
autoplot(forecast(mle_fit, h = 6))

Which results in the following predictions for 6 future values: mle_cast

Then implementing the same model in brms:

#-------------------------------Bayes version
bayes_fit <- brm(
  y ~ 0,
  autocor = cor_ar(~1, p = 1),
  data = t_drift_df
)

summary(bayes_fit)
newdata <- rbind(bayes_fit$data, data.frame(y = rep(NA, 6)))
pred <- as.data.frame(predict(bayes_fit, newdata, nsamples = 100))

names(pred) <- c("est", "se", "lower", "upper")
pred$y <- newdata$y
pred$x <- seq_len(nrow(newdata))
ggplot(pred, aes(x, est, ymin = lower, ymax = upper)) + 
  geom_smooth(stat = "identity") + 
  geom_point(aes(x, y), inherit.aes = FALSE)

With the following predictions: bayes_cast

Thanks for your work, Josh

paul-buerkner commented 6 years ago

Have you tried with the latest dev version of brms from github?

joshualeond commented 6 years ago

I just updated to the latest version and have the same results. The package version is 2.2.0.

paul-buerkner commented 6 years ago

Hmm ok, I can confirm this behavior. I am not sure if I would call it a bug though, because your brms model is misspecified for you data. First, you don't add an intercept and second (more importantly) you simulate the data with a linear trend, but fail to model that trend.

The reason arima still works is this case is because it is specifically designed for autocorrelation models and thus applies some special case coding for the forecasting to be reasonable (don't forget you have specifically called the forecast method).

brms, in comparison, is much more general an thus "dumper" for some particular cases. The predict function can do forecasting, but you have to do more manual work and need an appropriate model for it. Since predict doesn't "know" it does forecasting, it cannot automatically correct the starting point of the forecast to the latest observed value. It can only use what it knows, which is basically nothing in your case for the values to be forecasted (i.e. no intercept, no predictor values, etc.).

The solution is to fit a more appropriate model

t_drift_df$x <- seq(1, 50)

bayes_fit <- brm(
  y ~ 1 + x,
  autocor = cor_ar(~1, p = 1),
  data = t_drift_df
)

summary(bayes_fit)
newdata <- rbind(bayes_fit$data, data.frame(y = rep(NA, 6), x = 51:56))
pred <- as.data.frame(predict(bayes_fit, newdata, nsamples = 100))

names(pred) <- c("est", "se", "lower", "upper")
pred$y <- newdata$y
pred$x <- seq_len(nrow(newdata))
ggplot(pred, aes(x, est, ymin = lower, ymax = upper)) + 
  geom_smooth(stat = "identity") + 
  geom_point(aes(x, y), inherit.aes = FALSE)