paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.26k stars 177 forks source link

unexpected behaviour calling `fitted` on non-gaussian model with cor_arma #656

Closed hansvancalster closed 5 years ago

hansvancalster commented 5 years ago

Thanks for implementing cor_arma for non-gaussian models in PR #650

I saw some differences when trying to reproduce the output of marginal_effects(..., method = "fitted") starting from the fitted procedure. The code that illustrates the differences follows:

library(tidyverse)
library(brms)

#simulate data
n <- 100
rho <- 0.8
sd <- 0.5
eta <- as.vector(arima.sim(list(order = c(1,0,0), ar = rho),
                          n = n,
                          sd = sd)
                )
E <- 1:n / 10
y <- rpois(n, E * exp(eta))
data <- data.frame(y = y, time = 1:n)

#fit model
fit_y <- brm(
  y ~ time,
  data = data,
  family = poisson(),
  autocor = cor_arma(~ time, p = 1, cov = TRUE)
)

summary(fit_y)
pp_check(fit_y) #OK

# This produces a plot of the linear time effect only
plot(marginal_effects(fit_y), points = TRUE)

# The following produces a plot of the linear time effect + ar1 process?
fitted_df0 <- fitted(fit_y) %>%
  as.data.frame() %>%
  bind_cols(data)

fitted_df0 %>%
  ggplot(aes(x = time, y = y)) +
  geom_point() +
  geom_ribbon(aes(x = time,
                  ymin = Q2.5,
                  ymax =  Q97.5), alpha = 0.2) +
  geom_line(aes(x = time, y = Estimate))

# Expecting that the following produces the same plot as the previous one
# only difference is that I pass newdata, but the new data are the same as the original data
# a different plot is returned (looks more like a prediction interval?)
fitted_df <- fitted(fit_y,
                    newdata = data.frame(time = 1:n)) %>%
  as.data.frame() %>%
  bind_cols(data)

fitted_df %>%
  ggplot(aes(x = time, y = y)) +
  geom_point() +
  geom_ribbon(aes(x = time,
                  ymin = Q2.5,
                  ymax =  Q97.5), alpha = 0.2) +
  geom_line(aes(x = time, y = Estimate))

Is this intended behaviour or a bug?

paul-buerkner commented 5 years ago

marginal_effects excludes autocorrelation parameters in the predictions. What happens if you compare it to fitted(..., incl_autocor = FALSE)?

hansvancalster commented 5 years ago

Aah yes, I can conform that setting incl_autocor = FALSE reproduces the default marginal_effects plot. Great! I didn't see the argument in the help of marginal_effects but I should have looked further for the arguments of extract_draws which can be passed to the former function.

I also understand now the difference between fitted(..., incl_autocor = TRUE) and fitted(..., incl_autocor = TRUE, newdata = data.frame(time = 1:n) in the above example. With incl_autocor = FALSE this will result in the same plot, but with incl_autocor = TRUE and newdata the expectations of the posterior predictions are no longer conditioned on the observed responses, which results in wider interval.