paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 184 forks source link

BSTS + Group-level intercept + LOO: non-conformable arrays #363

Closed rubenarslan closed 6 years ago

rubenarslan commented 6 years ago

There seems to be a problem using LOO when I combine BSTS + a group-level intercept. A reprex below:

library(brms)
#> Loading required package: Rcpp
#> Loading required package: ggplot2
#> Loading 'brms' package (version 2.1.2). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> Run theme_set(theme_default()) to use the default bayesplot theme.

dat = data.frame(y = rnorm(100), day = rep(1:10, length.out = 100), id = rep(1:10, 
  each = 1))

id = brm(y ~ 1 + (1 | id), data = dat, chains = 1, iter = 50)
#> Compiling the C++ model
#> Start sampling
#> 
#> SAMPLING FOR MODEL 'gaussian brms-model' NOW (CHAIN 1).
#> 
#> Gradient evaluation took 4.7e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.47 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> WARNING: There aren't enough warmup iterations to fit the
#>          three stages of adaptation as currently configured.
#>          Reducing each adaptation stage to 15%/75%/10% of
#>          the given number of warmup iterations:
#>            init_buffer = 3
#>            adapt_window = 20
#>            term_buffer = 2
#> 
#> Iteration:  1 / 50 [  2%]  (Warmup)
#> Iteration:  5 / 50 [ 10%]  (Warmup)
#> Iteration: 10 / 50 [ 20%]  (Warmup)
#> Iteration: 15 / 50 [ 30%]  (Warmup)
#> Iteration: 20 / 50 [ 40%]  (Warmup)
#> Iteration: 25 / 50 [ 50%]  (Warmup)
#> Iteration: 26 / 50 [ 52%]  (Sampling)
#> Iteration: 30 / 50 [ 60%]  (Sampling)
#> Iteration: 35 / 50 [ 70%]  (Sampling)
#> Iteration: 40 / 50 [ 80%]  (Sampling)
#> Iteration: 45 / 50 [ 90%]  (Sampling)
#> Iteration: 50 / 50 [100%]  (Sampling)
#> 
#>  Elapsed Time: 0.010578 seconds (Warm-up)
#>                0.008524 seconds (Sampling)
#>                0.019102 seconds (Total)
bsts = brm(y ~ 1, autocor = cor_bsts(~day | id), data = dat, chains = 1, iter = 50)
#> Compiling the C++ model
#> Start sampling
#> 
#> SAMPLING FOR MODEL 'gaussian brms-model' NOW (CHAIN 1).
#> 
#> Gradient evaluation took 4.6e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.46 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> WARNING: There aren't enough warmup iterations to fit the
#>          three stages of adaptation as currently configured.
#>          Reducing each adaptation stage to 15%/75%/10% of
#>          the given number of warmup iterations:
#>            init_buffer = 3
#>            adapt_window = 20
#>            term_buffer = 2
#> 
#> Iteration:  1 / 50 [  2%]  (Warmup)
#> Iteration:  5 / 50 [ 10%]  (Warmup)
#> Iteration: 10 / 50 [ 20%]  (Warmup)
#> Iteration: 15 / 50 [ 30%]  (Warmup)
#> Iteration: 20 / 50 [ 40%]  (Warmup)
#> Iteration: 25 / 50 [ 50%]  (Warmup)
#> Iteration: 26 / 50 [ 52%]  (Sampling)
#> Iteration: 30 / 50 [ 60%]  (Sampling)
#> Iteration: 35 / 50 [ 70%]  (Sampling)
#> Iteration: 40 / 50 [ 80%]  (Sampling)
#> Iteration: 45 / 50 [ 90%]  (Sampling)
#> Iteration: 50 / 50 [100%]  (Sampling)
#> 
#>  Elapsed Time: 0.007239 seconds (Warm-up)
#>                0.01904 seconds (Sampling)
#>                0.026279 seconds (Total)
#> Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
#> http://mc-stan.org/misc/warnings.html#bfmi-low
#> Warning: Examine the pairs() plot to diagnose sampling problems
bsts_id = brm(y ~ 1 + (1 | id), autocor = cor_bsts(~day | id), data = dat, chains = 1, 
  iter = 50)
#> Compiling the C++ model
#> Start sampling
#> 
#> SAMPLING FOR MODEL 'gaussian brms-model' NOW (CHAIN 1).
#> 
#> Gradient evaluation took 5.4e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.54 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> WARNING: There aren't enough warmup iterations to fit the
#>          three stages of adaptation as currently configured.
#>          Reducing each adaptation stage to 15%/75%/10% of
#>          the given number of warmup iterations:
#>            init_buffer = 3
#>            adapt_window = 20
#>            term_buffer = 2
#> 
#> Iteration:  1 / 50 [  2%]  (Warmup)
#> Iteration:  5 / 50 [ 10%]  (Warmup)
#> Iteration: 10 / 50 [ 20%]  (Warmup)
#> Iteration: 15 / 50 [ 30%]  (Warmup)
#> Iteration: 20 / 50 [ 40%]  (Warmup)
#> Iteration: 25 / 50 [ 50%]  (Warmup)
#> Iteration: 26 / 50 [ 52%]  (Sampling)
#> Iteration: 30 / 50 [ 60%]  (Sampling)
#> Iteration: 35 / 50 [ 70%]  (Sampling)
#> Iteration: 40 / 50 [ 80%]  (Sampling)
#> Iteration: 45 / 50 [ 90%]  (Sampling)
#> Iteration: 50 / 50 [100%]  (Sampling)
#> 
#>  Elapsed Time: 0.017866 seconds (Warm-up)
#>                0.001214 seconds (Sampling)
#>                0.01908 seconds (Total)
#> Warning: There were 25 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

#> Warning: Examine the pairs() plot to diagnose sampling problems

LOO(id, bsts_id, pointwise = FALSE)
#> Warning: Found 31 observations with a pareto_k > 0.7 in model 'id'. With
#> this many problematic observations, it may be more appropriate to use
#> 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather
#> than LOO.
#> Error in eta + p(draws$ac$loclev, i, row = FALSE): non-conformable arrays
LOO(bsts, bsts_id, pointwise = FALSE)
#> Error in eta + p(draws$ac$loclev, i, row = FALSE): non-conformable arrays
paul-buerkner commented 6 years ago

Thanks I will try to fix it! Note that the cor_bsts structure will be deprecated as of brms 2.2 for reasons explained in #148

rubenarslan commented 6 years ago

Ah, didn't notice that issue. Nevermind then, it wasn't central to what I was trying to do.

paul-buerkner commented 6 years ago

There is a typo in your data in the sense that the same time point occurs multiple times per ID. Shouldn't it be:

dat = data.frame(y = rnorm(100), day = rep(1:10, length.out = 100), id = rep(1:10, 
  each = 10))

(I changed each = 1 to each = 10). Regardless, I will try to fix this issue. so that the post-processing still works.

paul-buerkner commented 6 years ago

I now force time points to be unique within groups. When this uniqueness holds, LOO and other post-processing functions should work as expected.