stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
388 stars 133 forks source link

bug in kfold()?? #435

Closed jscamac closed 4 years ago

jscamac commented 4 years ago

Summary:

Bug in the kfold function where it is unable to find a random effect predictor.

Description:

I've been playing around with fitting models in rstanarm::stan_gamm4().The models fit fine but when I use loo I receive warning messages suggesting I use kfold with K=10 (see MWE below). However, when I use kfold it produces an error that indicates it cannot find the random effect variable .

The example below is just an example of the problem (ignore the very large number of high pareto_k.. I've chosen it just to highlight the issue.

Reproducible Steps:

library(rstanarm)
library(mgcv)

#Simulate a model
set.seed(200) 
dat <- gamSim(6, n=200, scale=2)
# Fit the stan model

#options(mc.cores=1) # Note this doesn't work in Rstudio 1.2.5042 with R 4.0.0
fit <- rstanarm::stan_gamm4(y ~ s(x0) + s(x1) + s(x2) + s(x3), data =  dat, random = ~(1|fac))

# Loo
loo_fit <- loo(fit)
#Warning message:
#Found 200 observations with a pareto_k > 0.7. With this many problematic observations we #recommend calling 'kfold' with argument 'K=10' to perform 10-fold cross-validation rather than LOO.

#Kfold
kfold(fit, K=10)
#Fitting model 1 out of 10
#Error in eval(predvars, data, env) : object 'fac' not found

RStanARM Version:

rstanarm (2.19.3) rstantools (2.0.0) loo (2.2.0)

R Version:

R 4.0.0

Operating System:

MacOS 10.14.6

jgabry commented 4 years ago

Thanks for reporting this. It's definitely a bug. I think what is going on is that the random argument is different from the formula argument (unlike for stan_glmer where the random effects are coded as part of the main formula) but this distinction is not accounted for when we do this

https://github.com/stan-dev/rstanarm/blob/9e4e46dc3538d281f588ffe25b9d43e786f57e8e/R/loo.R#L710

because the formula doesn't include fac. This ends up resulting in fac not being included in the dataset used to refit the models for kfold. I'm working on a fix now.