A stan_gamm4() fit with an offset() formula term seems to be handled incorrectly when posterior_linpred() (for example) is called with newdata but without offset. This is not a duplicate of #253 since #253 had multilevel terms included (whereas here, random = NULL).
Description:
This is similar to #541, but now for stan_gamm4 with random = NULL (instead of stan_glmer()) and for an offset() formula term (instead of the offset argument). I'll be repeating an adapted issue description from #541 for the sake of completeness:
First, unlike a stan_glm() fit, a stan_gamm4() fit with random = NULL and with offsets specified via an offset() formula term doesn't produce an appropriate warning when posterior_linpred() (for example) is called with newdata but without offset. Secondly (and more importantly), in that posterior_linpred() call, the stan_glm() and the stan_gamm4() fit differ in what is added to the linear predictor: For stan_glm(), a vector of zeros (and of appropriate length) is added, while for stan_gamm4(), the original offsets are recycled so that they match the number of observations in newdata.
Reproducible Steps:
# Original source: Example section of `?rstanarm::stan_gamm4`.
library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE))
dat <- mgcv::gamSim(1, n = 400, scale = 2) ## simulate 4 term additive truth
## Now add 20 level random effect`fac'...
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~ fac - 1) %*% rnorm(20) * .5
gam_fit <- stan_gamm4(y ~ s(x0) + x1 + s(x2) + offset(x3),
data = dat,
chains = 1,
iter = 500, # for example speed
seed = 1140350788)
dat_new <- head(dat, 3)
gam_pl <- posterior_linpred(gam_fit, newdata = dat_new)
The last line throws the warning
In sweep(eta, 2L, offset, `+`) :
STATS is longer than the extent of 'dim(x)[MARGIN]'
In contrast, with stan_glm(), an appropriate warning is thrown:
glm_fit <- stan_glm(y ~ x0 + x1 + x2 + offset(x3),
data = dat,
chains = 1,
iter = 500, # for example speed
seed = 1140350788)
glm_pl <- posterior_linpred(glm_fit, newdata = dat_new)
namely
'offset' argument is NULL but it looks like you estimated the model using an offset term.
Calling debug(rstanarm:::linear_predictor) before those posterior_linpred() calls reveals that for the stan_glm() fit, offset is a vector of 3 zeros, while for the stan_gamm4() fit, offset is the original offset vector (of length nrow(dat) = 400).
Summary:
A
stan_gamm4()
fit with anoffset()
formula term seems to be handled incorrectly whenposterior_linpred()
(for example) is called withnewdata
but withoutoffset
. This is not a duplicate of #253 since #253 had multilevel terms included (whereas here,random = NULL
).Description:
This is similar to #541, but now for
stan_gamm4
withrandom = NULL
(instead ofstan_glmer()
) and for anoffset()
formula term (instead of theoffset
argument). I'll be repeating an adapted issue description from #541 for the sake of completeness:First, unlike a
stan_glm()
fit, astan_gamm4()
fit withrandom = NULL
and with offsets specified via anoffset()
formula term doesn't produce an appropriate warning whenposterior_linpred()
(for example) is called withnewdata
but withoutoffset
. Secondly (and more importantly), in thatposterior_linpred()
call, thestan_glm()
and thestan_gamm4()
fit differ in what is added to the linear predictor: Forstan_glm()
, a vector of zeros (and of appropriate length) is added, while forstan_gamm4()
, the original offsets are recycled so that they match the number of observations innewdata
.Reproducible Steps:
The last line throws the warning
In contrast, with
stan_glm()
, an appropriate warning is thrown:namely
Calling
debug(rstanarm:::linear_predictor)
before thoseposterior_linpred()
calls reveals that for thestan_glm()
fit,offset
is a vector of 3 zeros, while for thestan_gamm4()
fit,offset
is the original offset vector (of lengthnrow(dat) = 400
).RStanARM Version:
2.21.2 (from https://mc-stan.org/r-packages/)
R Version:
4.1.1
Operating System:
Ubuntu 20.04.2