Open fweber144 opened 3 years ago
The reprex above where the offset argument was used doesn't seem to have changed from rstanarm 2.21.2 (from https://mc-stan.org/r-packages/, used above) to the new CRAN version 2.21.3, but the alternative way via an offset() term in the formula seems to have changed between these two versions:
library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE))
data("kidiq")
kidiq_gr <- within(kidiq, {
  agegr <- cut(mom_age,
               breaks = unique(quantile(mom_age)),
               include.lowest = TRUE)
  levels(agegr) <- paste0("lvl", seq_len(nlevels(agegr)))
})
set.seed(3492)
kidiq_gr$offs_col <- rnorm(nrow(kidiq))

# GLMM:
glmm_fit <- stan_glmer(kid_score ~ mom_iq + (1 | agegr) + offset(offs_col),
                       data = kidiq_gr,
                       seed = 734572)
glmm_drws <- as.matrix(glmm_fit)
kidiq_gr_new <- head(kidiq_gr, 3)
glmm_pl_new <- posterior_linpred(glmm_fit, newdata = kidiq_gr_new)
## Throws:
# Warning message:
# In sweep(eta, 2L, offset, `+`) :
#   STATS is longer than the extent of 'dim(x)[MARGIN]'
##
glmm_pl_new_man <- glmm_drws[, "(Intercept)"] +
  glmm_drws[, "mom_iq", drop = FALSE] %*%
  t(kidiq_gr_new[, "mom_iq", drop = FALSE]) +
  glmm_drws[, paste0("b[(Intercept) agegr:", kidiq_gr_new$agegr, "]"), drop = FALSE]
all.equal(unname(glmm_pl_new), unname(glmm_pl_new_man), tolerance = 1e-15)
## --> With rstanarm v2.21.2: TRUE, but incorrect.
## --> With rstanarm v2.21.3: "Mean relative difference: 0.008601649".
all.equal(unname(glmm_pl_new),
          unname(glmm_pl_new_man +
                   matrix(kidiq_gr$offs_col,
                          nrow = nrow(glmm_drws),
                          ncol = nrow(kidiq_gr_new),
                          byrow = TRUE)),
          tolerance = 1e-15)
## --> With rstanarm v2.21.2: "Mean relative difference: 0.008608396".
## --> With rstanarm v2.21.3: TRUE, but incorrect.
# Desired:
all.equal(unname(glmm_pl_new),
          unname(glmm_pl_new_man +
                   matrix(kidiq_gr_new$offs_col,
                          nrow = nrow(glmm_drws),
                          ncol = nrow(kidiq_gr_new),
                          byrow = TRUE)),
          tolerance = 1e-15)
## --> With rstanarm v2.21.2: "Mean relative difference: 0.009548977".
## --> With rstanarm v2.21.3: "Mean relative difference: 0.01161201"

# GLM:
glm_fit <- stan_glm(kid_score ~ mom_iq + offset(offs_col),
                    data = kidiq_gr,
                    seed = 734572)
glm_drws <- as.matrix(glm_fit)
glm_pl_new <- posterior_linpred(glm_fit, newdata = kidiq_gr_new)
## Throws:
# Warning message:
# 'offset' argument is NULL but it looks like you estimated the model using an offset term.
##
glm_pl_new_man <- glm_drws[, "(Intercept)"] +
  glm_drws[, "mom_iq", drop = FALSE] %*%
  t(kidiq_gr_new[, "mom_iq", drop = FALSE])
all.equal(unname(glm_pl_new), unname(glm_pl_new_man), tolerance = 1e-15)
## --> With rstanarm v2.21.2 and v2.21.3: TRUE (excludes the offsets, but that is acceptable, because
## a warning is thrown).
# Desired:
all.equal(unname(glm_pl_new),
          unname(glm_pl_new_man +
                   matrix(kidiq_gr_new$offs_col,
                          nrow = nrow(glm_drws),
                          ncol = nrow(kidiq_gr_new),
                          byrow = TRUE)),
          tolerance = 1e-15)
## --> With rstanarm v2.21.2 and v2.21.3: "Mean relative difference: 0.009612115"
So now, with rstanarm v2.21.3, the issue described above for the offset argument also occurs for the alternative formula way. (In v2.21.2, no offsets were added at all when using the formula way, which is basically issue #542, but with !is.null(newdata).)
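The recycling behind the sweep() warning can be illustrated with base R alone. The following is a minimal sketch on toy data: eta, offs_orig, and offs_new are hypothetical stand-ins (not rstanarm objects), and it assumes that the offsets are added to the linear predictor via sweep(eta, 2L, offset, `+`), as the warning message in the reprex suggests.

```r
# Toy stand-ins (hypothetical, not rstanarm objects):
# 4 "posterior draws" (rows) x 3 "new observations" (columns).
eta <- matrix(0, nrow = 4, ncol = 3)
offs_orig <- c(10, 20, 30, 40, 50, 60)  # offsets of 6 "original" observations
offs_new <- offs_orig[1:3]              # offsets belonging to the 3 new observations

# Passing the original (too long) offsets: sweep() warns and recycles them.
warn_msgs <- character()
eta_recycled <- withCallingHandlers(
  sweep(eta, 2L, offs_orig, `+`),
  warning = function(w) {
    warn_msgs <<- c(warn_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Passing offsets of matching length: each column gets its own offset.
eta_desired <- sweep(eta, 2L, offs_new, `+`)

print(warn_msgs)
print(eta_recycled)
print(eta_desired)
```

In this toy case, the over-long vector is laid out row by row, so different draws receive different offsets. That is the same layout as the byrow = TRUE matrix built from kidiq_gr$offs_col in the reprex above, which would explain why that comparison returns TRUE with v2.21.3 even though the result is incorrect.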
Summary:
A stan_glmer() fit with offset seems to be handled incorrectly when posterior_linpred() (for example) is called with newdata but without offset.
Description:
First, unlike a stan_glm() fit, a stan_glmer() fit with offsets specified via argument offset doesn't produce an appropriate warning when posterior_linpred() (for example) is called with newdata but without offset. Secondly (and more importantly), in that posterior_linpred() call, the stan_glm() and the stan_glmer() fit differ in what is added to the linear predictor: For stan_glm(), a vector of zeros (and of appropriate length) is added, while for stan_glmer(), the original offsets are recycled so that they match the number of observations in newdata. The underlying issue might be that .pp_data_mer() lacks a call to .pp_data_offset(), unlike .pp_data() and .pp_data_nlmer().
Reproducible Steps:
The last line throws the warning
In contrast, with stan_glm(), an appropriate warning is thrown, namely
Calling debug(rstanarm:::linear_predictor) before those posterior_linpred() calls reveals that for the stan_glm() fit, offset is a vector of 3 zeros, while for the stan_glmer() fit, offset is the original offs_vec vector (of length nrow(kidiq) = 434).
RStanARM Version:
2.21.2 (from https://mc-stan.org/r-packages/)
R Version:
4.1.0
Operating System:
Windows 10