stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
385 stars 132 forks source link

`stan_glmer()` and `offset()` formula term #542

Open fweber144 opened 3 years ago

fweber144 commented 3 years ago

Summary:

A stan_glmer() fit with offset() in the formula seems to be handled incorrectly in posterior_linpred().

Description:

Unlike a stan_glm() fit, a stan_glmer() fit with offsets specified via offset() in the formula doesn't include the offsets in the linear predictors returned by posterior_linpred(). This might be related to issue #541.

Reproducible Steps:

library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE))
data("kidiq")
kidiq_gr <- within(kidiq, {
  agegr <- cut(mom_age,
               breaks = unique(quantile(mom_age)),
               include.lowest = TRUE)
  levels(agegr) <- paste0("lvl", seq_len(nlevels(agegr)))
})
set.seed(3492)
kidiq_gr$offs_col <- rnorm(nrow(kidiq))
glmm_fit <- stan_glmer(kid_score ~ mom_iq + (1 | agegr) + offset(offs_col),
                       data = kidiq_gr,
                       seed = 734572)
glmm_pl <- posterior_linpred(glmm_fit)

glmm_drws <- as.matrix(glmm_fit)
glmm_pl_man <- glmm_drws[, "(Intercept)"] +
  glmm_drws[, "mom_iq", drop = FALSE] %*%
  t(kidiq_gr[, "mom_iq", drop = FALSE]) +
  glmm_drws[, paste0("b[(Intercept) agegr:", kidiq_gr$agegr, "]"), drop = FALSE]
all.equal(unname(glmm_pl), unname(glmm_pl_man), tolerance = 1e-15)

with that last line returning TRUE, so the offsets are not included in the linear predictors. In contrast, for stan_glm(), the offsets are included in the linear predictors:

glm_fit <- stan_glm(kid_score ~ mom_iq + offset(offs_col),
                    data = kidiq_gr,
                    seed = 734572)
glm_pl <- posterior_linpred(glm_fit)

glm_drws <- as.matrix(glm_fit)
glm_pl_man <- glm_drws[, "(Intercept)"] +
  glm_drws[, "mom_iq", drop = FALSE] %*%
  t(kidiq_gr[, "mom_iq", drop = FALSE]) +
  matrix(kidiq_gr$offs_col,
         nrow = nrow(glm_drws),
         ncol = nrow(kidiq_gr),
         byrow = TRUE)
all.equal(unname(glm_pl), unname(glm_pl_man), tolerance = 1e-15)

(with that last line returning TRUE).

RStanARM Version:

2.21.2 (from https://mc-stan.org/r-packages/)

R Version:

4.1.0

Operating System:

Ubuntu 20.04.2

fweber144 commented 2 years ago

Seems like this is fixed in the new CRAN version 2.21.3:

### Part 1 from #542 (GLMM):
library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE))
data("kidiq")
kidiq_gr <- within(kidiq, {
  agegr <- cut(mom_age,
               breaks = unique(quantile(mom_age)),
               include.lowest = TRUE)
  levels(agegr) <- paste0("lvl", seq_len(nlevels(agegr)))
})
set.seed(3492)
kidiq_gr$offs_col <- rnorm(nrow(kidiq))
glmm_fit <- stan_glmer(kid_score ~ mom_iq + (1 | agegr) + offset(offs_col),
                       data = kidiq_gr,
                       seed = 734572)
glmm_pl <- posterior_linpred(glmm_fit)

glmm_drws <- as.matrix(glmm_fit)
glmm_pl_man <- glmm_drws[, "(Intercept)"] +
  glmm_drws[, "mom_iq", drop = FALSE] %*%
  t(kidiq_gr[, "mom_iq", drop = FALSE]) +
  glmm_drws[, paste0("b[(Intercept) agegr:", kidiq_gr$agegr, "]"), drop = FALSE]
all.equal(unname(glmm_pl), unname(glmm_pl_man), tolerance = 1e-15)
## --> Gives:
## With rstanarm v2.21.2: TRUE (incorrect)
## With rstanarm v2.21.3: "Mean relative difference: 0.009182638"
all.equal(unname(glmm_pl),
          unname(glmm_pl_man +
                   matrix(kidiq_gr$offs_col,
                          nrow = nrow(glmm_drws),
                          ncol = nrow(kidiq_gr),
                          byrow = TRUE)),
          tolerance = 1e-15)
## --> Gives:
## With rstanarm v2.21.2: "Mean relative difference: 0.009190268".
## With rstanarm v2.21.3: TRUE (correct)
###

### Part 2 from #542 (GLM):
glm_fit <- stan_glm(kid_score ~ mom_iq + offset(offs_col),
                    data = kidiq_gr,
                    seed = 734572)
glm_pl <- posterior_linpred(glm_fit)

glm_drws <- as.matrix(glm_fit)
glm_pl_man <- glm_drws[, "(Intercept)"] +
  glm_drws[, "mom_iq", drop = FALSE] %*%
  t(kidiq_gr[, "mom_iq", drop = FALSE]) +
  matrix(kidiq_gr$offs_col,
         nrow = nrow(glm_drws),
         ncol = nrow(kidiq_gr),
         byrow = TRUE)
all.equal(unname(glm_pl), unname(glm_pl_man), tolerance = 1e-15)
## --> Gives:
## TRUE (correct) with both, rstanarm v2.21.2 and v2.21.3
###

Was this fix intentional? I didn't see a NEWS entry for this (and this issue here wasn't closed).