fweber144 commented 1 year ago

This starts to address #207 by adding the submodel fitter MASS::glmmPQL() (see lme4/lme4#682 for the background on this) as a "hidden" feature. This is still a hidden feature because there are some TODO (glmmPQL) comments left. This PR should be merged nonetheless because it can already provide some advantage (see the demonstrations below) and because it doesn't affect existing code (even the new dependencies MASS and nlme are not really extra dependencies because they are already dependencies of lme4, for example).

Note that in general, PQL gives a solution that differs from the maximum likelihood solution, so PQL does not really solve the projection problem (although it can probably be considered as an approximation; but this is just a guess from my side). So it would probably be better to conduct some theoretical investigations first before un-hiding this feature (and the TODO (glmmPQL) comments need to be solved first as well).

Binomial (Bernoulli) demo

With respect to the binomial part of #207, here is a short demo of the advantage of this PR.

Data

# Number of observations in the training dataset (= number of observations in
# the test dataset):
N <- 100
sim_brnll <- function(nobs = 2 * N, ncon = 10, ngrpPL = 4, ngrpGL = 10,
                      nnoise = 39) {
  # Regression coefficients for continuous predictors:
  coefs_con <- rnorm(ncon)
  # Continuous predictors:
  X_con <- matrix(rnorm(nobs * ncon), ncol = ncon)
  colnames(X_con) <- paste0("x", seq_len(ncon))
  # Start linear predictor:
  linpred <- X_con %*% coefs_con

  # Population-level (PL) categorical predictor:
  X_con <- data.frame(X_con,
                      grpPL = gl(n = ngrpPL, k = nobs %/% ngrpPL, length = nobs,
                                 labels = paste0("grpPL", seq_len(ngrpPL))))
  # Regression coefficients for the PL categorical predictor:
  coefs_catPL <- rnorm(ngrpPL)
  # Continue linear predictor:
  linpred <- linpred + coefs_catPL[X_con$grpPL]

  # Group-level (GL) categorical predictor:
  X_con <- data.frame(X_con,
                      grpGL = gl(n = ngrpGL, k = nobs %/% ngrpGL, length = nobs,
                                 labels = paste0("grpGL", seq_len(ngrpGL))))
  # Shuffle the GL categorical predictor to avoid a systematic structure in
  # combination with the PL categorical predictor:
  X_con$grpGL <- sample(X_con$grpGL)
  # Regression coefficients for the GL categorical predictor:
  coefs_catGL <- rnorm(ngrpGL)
  # Continue linear predictor:
  linpred <- linpred + coefs_catGL[X_con$grpGL]

  # Noise predictors:
  X_con <- data.frame(X_con,
                      xn = matrix(rnorm(nobs * nnoise), ncol = nnoise))

  # Bernoulli response, using the logit link:
  X_con$y <- rbinom(nobs, size = 1, prob = plogis(linpred))
  # Shuffle order of observations:
  X_con <- X_con[sample.int(nobs), , drop = FALSE]
  # Drop the shuffled original row names:
  rownames(X_con) <- NULL
  return(X_con)
}
set.seed(300416)
dat_brnll <- sim_brnll()
dat_brnll_train <- dat_brnll[1:N, , drop = FALSE]
dat_brnll_test <- dat_brnll[(N + 1):nrow(dat_brnll), , drop = FALSE]

Reference model

library(rstanarm)
# Number of regression coefficients:
( D <- sum(grepl("^x|^grpPL", names(dat_brnll_train))) )
# Prior guess for the number of relevant (i.e., non-zero) regression
# coefficients:
p0 <- 11
# Hyperprior scale for tau, the global shrinkage parameter:
tau0 <- p0 / (D - p0) * 2 / sqrt(N)
options(mc.cores = parallel::detectCores(logical = FALSE))
refm_fml <- as.formula(paste("y", "~", paste(
  c("(1 | grpGL)", grep("^x|^grpPL", names(dat_brnll_train), value = TRUE)),
  collapse = " + "
)))
refm_fit_brnll <- stan_glmer(
  formula = refm_fml,
  family = binomial(),
  data = dat_brnll_train,
  prior = hs(global_scale = tau0, slab_df = 100, slab_scale = 1),
  seed = 286013, QR = TRUE, refresh = 0
)

Default GLMM submodel fitter `lme4::glmer()`

First, we'll run varsel() (with argument d_test specified) using the default GLMM submodel fitter lme4::glmer():

library(projpred)
system.time(cvvs_trad <- varsel(
  refm_fit_brnll,
  d_test = list(
    data = dat_brnll_test,
    offset = rep(0, nrow(dat_brnll_test)),
    weights = rep(1, nrow(dat_brnll_test)),
    y = dat_brnll_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
))
##    user  system elapsed 
## 235.007   0.108 235.016
smmry_trad <- summary(
  cvvs_trad,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_trad, digits = 3)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula: y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 + 
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 + 
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 + 
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 + 
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 100
## Observations (test set): 100
## Search method: forward, maximum number of terms 19
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## Suggested Projection Size: 4
## 
## Selection Summary:
##  size solution_terms   mlpd    se  lower  upper   diff diff.se
##     0           <NA> -0.710 0.010 -0.720 -0.700 -0.228   0.043
##     1             x6 -0.675 0.051 -0.726 -0.624 -0.193   0.048
##     2             x7 -0.638 0.060 -0.698 -0.579 -0.156   0.049
##     3             x3 -0.579 0.053 -0.631 -0.526 -0.097   0.041
##     4             x9 -0.514 0.050 -0.564 -0.464 -0.032   0.034
##     5             x2 -0.473 0.049 -0.521 -0.425  0.009   0.028
##     6             x4 -0.456 0.047 -0.503 -0.409  0.026   0.023
##     7           xn.7 -0.459 0.047 -0.507 -0.412  0.023   0.023
##     8          xn.27 -0.463 0.050 -0.513 -0.413  0.019   0.022
##     9          xn.20 -0.466 0.051 -0.517 -0.416  0.016   0.020
##    10           xn.1 -0.476 0.052 -0.528 -0.424  0.006   0.018
##    11          xn.38 -0.480 0.053 -0.533 -0.427  0.002   0.017
##    12          xn.18 -0.480 0.053 -0.532 -0.427  0.002   0.016
##    13             x1 -0.485 0.054 -0.538 -0.431 -0.003   0.016
##    14          xn.16 -0.488 0.054 -0.542 -0.434 -0.006   0.016
##    15          xn.17 -0.486 0.054 -0.539 -0.432 -0.004   0.016
##    16             x5 -0.486 0.054 -0.540 -0.432 -0.004   0.016
##    17          grpPL -0.487 0.054 -0.541 -0.433 -0.005   0.016
##    18           xn.6 -0.488 0.055 -0.543 -0.434 -0.006   0.015
##    19          xn.32 -0.489 0.055 -0.543 -0.435 -0.007   0.015

As can be seen from the summary() output, the group-level term is not among the first 19 terms of the solution path (but several noise terms are).

New GLMM submodel fitter `MASS::glmmPQL()`

Now, we activate option projpred.PQL to use MASS::glmmPQL() as submodel fitter for GLMMs:

options(projpred.PQL = TRUE)
system.time(cvvs_PQLtrad <- varsel(
  refm_fit_brnll,
  d_test = list(
    data = dat_brnll_test,
    offset = rep(0, nrow(dat_brnll_test)),
    weights = rep(1, nrow(dat_brnll_test)),
    y = dat_brnll_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
))
##    user  system elapsed 
## 467.261   0.544 467.595
smmry_PQLtrad <- summary(
  cvvs_PQLtrad,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_PQLtrad, digits = 3)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula: y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 + 
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 + 
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 + 
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 + 
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 100
## Observations (test set): 100
## Search method: forward, maximum number of terms 19
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## Suggested Projection Size: 4
## 
## Selection Summary:
##  size solution_terms   mlpd    se  lower  upper   diff diff.se
##     0           <NA> -0.710 0.010 -0.720 -0.700 -0.228   0.043
##     1             x6 -0.675 0.051 -0.726 -0.624 -0.193   0.048
##     2             x7 -0.638 0.060 -0.698 -0.579 -0.156   0.049
##     3             x3 -0.579 0.053 -0.631 -0.526 -0.097   0.041
##     4             x9 -0.514 0.050 -0.564 -0.464 -0.032   0.034
##     5             x2 -0.473 0.049 -0.521 -0.425  0.009   0.028
##     6             x4 -0.456 0.047 -0.503 -0.409  0.026   0.023
##     7    (1 | grpGL) -0.445 0.046 -0.491 -0.399  0.037   0.025
##     8          xn.20 -0.444 0.047 -0.490 -0.398  0.038   0.022
##     9           xn.1 -0.453 0.047 -0.500 -0.405  0.029   0.020
##    10          xn.27 -0.458 0.050 -0.507 -0.408  0.024   0.020
##    11           xn.7 -0.465 0.051 -0.516 -0.414  0.017   0.019
##    12          xn.38 -0.471 0.052 -0.523 -0.419  0.011   0.017
##    13          xn.31 -0.467 0.051 -0.518 -0.416  0.015   0.017
##    14          xn.17 -0.467 0.051 -0.518 -0.415  0.015   0.017
##    15             x5 -0.466 0.052 -0.518 -0.415  0.016   0.016
##    16          xn.16 -0.470 0.052 -0.521 -0.418  0.012   0.016
##    17          xn.14 -0.468 0.052 -0.519 -0.417  0.014   0.016
##    18          xn.18 -0.467 0.051 -0.518 -0.416  0.015   0.015
##    19          grpPL -0.467 0.051 -0.518 -0.416  0.015   0.014

# Check that MASS::glmmPQL() was actually used (only checking the first model
# size where this needs to be the case; later model sizes (for which this also
# needs to hold) are not checked here):
stopifnot(all(
  sapply(cvvs_PQLtrad$search_path$submodls[[1 + 7]], inherits, "glmmPQL")
))

Now, the group-level term is at position (size) 7 of the solution path (and it is correctly positioned before all noise terms). The fact that the varsel() run now takes about twice as long is due to the fact that the group-level term is selected so early, meaning that at all later submodel sizes, all submodels are GLMMs (which take longer to fit than GLMs). So the runtimes cannot really be compared. (I included them nonetheless to explain this.)

Poisson demo

Now a short demo of the advantage of this PR with respect to the Poisson part of #207.

Data

# Number of observations in the training dataset (= number of observations in
# the test dataset):
N <- 71
sim_poiss <- function(nobs = 2 * N, ncon = 10, ngrpPL = 4, ngrpGL = 15,
                      nnoise = 39) {
  # Regression coefficients for continuous predictors:
  coefs_con <- rnorm(ncon)
  # Continuous predictors:
  X_con <- matrix(rnorm(nobs * ncon), ncol = ncon)
  colnames(X_con) <- paste0("x", seq_len(ncon))
  # Start linear predictor:
  linpred <- X_con %*% coefs_con

  # Population-level (PL) categorical predictor:
  X_con <- data.frame(X_con,
                      grpPL = gl(n = ngrpPL, k = nobs %/% ngrpPL, length = nobs,
                                 labels = paste0("grpPL", seq_len(ngrpPL))))
  # Regression coefficients for the PL categorical predictor:
  coefs_catPL <- rnorm(ngrpPL)
  # Continue linear predictor:
  linpred <- linpred + coefs_catPL[X_con$grpPL]

  # Group-level (GL) categorical predictor:
  X_con <- data.frame(X_con,
                      grpGL = gl(n = ngrpGL, k = nobs %/% ngrpGL, length = nobs,
                                 labels = paste0("grpGL", seq_len(ngrpGL))))
  # Shuffle the GL categorical predictor to avoid a systematic structure in
  # combination with the PL categorical predictor:
  X_con$grpGL <- sample(X_con$grpGL)
  # Regression coefficients for the GL categorical predictor:
  coefs_catGL <- rnorm(ngrpGL, sd = 2)
  # Continue linear predictor:
  linpred <- linpred + coefs_catGL[X_con$grpGL]

  # Noise predictors:
  X_con <- data.frame(X_con,
                      xn = matrix(rnorm(nobs * nnoise), ncol = nnoise))

  # Poisson response, using the log link (i.e., exp() as inverse link):
  X_con$y <- rpois(nobs, lambda = exp(linpred))
  # Shuffle order of observations:
  X_con <- X_con[sample.int(nobs), , drop = FALSE]
  # Drop the shuffled original row names:
  rownames(X_con) <- NULL
  return(X_con)
}
set.seed(783490)
dat_poiss <- sim_poiss()
dat_poiss_train <- dat_poiss[1:N, , drop = FALSE]
dat_poiss_test <- dat_poiss[(N + 1):nrow(dat_poiss), , drop = FALSE]

Reference model

library(rstanarm)
# Number of regression coefficients:
( D <- sum(grepl("^x|^grpPL", names(dat_poiss_train))) )
# Prior guess for the number of relevant (i.e., non-zero) regression
# coefficients:
p0 <- 11
# Prior guess for the overall magnitude of the response values, see Table 1 of
# Piironen and Vehtari (2017, DOI: 10.1214/17-EJS1337SI):
mu_prior <- 10
# Hyperprior scale for tau, the global shrinkage parameter:
tau0 <- p0 / (D - p0) / sqrt(mu_prior) / sqrt(N)
options(mc.cores = parallel::detectCores(logical = FALSE))
refm_fml <- as.formula(paste("y", "~", paste(
  c("(1 | grpGL)", grep("^x|^grpPL", names(dat_poiss_train), value = TRUE)),
  collapse = " + "
)))
refm_fit_poiss <- stan_glmer(
  formula = refm_fml,
  family = poisson(),
  data = dat_poiss_train,
  prior = hs(global_scale = tau0, slab_df = 100, slab_scale = 1),
  seed = 286013, QR = TRUE, refresh = 0
)

Default GLMM submodel fitter `lme4::glmer()`

Again, we'll first run varsel() (with argument d_test specified) using the default GLMM submodel fitter lme4::glmer() (we're now restricting nterms_max to a value of 5 because this is sufficient for our purposes here and because otherwise, the runtime would be even higher):

library(projpred)
system.time(cvvs_trad <- varsel(
  refm_fit_poiss,
  d_test = list(
    data = dat_poiss_test,
    offset = rep(0, nrow(dat_poiss_test)),
    weights = rep(1, nrow(dat_poiss_test)),
    y = dat_poiss_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  nterms_max = 5,
  seed = 95930
))
##     user   system  elapsed 
## 2918.676    2.067 2918.287
smmry_trad <- summary(
  cvvs_trad,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_trad, digits = 3)
## 
## Family: poisson 
## Link function: log 
## 
## Formula: y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 + 
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 + 
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 + 
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 + 
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 71
## Observations (test set): 71
## Search method: forward, maximum number of terms 5
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## Suggested Projection Size: NA
## 
## Selection Summary:
##  size solution_terms    mlpd     se   lower   upper    diff diff.se
##     0           <NA> -53.964 19.846 -73.700 -34.229 -50.986  19.705
##     1    (1 | grpGL) -45.140 15.108 -60.165 -30.116 -42.162  14.964
##     2             x5 -34.187 10.799 -44.927 -23.448 -31.209  10.648
##     3             x4 -16.933  5.191 -22.096 -11.771 -13.955   5.054
##     4             x6 -12.710  4.369 -17.054  -8.366  -9.731   4.242
##     5             x2  -8.762  2.918 -11.664  -5.860  -5.783   2.801

# Check that lme4::glmer() was actually used (only checking the first model
# size where this needs to be the case; later model sizes (for which this also
# needs to hold) are not checked here):
stopifnot(all(
  sapply(cvvs_trad$search_path$submodls[[1 + 1]], inherits, "glmerMod")
))

As can be seen from the summary() output, the group-level term is selected first (in particular, before all noise terms). So in case of the Poisson family, the validity of the results when using lme4::glmer() as GLMM submodel fitter does not seem to be compromised (only the runtime, as we will see below).

New GLMM submodel fitter `MASS::glmmPQL()`

Now, we again activate option projpred.PQL to use MASS::glmmPQL() as submodel fitter for GLMMs:

options(projpred.PQL = TRUE)
system.time(cvvs_PQLtrad <- varsel(
  refm_fit_poiss,
  d_test = list(
    data = dat_poiss_test,
    offset = rep(0, nrow(dat_poiss_test)),
    weights = rep(1, nrow(dat_poiss_test)),
    y = dat_poiss_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  nterms_max = 5,
  seed = 95930
))
##    user  system elapsed 
## 309.080   0.276 309.214
smmry_PQLtrad <- summary(
  cvvs_PQLtrad,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_PQLtrad, digits = 3)
## 
## Family: poisson 
## Link function: log 
## 
## Formula: y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 + 
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 + 
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 + 
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 + 
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 71
## Observations (test set): 71
## Search method: forward, maximum number of terms 5
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## Suggested Projection Size: NA
## 
## Selection Summary:
##  size solution_terms    mlpd     se   lower   upper    diff diff.se
##     0           <NA> -53.964 19.846 -73.700 -34.229 -50.986  19.705
##     1    (1 | grpGL) -46.924 16.260 -63.094 -30.755 -43.946  16.117
##     2             x5 -35.492 11.477 -46.906 -24.079 -32.513  11.328
##     3             x4 -17.336  5.357 -22.662 -12.009 -14.357   5.220
##     4             x6 -12.218  4.196 -16.391  -8.045  -9.239   4.071
##     5             x2  -8.045  2.499 -10.530  -5.559  -5.066   2.376

# Check that MASS::glmmPQL() was actually used (only checking the first model
# size where this needs to be the case; later model sizes (for which this also
# needs to hold) are not checked here):
stopifnot(all(
  sapply(cvvs_PQLtrad$search_path$submodls[[1 + 1]], inherits, "glmmPQL")
))

The results are basically the same as above where lme4::glmer() was used as GLMM submodel fitter. In particular, the group-level term is at position (size) 1 of the solution path (and it is correctly positioned before all noise terms). However, the runtime is only about one tenth of that above where lme4::glmer() was used as GLMM submodel fitter.

fweber144 commented 1 year ago

Oh, the fact that multiple objects are named cvvs_[...] does not have any meaning. In a previous version of the code, I was using cv_varsel() and forgot to update the names after switching to varsel() with d_test.

fweber144 commented 1 year ago

Until the new submodel fitter MASS::glmmPQL() is implemented completely, the latent projection merged in PR #372 provides an alternative (approximate) solution to the two problems illustrated above (invalid results for binomial GLMMs in case of the traditional projection; high runtime for Poisson GLMMs in case of the traditional projection):

Binomial (Bernoulli) demo

Run the following code after having run the code from the Bernoulli sections "Data" and "Reference model" from above:

library(projpred)
system.time(cvvs_lat <- varsel(
  refm_fit_brnll,
  d_test = list(
    data = dat_brnll_test,
    offset = rep(0, nrow(dat_brnll_test)),
    weights = rep(1, nrow(dat_brnll_test)),
    ### Here, we are not interested in latent-scale post-processing, so we can
    ### set element `y` to a vector of `NA`s:
    y = rep(NA, nrow(dat_brnll_test)),
    ###
    y_oscale = dat_brnll_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930,
  latent = TRUE
))
##    user  system elapsed 
## 237.816   0.367 238.043
smmry_lat <- summary(
  cvvs_lat,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_lat, digits = 3)
## ------
## Response-scale family:
## 
## Family: binomial 
## Link function: logit 
## 
## ------
## Latent-scale family:
## 
## Family: gaussian 
## Link function: identity 
## 
## ------
## Formula: .y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 + 
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 + 
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 + 
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 + 
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 100
## Observations (test set): 100
## Projection method: latent
## Search method: forward, maximum number of terms 19
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## 
## Selection Summary (response scale):
##  size solution_terms   mlpd    se  lower  upper   diff diff.se
##     0           <NA> -0.730 0.018 -0.748 -0.712 -0.248   0.044
##     1             x6 -0.763 0.078 -0.841 -0.685 -0.281   0.067
##     2             x7 -0.708 0.084 -0.792 -0.624 -0.226   0.068
##     3             x3 -0.624 0.072 -0.696 -0.552 -0.142   0.054
##     4             x2 -0.572 0.069 -0.640 -0.503 -0.090   0.049
##     5             x9 -0.487 0.061 -0.548 -0.427 -0.005   0.036
##     6             x4 -0.465 0.059 -0.524 -0.405  0.017   0.031
##     7           xn.1 -0.473 0.060 -0.533 -0.413  0.009   0.030
##     8          xn.20 -0.474 0.060 -0.534 -0.414  0.008   0.027
##     9    (1 | grpGL) -0.465 0.059 -0.524 -0.406  0.017   0.027
##    10          xn.27 -0.466 0.061 -0.528 -0.405  0.016   0.027
##    11           xn.7 -0.473 0.062 -0.536 -0.411  0.009   0.027
##    12          xn.38 -0.482 0.063 -0.545 -0.418  0.000   0.026
##    13          xn.18 -0.483 0.064 -0.547 -0.419 -0.001   0.026
##    14             x1 -0.489 0.065 -0.553 -0.424 -0.007   0.026
##    15          xn.16 -0.490 0.065 -0.555 -0.425 -0.008   0.026
##    16          xn.17 -0.489 0.065 -0.554 -0.425 -0.007   0.026
##    17           xn.5 -0.488 0.064 -0.552 -0.424 -0.006   0.025
##    18          xn.12 -0.487 0.064 -0.551 -0.423 -0.006   0.025
##    19             x5 -0.488 0.064 -0.552 -0.423 -0.006   0.025

We see that the latent-projection solution path is similar to the traditional-projection one if the latter uses the new GLMM submodel fitter MASS::glmmPQL(). Most importantly, the group-level term occurs comparably early (size 9), but after two noise terms (which it didn't in case of MASS::glmmPQL()). In this sense, the latent projection gives still suboptimal results, but already better ones than the traditional projection with lme4::glmer() as submodel fitter.

With respect to the performance evaluation results (on response scale, not latent scale), the latent projection gives a different (slightly more unstable) picture than the two traditional-projection approaches:

### Requires to also run the projpred code from
### <https://github.com/stan-dev/projpred/pull/353#issue-1390738883>:
plot(cvvs_trad, stats = "mlpd", deltas = TRUE)
plot(cvvs_PQLtrad, stats = "mlpd", deltas = TRUE)
### 
plot(cvvs_lat, stats = "mlpd", deltas = TRUE)

brnll_trad brnll_PQLtrad brnll_lat The slightly more unstable latent-projection performance evaluation results are still usable, however.

In conclusion (for this Bernoulli demo), until the implementation of the new traditional-projection GLMM submodel fitter MASS::glmmPQL() is finished, the latent projection should be considered as an alternative to the currently implemented submodel fitter lme4::glmer().

Poisson demo

Run the following code after having run the code from the Poisson sections "Data" and "Reference model" from above:

library(projpred)
system.time(cvvs_lat <- varsel(
  refm_fit_poiss,
  d_test = list(
    data = dat_poiss_test,
    offset = rep(0, nrow(dat_poiss_test)),
    weights = rep(1, nrow(dat_poiss_test)),
    ### Here, we are not interested in latent-scale post-processing, so we can
    ### set element `y` to a vector of `NA`s:
    y = rep(NA, nrow(dat_poiss_test)),
    ###
    y_oscale = dat_poiss_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  nterms_max = 5,
  seed = 95930,
  latent = TRUE
))
##    user  system elapsed 
## 104.535   0.167 104.622
smmry_lat <- summary(
  cvvs_lat,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_lat, digits = 3)
## ------
## Response-scale family:
## 
## Family: poisson 
## Link function: log 
## 
## ------
## Latent-scale family:
## 
## Family: gaussian 
## Link function: identity 
## 
## ------
## Formula: .y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 + 
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 + 
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 + 
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 + 
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 71
## Observations (test set): 71
## Projection method: latent
## Search method: forward, maximum number of terms 5
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## 
## Selection Summary (response scale):
##  size solution_terms    mlpd     se    lower   upper    diff diff.se
##     0           <NA> -81.176 29.063 -110.240 -52.113 -78.198  28.914
##     1    (1 | grpGL) -52.793 18.015  -70.807 -34.778 -49.814  17.864
##     2             x5 -33.465 10.681  -44.146 -22.784 -30.486  10.529
##     3             x4 -11.435  3.878  -15.313  -7.557  -8.456   3.753
##     4             x6  -8.581  3.116  -11.697  -5.464  -5.602   3.006
##     5             x2  -6.914  2.394   -9.308  -4.519  -3.935   2.286

The solution path is the same as in the two traditional-projection variants (in particular, the group-level term is always selected first, so its relevance should never be underestimated), but the runtime is much lower than for the traditional projection with lme4::glmer().

Again, the performance evaluation results deviate from the two traditional-projection variants:

### Requires to also run the projpred code from
### <https://github.com/stan-dev/projpred/pull/353#issue-1390738883>:
plot(cvvs_trad, stats = "mlpd", deltas = TRUE)
plot(cvvs_PQLtrad, stats = "mlpd", deltas = TRUE)
### 
plot(cvvs_lat, stats = "mlpd", deltas = TRUE)

poiss_trad poiss_PQLtrad poiss_lat But still, the latent-projection performance evaluation results are usable.

In conclusion (for this Poisson demo), the latent projection should be considered as a faster alternative to the currently implemented traditional-projection GLMM submodel fitter lme4::glmer() which seems to give valid results, but takes an eternity.

Session info

This amendment was run with projpred 2.4.0, in contrast to above where the code was run at the state of the merged PR #353. This should affect results only very slightly, though. Furthermore, I ran this amendment code on a different machine where the runtime is approx. 90% of the runtime on the machine that I used above.

fweber144 commented 1 year ago

In case of a multilevel model, the augmented-data projection seems to suffer from the same issue as the traditional projection for the binomial family, which makes sense because of the possible explanation given in https://github.com/lme4/lme4/issues/682#issue-1242842076 and https://github.com/lme4/lme4/issues/682#issuecomment-1135533543 (a fractional response inherently comes with more similar groups). Again, the latent projection seems to be a remedy. Illustration:

Data

# Number of observations in the training dataset (= number of observations in
# the test dataset):
N <- 100L
sim_cumul <- function(nobsv_sim = 2 * N, ncat = 3L, npreds_cont = 5L,
                      ngrPL = 3L, nobsv_per_grGL = 2L) {
  yunq <- paste0("ycat", seq_len(ncat))
  nthres <- ncat - 1L
  thres <- seq(-1, 1, length.out = nthres)

  # Continuous predictors:
  dat_sim <- setNames(replicate(npreds_cont, {
    rnorm(nobsv_sim, sd = 0.5)
  }, simplify = FALSE), paste0("Xcont", seq_len(npreds_cont)))
  dat_sim <- as.data.frame(dat_sim)
  coefs_cont <- seq(-4, 4, length.out = npreds_cont)
  eta <- drop(as.matrix(dat_sim) %*% coefs_cont)

  # Population-level (PL) categorical predictor:
  dat_sim$XgrPL <- gl(n = ngrPL, k = nobsv_sim %/% ngrPL, length = nobsv_sim,
                      labels = paste0("gr", seq_len(ngrPL)))
  # Shuffle the PL categorical predictor to avoid missing levels in the training
  # or the test dataset:
  dat_sim$XgrPL <- sample(dat_sim$XgrPL)
  coefs_grPL <- c(0, seq(-2, 2, length.out = ngrPL - 1L))
  eta <- eta + coefs_grPL[as.numeric(dat_sim$XgrPL)]

  # Group-level (GL) intercepts:
  ngrGL <- c(nobsv_sim %/% nobsv_per_grGL)
  dat_sim$XgrGL <- gl(n = ngrGL, k = nobsv_sim %/% ngrGL, length = nobsv_sim,
                      labels = paste0("gr", seq_len(ngrGL)))
  # Shuffle the GL categorical predictor to avoid a systematic structure in
  # combination with the PL categorical predictor:
  dat_sim$XgrGL <- sample(dat_sim$XgrGL)
  coefs_grGL <- rnorm(ngrGL, sd = 1.5)
  eta <- eta + coefs_grGL[dat_sim$XgrGL]

  # Calculate the probabilities for the response categories:
  thres_eta <- sapply(thres, function(thres_k) {
    thres_k - eta
  })
  # Shouldn't be necessary (just to be safe): Emulate a single posterior draw:
  dim(thres_eta) <- c(1L, nobsv_sim, nthres)
  yprobs <- brms:::inv_link_cumulative(thres_eta, link = "logit")
  # Because of emulating a single posterior draw above:
  dim(yprobs) <- c(nobsv_sim, ncat)

  # Response:
  dat_sim$Y <- factor(sapply(seq_len(nobsv_sim), function(i) {
    sample(yunq, size = 1, prob = yprobs[i, ])
  }), ordered = TRUE)

  # Formula:
  voutc <- "Y"
  vpreds <- grep("^Xcont|^XgrPL", names(dat_sim), value = TRUE)
  vpreds_GL <- grep("^XgrGL", names(dat_sim), value = TRUE)
  if (length(vpreds_GL) > 0L) {
    stopifnot(all(lengths(coefs_grGL) == 1))
    vpreds_GL <- paste0("(1 | ", vpreds_GL, ")")
  }
  fml_sim <- as.formula(paste(
    voutc, "~", paste(c(vpreds, vpreds_GL), collapse = " + ")
  ))

  # Check response categories:
  dat_train <- dat_sim[1:(nobsv_sim %/% 2), , drop = FALSE]
  dat_test <- dat_sim[(nobsv_sim %/% 2 + 1):nobsv_sim, , drop = FALSE]
  # Check that all response categories are present in training and test dataset:
  stopifnot(identical(levels(droplevels(dat_train$Y)), yunq),
            identical(levels(droplevels(dat_test$Y)), yunq))
  # Check that all levels of the PL categorical predictor are present in
  # training and test dataset:
  stopifnot(identical(levels(droplevels(dat_train$XgrPL)),
                      paste0("gr", seq_len(ngrPL))),
            identical(levels(droplevels(dat_test$XgrPL)),
                      paste0("gr", seq_len(ngrPL))))

  # Output:
  return(list(true_coefs = list(coefs_cont = coefs_cont,
                                coefs_grPL = coefs_grPL,
                                coefs_grGL = coefs_grGL),
              dat = dat_train,
              fml = fml_sim,
              dat_indep = dat_test))
}
set.seed(856824715)
sim_out <- sim_cumul()

Reference model

options(mc.cores = parallel::detectCores(logical = FALSE))
refm_fit <- brms::brm(
  formula = sim_out$fml,
  data = sim_out$dat,
  family = brms::cumulative(),
  prior = brms::set_prior("normal(0, 5)"),
  seed = 286013, refresh = 0
)

Augmented-data projection

library(projpred)
refm_aug <- get_refmodel(refm_fit, brms_seed = 75772)
system.time(vs_aug <- varsel(
  refm_aug,
  d_test = list(
    data = sim_out$dat_indep,
    offset = rep(0, nrow(sim_out$dat_indep)),
    weights = rep(1, nrow(sim_out$dat_indep)),
    y = sim_out$dat_indep$Y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
))
##   user  system elapsed
## 21.613   0.095  21.688
smmry_aug <- summary(
  vs_aug,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_aug, digits = 3)
## 
## Family: cumulative 
## Link function: logit 
## Threshold: flexible 
## 
## Formula: Y ~ Xcont1 + Xcont2 + Xcont3 + Xcont4 + Xcont5 + XgrPL + (1 | 
##     XgrGL)
## Observations (training set): 100
## Observations (test set): 100
## Projection method: augmented-data
## Search method: forward, maximum number of terms 7
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
## 
## Selection Summary:
##  size solution_terms   mlpd    se  lower  upper   diff diff.se
##     0           <NA> -1.066 0.033 -1.099 -1.033 -0.406   0.060
##     1          XgrPL -0.978 0.055 -1.033 -0.923 -0.319   0.053
##     2         Xcont1 -0.949 0.071 -1.019 -0.878 -0.289   0.055
##     3         Xcont5 -0.787 0.069 -0.856 -0.719 -0.128   0.047
##     4         Xcont4 -0.713 0.074 -0.787 -0.639 -0.053   0.045
##     5         Xcont2 -0.641 0.066 -0.707 -0.575  0.019   0.027
##     6         Xcont3 -0.647 0.066 -0.712 -0.581  0.013   0.028
##     7    (1 | XgrGL) -0.647 0.066 -0.712 -0.581  0.013   0.028

The group-level term gets selected last, in particular after Xcont3 which has a true coefficient of zero (see sim_out$true_coefs$coefs_cont). Furthermore, there is no improvement in MLPD from size 6 to size 7 (where the group-level term gets selected).

The problem can also be seen quite well when projecting onto the full model and inspecting the projected draws for the hierarchical standard deviation:

# Check the projected hierarchical standard deviation in a projection onto the
# full model:
terms_full <- labels(terms(formula(refm_aug)))
terms_full <- sub("(.*\\|.*)", "(\\1)", terms_full)
### This would correspond to the search (at least in terms of the number of
### clusters; the different seed might lead to a slightly different clustering):
# prj_full <- project(refm_aug, solution_terms = terms_full,
#                     nclusters = 20,
#                     seed = 273722)
###
# Here, we use more clusters to show that the problem persists even then:
prj_full <- project(refm_aug, solution_terms = terms_full,
                    nclusters = 450,
                    seed = 273722)
### Not run here (but tried it out once): The problem also persists when using
### the draw-by-draw projection (however, a lot of warnings are thrown in that
### case, which can probably be alleviated---although probably not removed
### entirely---by setting `nAGQ = 10` (or higher) and/or
### `control = ordinal::clmm.control(maxIter = 200, maxLineIter = 200)`):
# prj_full <- project(refm_aug, solution_terms = terms_full,
#                     ndraws = length(refm_aug$wsample),
#                     seed = 273722)
###
prjmat_full <- as.matrix(prj_full)
quantile(prjmat_full[, "sd_XgrGL__Intercept"],
         probs = c(0.8, 0.9, 0.95, 0.99, 1))
#          80%          90%          95%          99%         100%
# 7.278300e-05 8.778181e-05 1.183793e-04 4.240883e-01 9.622590e-01
library(ggplot2)
ggplot(data = data.frame(prjmat_full[, "sd_XgrGL__Intercept", drop = FALSE]),
       mapping = aes(x = sd_XgrGL__Intercept)) +
  geom_histogram()

hierarch_sd Basically all projected draws for the hierarchical standard deviation are very close to zero, with only a few exceptions.

Latent projection

refm_lat <- get_refmodel(refm_fit, latent = TRUE, brms_seed = 75772)
system.time(vs_lat <- varsel(
  refm_lat,
  d_test = list(
    data = sim_out$dat_indep,
    offset = rep(0, nrow(sim_out$dat_indep)),
    weights = rep(1, nrow(sim_out$dat_indep)),
    ### Here, we are not interested in latent-scale post-processing, so we can
    ### set element `y` to a vector of `NA`s:
    y = rep(NA, nrow(sim_out$dat_indep)),
    ###
    y_oscale = sim_out$dat_indep$Y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
))
##   user  system elapsed
## 20.076   0.024  20.090
smmry_lat <- summary(
  vs_lat,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_lat, digits = 3)
## ------
## Response-scale family:
##
## Family: cumulative
## Link function: logit
##
## ------
## Latent-scale family:
##
## Family: gaussian
## Link function: identity
##
## ------
## Formula: .Y ~ Xcont1 + Xcont2 + Xcont3 + Xcont4 + Xcont5 + XgrPL + (1 |
##     XgrGL)
## Observations (training set): 100
## Observations (test set): 100
## Projection method: latent
## Search method: forward, maximum number of terms 7
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
##
## Selection Summary (response scale):
##  size solution_terms   mlpd    se  lower  upper   diff diff.se
##     0           <NA> -1.638 0.093 -1.730 -1.545 -0.978   0.130
##     1    (1 | XgrGL) -1.548 0.100 -1.648 -1.448 -0.889   0.127
##     2          XgrPL -1.281 0.143 -1.424 -1.138 -0.621   0.130
##     3         Xcont1 -1.264 0.149 -1.413 -1.115 -0.604   0.127
##     4         Xcont5 -1.024 0.132 -1.156 -0.892 -0.364   0.101
##     5         Xcont4 -0.865 0.112 -0.977 -0.753 -0.206   0.075
##     6         Xcont2 -0.729 0.101 -0.830 -0.627 -0.069   0.041
##     7         Xcont3 -0.720 0.104 -0.824 -0.616 -0.061   0.042

Now the group-level term is selected first which makes more sense (given the data-generating process used here) than the augmented-data projection's solution path. Also, there is an improvement in MLPD from size 0 to size 1 (where the group-level term gets selected).

Session info

Run with projpred 2.4.0. Details:

> sessioninfo::session_info()
─ Session info ──────────────────────────────────────────
 setting  value
 version  R version 4.2.3 (2023-03-15)
 os       Ubuntu 22.04.2 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2023-04-01
 rstudio  2023.03.0+386 Cherry Blossom (desktop)
 pandoc   2.19.2 @ [...] (via rmarkdown)

─ Packages ───────────────────────────────────────────
 package        * version    date (UTC) lib source
 abind            1.4-5      2016-07-21 [3] CRAN (R 4.2.0)
 backports        1.4.1      2021-12-13 [3] CRAN (R 4.2.0)
 base64enc        0.1-3      2015-07-28 [3] CRAN (R 4.0.2)
 bayesplot        1.10.0     2022-11-16 [3] CRAN (R 4.2.2)
 boot             1.3-28.1   2022-11-22 [1] CRAN (R 4.2.3)
 bridgesampling   1.1-2      2021-04-16 [3] CRAN (R 4.2.0)
 brms             2.19.0     2023-03-14 [3] CRAN (R 4.2.3)
 Brobdingnag      1.2-9      2022-10-19 [3] CRAN (R 4.2.1)
 callr            3.7.3      2022-11-02 [3] CRAN (R 4.2.2)
 checkmate        2.1.0      2022-04-21 [3] CRAN (R 4.2.0)
 cli              3.6.1      2023-03-23 [3] CRAN (R 4.2.3)
 cmdstanr         0.5.3      2022-08-24 [1] local
 coda             0.19-4     2020-09-30 [3] CRAN (R 4.2.0)
 codetools        0.2-19     2023-02-01 [4] CRAN (R 4.2.2)
 colorspace       2.1-0      2023-01-23 [3] CRAN (R 4.2.2)
 colourpicker     1.2.0      2022-10-28 [3] CRAN (R 4.2.2)
 crayon           1.5.2      2022-09-29 [3] CRAN (R 4.2.1)
 crosstalk        1.2.0      2021-11-04 [3] CRAN (R 4.1.2)
 data.table       1.14.8     2023-02-17 [3] CRAN (R 4.2.2)
 DBI              1.1.3      2022-06-18 [3] CRAN (R 4.2.1)
 digest           0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
 distributional   0.3.2      2023-03-22 [3] CRAN (R 4.2.3)
 dplyr            1.1.1      2023-03-22 [3] CRAN (R 4.2.3)
 DT               0.27       2023-01-17 [3] CRAN (R 4.2.2)
 dygraphs         1.1.1.6    2018-07-11 [3] CRAN (R 4.0.1)
 ellipsis         0.3.2      2021-04-29 [3] CRAN (R 4.1.1)
 emmeans          1.8.5      2023-03-08 [3] CRAN (R 4.2.2)
 estimability     1.4.1      2022-08-05 [3] CRAN (R 4.2.1)
 evaluate         0.20       2023-01-17 [3] CRAN (R 4.2.2)
 fansi            1.0.4      2023-01-22 [3] CRAN (R 4.2.2)
 farver           2.1.1      2022-07-06 [3] CRAN (R 4.2.1)
 fastmap          1.1.1      2023-02-24 [3] CRAN (R 4.2.2)
 foreach          1.5.2      2022-02-02 [3] CRAN (R 4.2.0)
 gamm4            0.2-6      2020-04-03 [3] CRAN (R 4.0.2)
 generics         0.1.3      2022-07-05 [3] CRAN (R 4.2.1)
 ggplot2        * 3.4.1      2023-02-10 [3] CRAN (R 4.2.2)
 glue             1.6.2      2022-02-24 [3] CRAN (R 4.2.0)
 gridExtra        2.3        2017-09-09 [3] CRAN (R 4.0.1)
 gtable           0.3.3      2023-03-21 [3] CRAN (R 4.2.3)
 gtools           3.9.4      2022-11-27 [1] CRAN (R 4.2.2)
 htmltools        0.5.5      2023-03-23 [3] CRAN (R 4.2.3)
 htmlwidgets      1.6.2      2023-03-17 [3] CRAN (R 4.2.3)
 httpuv           1.6.9      2023-02-14 [3] CRAN (R 4.2.2)
 igraph           1.4.1      2023-02-24 [3] CRAN (R 4.2.2)
 inline           0.3.19     2021-05-31 [3] CRAN (R 4.1.1)
 iterators        1.0.14     2022-02-05 [3] CRAN (R 4.2.0)
 jsonlite         1.8.4      2022-12-06 [1] CRAN (R 4.2.2)
 knitr            1.42       2023-01-25 [3] CRAN (R 4.2.2)
 labeling         0.4.2      2020-10-20 [3] CRAN (R 4.2.0)
 later            1.3.0      2021-08-18 [3] CRAN (R 4.1.1)
 lattice          0.20-45    2021-09-22 [4] CRAN (R 4.2.0)
 lifecycle        1.0.3      2022-10-07 [3] CRAN (R 4.2.1)
 lme4             1.1-32     2023-03-14 [3] CRAN (R 4.2.3)
 loo              2.5.1      2022-03-24 [3] CRAN (R 4.2.0)
 magrittr         2.0.3      2022-03-30 [3] CRAN (R 4.2.0)
 markdown         1.5        2023-01-31 [3] CRAN (R 4.2.2)
 MASS             7.3-58.3   2023-03-07 [1] CRAN (R 4.2.3)
 Matrix           1.5-3      2022-11-11 [4] CRAN (R 4.2.2)
 matrixStats      0.63.0     2022-11-18 [1] CRAN (R 4.2.2)
 mgcv             1.8-42     2023-03-02 [4] CRAN (R 4.2.3)
 mime             0.12       2021-09-28 [3] CRAN (R 4.2.0)
 miniUI           0.1.1.1    2018-05-18 [3] CRAN (R 4.0.1)
 minqa            1.2.5      2022-10-19 [3] CRAN (R 4.2.1)
 multcomp         1.4-23     2023-03-09 [3] CRAN (R 4.2.2)
 munsell          0.5.0      2018-06-12 [3] CRAN (R 4.0.1)
 mvtnorm          1.1-3      2021-10-08 [3] CRAN (R 4.2.0)
 nlme             3.1-162    2023-01-31 [4] CRAN (R 4.2.2)
 nloptr           2.0.3      2022-05-26 [3] CRAN (R 4.2.0)
 numDeriv         2016.8-1.1 2019-06-06 [3] CRAN (R 4.0.1)
 ordinal          2022.11-16 2022-11-16 [3] CRAN (R 4.2.2)
 pillar           1.9.0      2023-03-22 [3] CRAN (R 4.2.3)
 pkgbuild         1.4.0      2022-11-27 [3] CRAN (R 4.2.2)
 pkgconfig        2.0.3      2019-09-22 [3] CRAN (R 4.0.1)
 plyr             1.8.8      2022-11-11 [3] CRAN (R 4.2.2)
 posterior        1.4.1      2023-03-14 [3] CRAN (R 4.2.3)
 prettyunits      1.1.1      2020-01-24 [3] CRAN (R 4.0.1)
 processx         3.8.0      2022-10-26 [3] CRAN (R 4.2.1)
 projpred       * 2.4.0      2023-02-12 [1] CRAN (R 4.2.3)
 promises         1.2.0.1    2021-02-11 [3] CRAN (R 4.1.1)
 ps               1.7.3      2023-03-21 [3] CRAN (R 4.2.3)
 R6               2.5.1      2021-08-19 [3] CRAN (R 4.2.0)
 Rcpp             1.0.10     2023-01-22 [3] CRAN (R 4.2.2)
 RcppParallel     5.1.7      2023-02-27 [3] CRAN (R 4.2.2)
 reshape2         1.4.4      2020-04-09 [3] CRAN (R 4.0.1)
 rlang            1.1.0      2023-03-14 [3] CRAN (R 4.2.3)
 rmarkdown        2.21       2023-03-26 [1] CRAN (R 4.2.3)
 rsconnect        0.8.29     2023-01-09 [1] CRAN (R 4.2.2)
 rstan            2.21.8     2023-01-17 [1] CRAN (R 4.2.2)
 rstantools       2.3.0      2023-03-09 [3] CRAN (R 4.2.2)
 rstudioapi       0.14       2022-08-22 [3] CRAN (R 4.2.1)
 sandwich         3.0-2      2022-06-15 [3] CRAN (R 4.2.1)
 scales           1.2.1      2022-08-20 [3] CRAN (R 4.2.1)
 sessioninfo      1.2.2      2021-12-06 [3] CRAN (R 4.2.0)
 shiny            1.7.4      2022-12-15 [1] CRAN (R 4.2.2)
 shinyjs          2.1.0      2021-12-23 [3] CRAN (R 4.2.0)
 shinystan        2.6.0      2022-03-03 [3] CRAN (R 4.2.0)
 shinythemes      1.2.0      2021-01-25 [3] CRAN (R 4.0.3)
 StanHeaders      2.21.0-7   2020-12-17 [1] CRAN (R 4.2.0)
 stringi          1.7.12     2023-01-11 [3] CRAN (R 4.2.2)
 stringr          1.5.0      2022-12-02 [1] CRAN (R 4.2.2)
 survival         3.5-5      2023-03-12 [1] CRAN (R 4.2.3)
 tensorA          0.36.2     2020-11-19 [3] CRAN (R 4.1.1)
 TH.data          1.1-1      2022-04-26 [3] CRAN (R 4.2.0)
 threejs          0.3.3      2020-01-21 [3] CRAN (R 4.0.1)
 tibble           3.2.1      2023-03-20 [3] CRAN (R 4.2.3)
 tidyselect       1.2.0      2022-10-10 [3] CRAN (R 4.2.1)
 ucminf           1.1-4.1    2022-09-29 [3] CRAN (R 4.2.1)
 utf8             1.2.3      2023-01-31 [3] CRAN (R 4.2.2)
 vctrs            0.6.1      2023-03-22 [3] CRAN (R 4.2.3)
 withr            2.5.0      2022-03-03 [3] CRAN (R 4.2.0)
 xfun             0.38       2023-03-24 [3] CRAN (R 4.2.3)
 xtable           1.8-4      2019-04-21 [3] CRAN (R 4.0.1)
 xts              0.13.0     2023-02-20 [3] CRAN (R 4.2.2)
 yaml             2.3.7      2023-01-23 [3] CRAN (R 4.2.2)
 zoo              1.8-11     2022-09-17 [3] CRAN (R 4.2.1)

 [1] [...]
 [2] [...]
 [3] [...]
 [4] [...]

─────────────────────────────────────────────────

The CmdStan version is 2.31.0.

stan-dev / projpred

Optional submodel fitter `MASS::glmmPQL()` #353

Binomial (Bernoulli) demo

Data

Reference model

Default GLMM submodel fitter `lme4::glmer()`

New GLMM submodel fitter `MASS::glmmPQL()`

Poisson demo

Data

Reference model

Default GLMM submodel fitter `lme4::glmer()`

New GLMM submodel fitter `MASS::glmmPQL()`

Binomial (Bernoulli) demo

Poisson demo

Session info

Data

Reference model

Augmented-data projection

Latent projection

Session info

stan-dev / projpred

Optional submodel fitter `MASS::glmmPQL()` #353

Binomial (Bernoulli) demo

Data

Reference model

Default GLMM submodel fitter lme4::glmer()

New GLMM submodel fitter MASS::glmmPQL()

Poisson demo

Data

Reference model

Default GLMM submodel fitter lme4::glmer()

New GLMM submodel fitter MASS::glmmPQL()

Binomial (Bernoulli) demo

Poisson demo

Session info

Data

Reference model

Augmented-data projection

Latent projection

Session info

Default GLMM submodel fitter `lme4::glmer()`

New GLMM submodel fitter `MASS::glmmPQL()`

Default GLMM submodel fitter `lme4::glmer()`

New GLMM submodel fitter `MASS::glmmPQL()`