Oh, the fact that multiple objects are named `cvvs_[...]` does not have any meaning. In a previous version of the code, I was using `cv_varsel()` and forgot to update the names after switching to `varsel()` with `d_test`.
Until the new submodel fitter `MASS::glmmPQL()` is implemented completely, the latent projection merged in PR #372 provides an alternative (approximate) solution to the two problems illustrated above (invalid results for binomial GLMMs in case of the traditional projection; high runtime for Poisson GLMMs in case of the traditional projection):
Run the following code after having run the code from the Bernoulli sections "Data" and "Reference model" from above:
```r
library(projpred)
system.time(cvvs_lat <- varsel(
  refm_fit_brnll,
  d_test = list(
    data = dat_brnll_test,
    offset = rep(0, nrow(dat_brnll_test)),
    weights = rep(1, nrow(dat_brnll_test)),
    ### Here, we are not interested in latent-scale post-processing, so we can
    ### set element `y` to a vector of `NA`s:
    y = rep(NA, nrow(dat_brnll_test)),
    ###
    y_oscale = dat_brnll_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930,
  latent = TRUE
))
##    user  system elapsed
## 237.816   0.367 238.043
```
```r
smmry_lat <- summary(
  cvvs_lat,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_lat, digits = 3)
## ------
## Response-scale family:
##
## Family: binomial
## Link function: logit
##
## ------
## Latent-scale family:
##
## Family: gaussian
## Link function: identity
##
## ------
## Formula: .y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 +
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 +
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 +
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 +
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 +
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 100
## Observations (test set): 100
## Projection method: latent
## Search method: forward, maximum number of terms 19
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
##
## Selection Summary (response scale):
## size solution_terms   mlpd    se  lower  upper   diff diff.se
##    0           <NA> -0.730 0.018 -0.748 -0.712 -0.248   0.044
##    1             x6 -0.763 0.078 -0.841 -0.685 -0.281   0.067
##    2             x7 -0.708 0.084 -0.792 -0.624 -0.226   0.068
##    3             x3 -0.624 0.072 -0.696 -0.552 -0.142   0.054
##    4             x2 -0.572 0.069 -0.640 -0.503 -0.090   0.049
##    5             x9 -0.487 0.061 -0.548 -0.427 -0.005   0.036
##    6             x4 -0.465 0.059 -0.524 -0.405  0.017   0.031
##    7           xn.1 -0.473 0.060 -0.533 -0.413  0.009   0.030
##    8          xn.20 -0.474 0.060 -0.534 -0.414  0.008   0.027
##    9    (1 | grpGL) -0.465 0.059 -0.524 -0.406  0.017   0.027
##   10          xn.27 -0.466 0.061 -0.528 -0.405  0.016   0.027
##   11           xn.7 -0.473 0.062 -0.536 -0.411  0.009   0.027
##   12          xn.38 -0.482 0.063 -0.545 -0.418  0.000   0.026
##   13          xn.18 -0.483 0.064 -0.547 -0.419 -0.001   0.026
##   14             x1 -0.489 0.065 -0.553 -0.424 -0.007   0.026
##   15          xn.16 -0.490 0.065 -0.555 -0.425 -0.008   0.026
##   16          xn.17 -0.489 0.065 -0.554 -0.425 -0.007   0.026
##   17           xn.5 -0.488 0.064 -0.552 -0.424 -0.006   0.025
##   18          xn.12 -0.487 0.064 -0.551 -0.423 -0.006   0.025
##   19             x5 -0.488 0.064 -0.552 -0.423 -0.006   0.025
```
We see that the latent-projection solution path is similar to the traditional-projection one if the latter uses the new GLMM submodel fitter `MASS::glmmPQL()`. Most importantly, the group-level term occurs comparably early (size 9), but after two noise terms (which it didn't in case of `MASS::glmmPQL()`). In this sense, the latent projection still gives suboptimal results, but already better ones than the traditional projection with `lme4::glmer()` as submodel fitter.
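As a quick sanity check, the position of the group-level term in the solution path can also be extracted programmatically. This is a hypothetical helper snippet (not part of the original post), assuming projpred 2.x's `solution_terms()` accessor:

```r
# Position of the group-level term in the latent-projection solution path
# (should be 9, according to the summary above):
match("(1 | grpGL)", solution_terms(cvvs_lat))
```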
With respect to the performance evaluation results (on response scale, not latent scale), the latent projection gives a different (slightly more unstable) picture than the two traditional-projection approaches:
```r
### Requires to also run the projpred code from
### <https://github.com/stan-dev/projpred/pull/353#issue-1390738883>:
plot(cvvs_trad, stats = "mlpd", deltas = TRUE)
plot(cvvs_PQLtrad, stats = "mlpd", deltas = TRUE)
###
plot(cvvs_lat, stats = "mlpd", deltas = TRUE)
```
The slightly more unstable latent-projection performance evaluation results are still usable, however.
In conclusion (for this Bernoulli demo), until the implementation of the new traditional-projection GLMM submodel fitter `MASS::glmmPQL()` is finished, the latent projection should be considered as an alternative to the currently implemented submodel fitter `lme4::glmer()`.
Run the following code after having run the code from the Poisson sections "Data" and "Reference model" from above:
```r
library(projpred)
system.time(cvvs_lat <- varsel(
  refm_fit_poiss,
  d_test = list(
    data = dat_poiss_test,
    offset = rep(0, nrow(dat_poiss_test)),
    weights = rep(1, nrow(dat_poiss_test)),
    ### Here, we are not interested in latent-scale post-processing, so we can
    ### set element `y` to a vector of `NA`s:
    y = rep(NA, nrow(dat_poiss_test)),
    ###
    y_oscale = dat_poiss_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  nterms_max = 5,
  seed = 95930,
  latent = TRUE
))
##    user  system elapsed
## 104.535   0.167 104.622
```
```r
smmry_lat <- summary(
  cvvs_lat,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_lat, digits = 3)
## ------
## Response-scale family:
##
## Family: poisson
## Link function: log
##
## ------
## Latent-scale family:
##
## Family: gaussian
## Link function: identity
##
## ------
## Formula: .y ~ (1 | grpGL) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 +
##     x10 + grpPL + xn.1 + xn.2 + xn.3 + xn.4 + xn.5 + xn.6 + xn.7 +
##     xn.8 + xn.9 + xn.10 + xn.11 + xn.12 + xn.13 + xn.14 + xn.15 +
##     xn.16 + xn.17 + xn.18 + xn.19 + xn.20 + xn.21 + xn.22 + xn.23 +
##     xn.24 + xn.25 + xn.26 + xn.27 + xn.28 + xn.29 + xn.30 + xn.31 +
##     xn.32 + xn.33 + xn.34 + xn.35 + xn.36 + xn.37 + xn.38 + xn.39
## Observations (training set): 71
## Observations (test set): 71
## Projection method: latent
## Search method: forward, maximum number of terms 5
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
##
## Selection Summary (response scale):
## size solution_terms    mlpd     se    lower   upper    diff diff.se
##    0           <NA> -81.176 29.063 -110.240 -52.113 -78.198  28.914
##    1    (1 | grpGL) -52.793 18.015  -70.807 -34.778 -49.814  17.864
##    2             x5 -33.465 10.681  -44.146 -22.784 -30.486  10.529
##    3             x4 -11.435  3.878  -15.313  -7.557  -8.456   3.753
##    4             x6  -8.581  3.116  -11.697  -5.464  -5.602   3.006
##    5             x2  -6.914  2.394   -9.308  -4.519  -3.935   2.286
```
The solution path is the same as in the two traditional-projection variants (in particular, the group-level term is always selected first, so its relevance should never be underestimated), but the runtime is much lower than for the traditional projection with `lme4::glmer()`.
Again, the performance evaluation results deviate from the two traditional-projection variants:
```r
### Requires to also run the projpred code from
### <https://github.com/stan-dev/projpred/pull/353#issue-1390738883>:
plot(cvvs_trad, stats = "mlpd", deltas = TRUE)
plot(cvvs_PQLtrad, stats = "mlpd", deltas = TRUE)
###
plot(cvvs_lat, stats = "mlpd", deltas = TRUE)
```
But still, the latent-projection performance evaluation results are usable.
In conclusion (for this Poisson demo), the latent projection should be considered as a faster alternative to the currently implemented traditional-projection GLMM submodel fitter `lme4::glmer()`, which seems to give valid results but takes an eternity.
This amendment was run with projpred 2.4.0, in contrast to the code above, which was run at the state of the merged PR #353. This should affect the results only very slightly, though. Furthermore, I ran this amendment code on a different machine, where the runtime is approx. 90% of the runtime on the machine that I used above.
In the case of a multilevel model, the augmented-data projection seems to suffer from the same issue as the traditional projection for the binomial family, which makes sense given the possible explanation in https://github.com/lme4/lme4/issues/682#issue-1242842076 and https://github.com/lme4/lme4/issues/682#issuecomment-1135533543 (a fractional response inherently comes with more similar groups). Again, the latent projection seems to be a remedy. Illustration:
```r
# Number of observations in the training dataset (= number of observations in
# the test dataset):
N <- 100L
sim_cumul <- function(nobsv_sim = 2 * N, ncat = 3L, npreds_cont = 5L,
                      ngrPL = 3L, nobsv_per_grGL = 2L) {
  yunq <- paste0("ycat", seq_len(ncat))
  nthres <- ncat - 1L
  thres <- seq(-1, 1, length.out = nthres)
  # Continuous predictors:
  dat_sim <- setNames(replicate(npreds_cont, {
    rnorm(nobsv_sim, sd = 0.5)
  }, simplify = FALSE), paste0("Xcont", seq_len(npreds_cont)))
  dat_sim <- as.data.frame(dat_sim)
  coefs_cont <- seq(-4, 4, length.out = npreds_cont)
  eta <- drop(as.matrix(dat_sim) %*% coefs_cont)
  # Population-level (PL) categorical predictor:
  dat_sim$XgrPL <- gl(n = ngrPL, k = nobsv_sim %/% ngrPL, length = nobsv_sim,
                      labels = paste0("gr", seq_len(ngrPL)))
  # Shuffle the PL categorical predictor to avoid missing levels in the
  # training or the test dataset:
  dat_sim$XgrPL <- sample(dat_sim$XgrPL)
  coefs_grPL <- c(0, seq(-2, 2, length.out = ngrPL - 1L))
  eta <- eta + coefs_grPL[as.numeric(dat_sim$XgrPL)]
  # Group-level (GL) intercepts:
  ngrGL <- nobsv_sim %/% nobsv_per_grGL
  dat_sim$XgrGL <- gl(n = ngrGL, k = nobsv_sim %/% ngrGL, length = nobsv_sim,
                      labels = paste0("gr", seq_len(ngrGL)))
  # Shuffle the GL categorical predictor to avoid a systematic structure in
  # combination with the PL categorical predictor:
  dat_sim$XgrGL <- sample(dat_sim$XgrGL)
  coefs_grGL <- rnorm(ngrGL, sd = 1.5)
  eta <- eta + coefs_grGL[dat_sim$XgrGL]
  # Calculate the probabilities for the response categories:
  thres_eta <- sapply(thres, function(thres_k) {
    thres_k - eta
  })
  # Shouldn't be necessary (just to be safe): Emulate a single posterior draw:
  dim(thres_eta) <- c(1L, nobsv_sim, nthres)
  yprobs <- brms:::inv_link_cumulative(thres_eta, link = "logit")
  # Because of emulating a single posterior draw above:
  dim(yprobs) <- c(nobsv_sim, ncat)
  # Response:
  dat_sim$Y <- factor(sapply(seq_len(nobsv_sim), function(i) {
    sample(yunq, size = 1, prob = yprobs[i, ])
  }), ordered = TRUE)
  # Formula:
  voutc <- "Y"
  vpreds <- grep("^Xcont|^XgrPL", names(dat_sim), value = TRUE)
  vpreds_GL <- grep("^XgrGL", names(dat_sim), value = TRUE)
  if (length(vpreds_GL) > 0L) {
    stopifnot(all(lengths(coefs_grGL) == 1))
    vpreds_GL <- paste0("(1 | ", vpreds_GL, ")")
  }
  fml_sim <- as.formula(paste(
    voutc, "~", paste(c(vpreds, vpreds_GL), collapse = " + ")
  ))
  # Split into training and test dataset:
  dat_train <- dat_sim[1:(nobsv_sim %/% 2), , drop = FALSE]
  dat_test <- dat_sim[(nobsv_sim %/% 2 + 1):nobsv_sim, , drop = FALSE]
  # Check that all response categories are present in training and test
  # dataset:
  stopifnot(identical(levels(droplevels(dat_train$Y)), yunq),
            identical(levels(droplevels(dat_test$Y)), yunq))
  # Check that all levels of the PL categorical predictor are present in
  # training and test dataset:
  stopifnot(identical(levels(droplevels(dat_train$XgrPL)),
                      paste0("gr", seq_len(ngrPL))),
            identical(levels(droplevels(dat_test$XgrPL)),
                      paste0("gr", seq_len(ngrPL))))
  # Output:
  return(list(true_coefs = list(coefs_cont = coefs_cont,
                                coefs_grPL = coefs_grPL,
                                coefs_grGL = coefs_grGL),
              dat = dat_train,
              fml = fml_sim,
              dat_indep = dat_test))
}
set.seed(856824715)
sim_out <- sim_cumul()
```
```r
options(mc.cores = parallel::detectCores(logical = FALSE))
refm_fit <- brms::brm(
  formula = sim_out$fml,
  data = sim_out$dat,
  family = brms::cumulative(),
  prior = brms::set_prior("normal(0, 5)"),
  seed = 286013, refresh = 0
)

library(projpred)
refm_aug <- get_refmodel(refm_fit, brms_seed = 75772)
```
```r
system.time(vs_aug <- varsel(
  refm_aug,
  d_test = list(
    data = sim_out$dat_indep,
    offset = rep(0, nrow(sim_out$dat_indep)),
    weights = rep(1, nrow(sim_out$dat_indep)),
    y = sim_out$dat_indep$Y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
))
##   user  system elapsed
## 21.613   0.095  21.688
```
```r
smmry_aug <- summary(
  vs_aug,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_aug, digits = 3)
##
## Family: cumulative
## Link function: logit
## Threshold: flexible
##
## Formula: Y ~ Xcont1 + Xcont2 + Xcont3 + Xcont4 + Xcont5 + XgrPL + (1 |
##     XgrGL)
## Observations (training set): 100
## Observations (test set): 100
## Projection method: augmented-data
## Search method: forward, maximum number of terms 7
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
##
## Selection Summary:
## size solution_terms   mlpd    se  lower  upper   diff diff.se
##    0           <NA> -1.066 0.033 -1.099 -1.033 -0.406   0.060
##    1          XgrPL -0.978 0.055 -1.033 -0.923 -0.319   0.053
##    2         Xcont1 -0.949 0.071 -1.019 -0.878 -0.289   0.055
##    3         Xcont5 -0.787 0.069 -0.856 -0.719 -0.128   0.047
##    4         Xcont4 -0.713 0.074 -0.787 -0.639 -0.053   0.045
##    5         Xcont2 -0.641 0.066 -0.707 -0.575  0.019   0.027
##    6         Xcont3 -0.647 0.066 -0.712 -0.581  0.013   0.028
##    7    (1 | XgrGL) -0.647 0.066 -0.712 -0.581  0.013   0.028
```
The group-level term gets selected last, in particular after `Xcont3`, which has a true coefficient of zero (see `sim_out$true_coefs$coefs_cont`). Furthermore, there is no improvement in MLPD from size 6 to size 7 (where the group-level term gets selected).
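For reference, the true coefficients of the continuous predictors can be inspected directly. By the `seq(-4, 4, length.out = npreds_cont)` definition in `sim_cumul()`, the middle coefficient (the one belonging to `Xcont3`) is exactly zero:

```r
sim_out$true_coefs$coefs_cont
## [1] -4 -2  0  2  4
```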
The problem can also be seen quite well when projecting onto the full model and inspecting the projected draws for the hierarchical standard deviation:
```r
# Check the projected hierarchical standard deviation in a projection onto the
# full model:
terms_full <- labels(terms(formula(refm_aug)))
terms_full <- sub("(.*\\|.*)", "(\\1)", terms_full)
### This would correspond to the search (at least in terms of the number of
### clusters; the different seed might lead to a slightly different clustering):
# prj_full <- project(refm_aug, solution_terms = terms_full,
#                     nclusters = 20,
#                     seed = 273722)
###
# Here, we use more clusters to show that the problem persists even then:
prj_full <- project(refm_aug, solution_terms = terms_full,
                    nclusters = 450,
                    seed = 273722)
### Not run here (but tried it out once): The problem also persists when using
### the draw-by-draw projection (however, a lot of warnings are thrown in that
### case, which can probably be alleviated---although probably not removed
### entirely---by setting `nAGQ = 10` (or higher) and/or
### `control = ordinal::clmm.control(maxIter = 200, maxLineIter = 200)`):
# prj_full <- project(refm_aug, solution_terms = terms_full,
#                     ndraws = length(refm_aug$wsample),
#                     seed = 273722)
###
prjmat_full <- as.matrix(prj_full)
quantile(prjmat_full[, "sd_XgrGL__Intercept"],
         probs = c(0.8, 0.9, 0.95, 0.99, 1))
#          80%          90%          95%          99%         100%
# 7.278300e-05 8.778181e-05 1.183793e-04 4.240883e-01 9.622590e-01

library(ggplot2)
ggplot(data = data.frame(prjmat_full[, "sd_XgrGL__Intercept", drop = FALSE]),
       mapping = aes(x = sd_XgrGL__Intercept)) +
  geom_histogram()
```
Basically all projected draws for the hierarchical standard deviation are very close to zero, with only a few exceptions.
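To put a number on "basically all", one could also compute the proportion of projected draws that are numerically close to zero. This is a hypothetical check (not part of the original post); given the quantiles above, the result should lie between 0.95 and 0.99:

```r
# Proportion of projected draws for the hierarchical standard deviation that
# are essentially zero (threshold chosen arbitrarily for illustration):
mean(prjmat_full[, "sd_XgrGL__Intercept"] < 1e-3)
```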
```r
refm_lat <- get_refmodel(refm_fit, latent = TRUE, brms_seed = 75772)
system.time(vs_lat <- varsel(
  refm_lat,
  d_test = list(
    data = sim_out$dat_indep,
    offset = rep(0, nrow(sim_out$dat_indep)),
    weights = rep(1, nrow(sim_out$dat_indep)),
    ### Here, we are not interested in latent-scale post-processing, so we can
    ### set element `y` to a vector of `NA`s:
    y = rep(NA, nrow(sim_out$dat_indep)),
    ###
    y_oscale = sim_out$dat_indep$Y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
))
##   user  system elapsed
## 20.076   0.024  20.090
```
```r
smmry_lat <- summary(
  vs_lat,
  stats = "mlpd",
  type = c("mean", "se", "lower", "upper", "diff", "diff.se")
)
print(smmry_lat, digits = 3)
## ------
## Response-scale family:
##
## Family: cumulative
## Link function: logit
##
## ------
## Latent-scale family:
##
## Family: gaussian
## Link function: identity
##
## ------
## Formula: .Y ~ Xcont1 + Xcont2 + Xcont3 + Xcont4 + Xcont5 + XgrPL + (1 |
##     XgrGL)
## Observations (training set): 100
## Observations (test set): 100
## Projection method: latent
## Search method: forward, maximum number of terms 7
## Number of clusters used for selection: 20
## Number of clusters used for prediction: 20
##
## Selection Summary (response scale):
## size solution_terms   mlpd    se  lower  upper   diff diff.se
##    0           <NA> -1.638 0.093 -1.730 -1.545 -0.978   0.130
##    1    (1 | XgrGL) -1.548 0.100 -1.648 -1.448 -0.889   0.127
##    2          XgrPL -1.281 0.143 -1.424 -1.138 -0.621   0.130
##    3         Xcont1 -1.264 0.149 -1.413 -1.115 -0.604   0.127
##    4         Xcont5 -1.024 0.132 -1.156 -0.892 -0.364   0.101
##    5         Xcont4 -0.865 0.112 -0.977 -0.753 -0.206   0.075
##    6         Xcont2 -0.729 0.101 -0.830 -0.627 -0.069   0.041
##    7         Xcont3 -0.720 0.104 -0.824 -0.616 -0.061   0.042
```
Now the group-level term is selected first, which makes more sense (given the data-generating process used here) than the augmented-data projection's solution path. Also, there is an improvement in MLPD from size 0 to size 1 (where the group-level term gets selected).
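For a compact side-by-side view of the two solution paths, the ranked terms could be combined in a small data frame. Again, this is a hypothetical helper snippet (not part of the original post), assuming projpred 2.x's `solution_terms()` accessor:

```r
# Compare the augmented-data and the latent solution paths term by term:
data.frame(size = seq_along(solution_terms(vs_aug)),
           augmented_data = solution_terms(vs_aug),
           latent = solution_terms(vs_lat))
```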
Run with projpred 2.4.0. Details:
```
> sessioninfo::session_info()
─ Session info ──────────────────────────────────────────
setting value
version R version 4.2.3 (2023-03-15)
os Ubuntu 22.04.2 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Berlin
date 2023-04-01
rstudio 2023.03.0+386 Cherry Blossom (desktop)
pandoc 2.19.2 @ [...] (via rmarkdown)
─ Packages ───────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [3] CRAN (R 4.2.0)
backports 1.4.1 2021-12-13 [3] CRAN (R 4.2.0)
base64enc 0.1-3 2015-07-28 [3] CRAN (R 4.0.2)
bayesplot 1.10.0 2022-11-16 [3] CRAN (R 4.2.2)
boot 1.3-28.1 2022-11-22 [1] CRAN (R 4.2.3)
bridgesampling 1.1-2 2021-04-16 [3] CRAN (R 4.2.0)
brms 2.19.0 2023-03-14 [3] CRAN (R 4.2.3)
Brobdingnag 1.2-9 2022-10-19 [3] CRAN (R 4.2.1)
callr 3.7.3 2022-11-02 [3] CRAN (R 4.2.2)
checkmate 2.1.0 2022-04-21 [3] CRAN (R 4.2.0)
cli 3.6.1 2023-03-23 [3] CRAN (R 4.2.3)
cmdstanr 0.5.3 2022-08-24 [1] local
coda 0.19-4 2020-09-30 [3] CRAN (R 4.2.0)
codetools 0.2-19 2023-02-01 [4] CRAN (R 4.2.2)
colorspace 2.1-0 2023-01-23 [3] CRAN (R 4.2.2)
colourpicker 1.2.0 2022-10-28 [3] CRAN (R 4.2.2)
crayon 1.5.2 2022-09-29 [3] CRAN (R 4.2.1)
crosstalk 1.2.0 2021-11-04 [3] CRAN (R 4.1.2)
data.table 1.14.8 2023-02-17 [3] CRAN (R 4.2.2)
DBI 1.1.3 2022-06-18 [3] CRAN (R 4.2.1)
digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
distributional 0.3.2 2023-03-22 [3] CRAN (R 4.2.3)
dplyr 1.1.1 2023-03-22 [3] CRAN (R 4.2.3)
DT 0.27 2023-01-17 [3] CRAN (R 4.2.2)
dygraphs 1.1.1.6 2018-07-11 [3] CRAN (R 4.0.1)
ellipsis 0.3.2 2021-04-29 [3] CRAN (R 4.1.1)
emmeans 1.8.5 2023-03-08 [3] CRAN (R 4.2.2)
estimability 1.4.1 2022-08-05 [3] CRAN (R 4.2.1)
evaluate 0.20 2023-01-17 [3] CRAN (R 4.2.2)
fansi 1.0.4 2023-01-22 [3] CRAN (R 4.2.2)
farver 2.1.1 2022-07-06 [3] CRAN (R 4.2.1)
fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2)
foreach 1.5.2 2022-02-02 [3] CRAN (R 4.2.0)
gamm4 0.2-6 2020-04-03 [3] CRAN (R 4.0.2)
generics 0.1.3 2022-07-05 [3] CRAN (R 4.2.1)
ggplot2 * 3.4.1 2023-02-10 [3] CRAN (R 4.2.2)
glue 1.6.2 2022-02-24 [3] CRAN (R 4.2.0)
gridExtra 2.3 2017-09-09 [3] CRAN (R 4.0.1)
gtable 0.3.3 2023-03-21 [3] CRAN (R 4.2.3)
gtools 3.9.4 2022-11-27 [1] CRAN (R 4.2.2)
htmltools 0.5.5 2023-03-23 [3] CRAN (R 4.2.3)
htmlwidgets 1.6.2 2023-03-17 [3] CRAN (R 4.2.3)
httpuv 1.6.9 2023-02-14 [3] CRAN (R 4.2.2)
igraph 1.4.1 2023-02-24 [3] CRAN (R 4.2.2)
inline 0.3.19 2021-05-31 [3] CRAN (R 4.1.1)
iterators 1.0.14 2022-02-05 [3] CRAN (R 4.2.0)
jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.2)
knitr 1.42 2023-01-25 [3] CRAN (R 4.2.2)
labeling 0.4.2 2020-10-20 [3] CRAN (R 4.2.0)
later 1.3.0 2021-08-18 [3] CRAN (R 4.1.1)
lattice 0.20-45 2021-09-22 [4] CRAN (R 4.2.0)
lifecycle 1.0.3 2022-10-07 [3] CRAN (R 4.2.1)
lme4 1.1-32 2023-03-14 [3] CRAN (R 4.2.3)
loo 2.5.1 2022-03-24 [3] CRAN (R 4.2.0)
magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.2.0)
markdown 1.5 2023-01-31 [3] CRAN (R 4.2.2)
MASS 7.3-58.3 2023-03-07 [1] CRAN (R 4.2.3)
Matrix 1.5-3 2022-11-11 [4] CRAN (R 4.2.2)
matrixStats 0.63.0 2022-11-18 [1] CRAN (R 4.2.2)
mgcv 1.8-42 2023-03-02 [4] CRAN (R 4.2.3)
mime 0.12 2021-09-28 [3] CRAN (R 4.2.0)
miniUI 0.1.1.1 2018-05-18 [3] CRAN (R 4.0.1)
minqa 1.2.5 2022-10-19 [3] CRAN (R 4.2.1)
multcomp 1.4-23 2023-03-09 [3] CRAN (R 4.2.2)
munsell 0.5.0 2018-06-12 [3] CRAN (R 4.0.1)
mvtnorm 1.1-3 2021-10-08 [3] CRAN (R 4.2.0)
nlme 3.1-162 2023-01-31 [4] CRAN (R 4.2.2)
nloptr 2.0.3 2022-05-26 [3] CRAN (R 4.2.0)
numDeriv 2016.8-1.1 2019-06-06 [3] CRAN (R 4.0.1)
ordinal 2022.11-16 2022-11-16 [3] CRAN (R 4.2.2)
pillar 1.9.0 2023-03-22 [3] CRAN (R 4.2.3)
pkgbuild 1.4.0 2022-11-27 [3] CRAN (R 4.2.2)
pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.0.1)
plyr 1.8.8 2022-11-11 [3] CRAN (R 4.2.2)
posterior 1.4.1 2023-03-14 [3] CRAN (R 4.2.3)
prettyunits 1.1.1 2020-01-24 [3] CRAN (R 4.0.1)
processx 3.8.0 2022-10-26 [3] CRAN (R 4.2.1)
projpred * 2.4.0 2023-02-12 [1] CRAN (R 4.2.3)
promises 1.2.0.1 2021-02-11 [3] CRAN (R 4.1.1)
ps 1.7.3 2023-03-21 [3] CRAN (R 4.2.3)
R6 2.5.1 2021-08-19 [3] CRAN (R 4.2.0)
Rcpp 1.0.10 2023-01-22 [3] CRAN (R 4.2.2)
RcppParallel 5.1.7 2023-02-27 [3] CRAN (R 4.2.2)
reshape2 1.4.4 2020-04-09 [3] CRAN (R 4.0.1)
rlang 1.1.0 2023-03-14 [3] CRAN (R 4.2.3)
rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.2.3)
rsconnect 0.8.29 2023-01-09 [1] CRAN (R 4.2.2)
rstan 2.21.8 2023-01-17 [1] CRAN (R 4.2.2)
rstantools 2.3.0 2023-03-09 [3] CRAN (R 4.2.2)
rstudioapi 0.14 2022-08-22 [3] CRAN (R 4.2.1)
sandwich 3.0-2 2022-06-15 [3] CRAN (R 4.2.1)
scales 1.2.1 2022-08-20 [3] CRAN (R 4.2.1)
sessioninfo 1.2.2 2021-12-06 [3] CRAN (R 4.2.0)
shiny 1.7.4 2022-12-15 [1] CRAN (R 4.2.2)
shinyjs 2.1.0 2021-12-23 [3] CRAN (R 4.2.0)
shinystan 2.6.0 2022-03-03 [3] CRAN (R 4.2.0)
shinythemes 1.2.0 2021-01-25 [3] CRAN (R 4.0.3)
StanHeaders 2.21.0-7 2020-12-17 [1] CRAN (R 4.2.0)
stringi 1.7.12 2023-01-11 [3] CRAN (R 4.2.2)
stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
survival 3.5-5 2023-03-12 [1] CRAN (R 4.2.3)
tensorA 0.36.2 2020-11-19 [3] CRAN (R 4.1.1)
TH.data 1.1-1 2022-04-26 [3] CRAN (R 4.2.0)
threejs 0.3.3 2020-01-21 [3] CRAN (R 4.0.1)
tibble 3.2.1 2023-03-20 [3] CRAN (R 4.2.3)
tidyselect 1.2.0 2022-10-10 [3] CRAN (R 4.2.1)
ucminf 1.1-4.1 2022-09-29 [3] CRAN (R 4.2.1)
utf8 1.2.3 2023-01-31 [3] CRAN (R 4.2.2)
vctrs 0.6.1 2023-03-22 [3] CRAN (R 4.2.3)
withr 2.5.0 2022-03-03 [3] CRAN (R 4.2.0)
xfun 0.38 2023-03-24 [3] CRAN (R 4.2.3)
xtable 1.8-4 2019-04-21 [3] CRAN (R 4.0.1)
xts 0.13.0 2023-02-20 [3] CRAN (R 4.2.2)
yaml 2.3.7 2023-01-23 [3] CRAN (R 4.2.2)
zoo 1.8-11 2022-09-17 [3] CRAN (R 4.2.1)
[1] [...]
[2] [...]
[3] [...]
[4] [...]
─────────────────────────────────────────────────
```
The CmdStan version is 2.31.0.
This starts to address #207 by adding the submodel fitter `MASS::glmmPQL()` (see lme4/lme4#682 for the background on this) as a "hidden" feature. This is still a hidden feature because there are some `TODO (glmmPQL)` comments left. This PR should be merged nonetheless because it can already provide some advantage (see the demonstrations below) and because it doesn't affect existing code (even the new dependencies MASS and nlme are not really extra dependencies because they are already dependencies of lme4, for example).

Note that in general, PQL gives a solution that differs from the maximum likelihood solution, so PQL does not really solve the projection problem (although it can probably be considered as an approximation; but this is just a guess from my side). So it would probably be better to conduct some theoretical investigations first before un-hiding this feature (and the `TODO (glmmPQL)` comments need to be solved first as well).

Binomial (Bernoulli) demo
With respect to the binomial part of #207, here is a short demo of the advantage of this PR.
Data
Reference model
Default GLMM submodel fitter `lme4::glmer()`
First, we'll run `varsel()` (with argument `d_test` specified) using the default GLMM submodel fitter `lme4::glmer()`:
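The corresponding code block was collapsed in this extract. Based on the latent-projection calls shown further above (which reuse the object names `refm_fit_brnll` and `dat_brnll_test`), it presumably looked roughly like the following sketch; the exact arguments are an assumption:

```r
# Sketch only: traditional projection with the default GLMM submodel fitter
# lme4::glmer() (no latent projection); `y` holds the observed test responses:
cvvs_trad <- varsel(
  refm_fit_brnll,
  d_test = list(
    data = dat_brnll_test,
    offset = rep(0, nrow(dat_brnll_test)),
    weights = rep(1, nrow(dat_brnll_test)),
    y = dat_brnll_test$y
  ),
  ### Only for the sake of speed (not recommended in general):
  nclusters_pred = 20,
  ###
  seed = 95930
)
```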
As can be seen from the `summary()` output, the group-level term is not among the first 19 terms of the solution path (but several noise terms are).

New GLMM submodel fitter `MASS::glmmPQL()`
Now, we activate option `projpred.PQL` to use `MASS::glmmPQL()` as submodel fitter for GLMMs:
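The code block activating the option is also missing from this extract. Since `projpred.PQL` is referred to as an option, switching it on presumably amounts to something like the following sketch (the option name is taken from the text above; the value `TRUE` is an assumption), followed by re-running the same `varsel()` call as before (yielding the object plotted as `cvvs_PQLtrad` further above):

```r
# Activate the hidden PQL feature so that GLMM submodels are fitted with
# MASS::glmmPQL() instead of lme4::glmer():
options(projpred.PQL = TRUE)
```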
Now, the group-level term is at position (size) 7 of the solution path (and it is correctly positioned before all noise terms). The `varsel()` run now takes about twice as long because the group-level term is selected so early, meaning that at all later submodel sizes, all submodels are GLMMs (which take longer to fit than GLMs). So the runtimes cannot really be compared. (I included them nonetheless to explain this.)

Poisson demo
Now a short demo of the advantage of this PR with respect to the Poisson part of #207.
Data
Reference model
Default GLMM submodel fitter `lme4::glmer()`
Again, we'll first run `varsel()` (with argument `d_test` specified) using the default GLMM submodel fitter `lme4::glmer()` (we're now restricting `nterms_max` to a value of 5 because this is sufficient for our purposes here and because otherwise, the runtime would be even higher):

As can be seen from the `summary()` output, the group-level term is selected first (in particular, before all noise terms). So in case of the Poisson family, the validity of the results when using `lme4::glmer()` as GLMM submodel fitter does not seem to be compromised (only the runtime, as we will see below).

New GLMM submodel fitter `MASS::glmmPQL()`
Now, we again activate option `projpred.PQL` to use `MASS::glmmPQL()` as submodel fitter for GLMMs:

The results are basically the same as above where `lme4::glmer()` was used as GLMM submodel fitter. In particular, the group-level term is at position (size) 1 of the solution path (and it is correctly positioned before all noise terms). However, the runtime is only about one tenth of that above where `lme4::glmer()` was used as GLMM submodel fitter.