Closed bbolker closed 11 months ago
It's mostly due to some old code, where ...
are not passed down to the gam-predict-method. Let's see if it's really that easy and I just need to add ...
to the predict()
call.
library(lme4)
#> Loading required package: Matrix
library(mgcv)
#> Loading required package: nlme
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:lme4':
#>
#> lmList
#> This is mgcv 1.9-0. For overview type 'help("mgcv-package")'.
library(ggeffects)
data("Contraception", package = "mlmRev")
Contraception <- transform(Contraception,
use_n = as.numeric(use) - 1,
age_sc = drop(scale(age)))
m_gam <- gam(use_n ~ poly(age_sc,2) + urban + s(district, bs = "re"),
family = binomial, data = Contraception, method = "ML")
m_glmer <- glmer(use_n ~ poly(age_sc,2) + urban + (1|district),
family = binomial, data = Contraception)
p0_gam <- ggpredict(m_gam,
terms = c("age_sc [all]", "urban"), exclude = "s(district)",
newdata.guaranteed = TRUE
)
p0_glmer <- ggpredict(m_glmer, terms = c("age_sc [all]", "urban"))
p0_gam[1:5, ]
#> # Predicted probabilities of use_n
#>
#> # urban = N
#>
#> age_sc | Predicted | 95% CI
#> ---------------------------------
#> -1.50 | 0.14 | [0.11, 0.19]
#> -1.39 | 0.17 | [0.13, 0.21]
#> -1.28 | 0.20 | [0.16, 0.24]
#>
#> # urban = Y
#>
#> age_sc | Predicted | 95% CI
#> ---------------------------------
#> -1.50 | 0.24 | [0.18, 0.31]
#> -1.39 | 0.28 | [0.22, 0.34]
#>
#> Adjusted for:
#> * district = 1
p0_glmer[1:5, ]
#> # Predicted probabilities of use_n
#>
#> # urban = N
#>
#> age_sc | Predicted | 95% CI
#> ---------------------------------
#> -1.50 | 0.14 | [0.11, 0.18]
#> -1.39 | 0.17 | [0.13, 0.21]
#> -1.28 | 0.19 | [0.16, 0.24]
#>
#> # urban = Y
#>
#> age_sc | Predicted | 95% CI
#> ---------------------------------
#> -1.50 | 0.24 | [0.18, 0.30]
#> -1.39 | 0.27 | [0.21, 0.34]
#>
#> Adjusted for:
#> * district = 0 (population-level)
Created on 2023-12-11 with reprex v2.0.2
Adjusted for: district = 1 in the case of the gam prediction...
This issue is still left. I'm not very familiar with gam's, so the footnote blindly prints the value that is used when creating the data grid, no matter if that value is actually used or not. Maybe I could check for s(..., bs = "re")
, and the adjust the message Adjusted for...
?
It's a bit of a special case. I don't honestly know how often people will want to include "old-style" random effects (sensu Jim Hodges, i.e. bs="re"
) in GAM(M)s, or more generally to exclude random effects/latent variables. In this case I was doing it to make a point for a class. (Without checking, I don't know how the "adjusted for" message is generated, or what ggpredict
does in general for latent variables/smooths that aren't included in terms
...)
It basically looks for all variables in the model that were not defined as focal terms, and sets these remaining variables to the value at which these are "held constant" (either selects mean or mode, based on the type of the variable):
model_frame <- insight::get_data(m_gam, source = "mf")
focal_terms <- c("age_sc", "urban")
model_predictors <- colnames(model_frame)
# keep those, which we did not process yet
model_predictors <- model_predictors[!(model_predictors %in% focal_terms)]
constant_values <- lapply(model_predictors, function(x) {
pred <- model_frame[[x]]
if (is.factor(pred)) {
pred <- droplevels(pred)
}
ggeffects:::.typical_value(
pred,
fun = "mean",
weights = NULL,
predictor = x,
log_terms = NULL
)
})
names(constant_values) <- model_predictors
constant_values
#> $use_n
#> [1] 0.3924509
#>
#> $district
#> [1] "1"
OK, I see by digging a bit that .typical_value()
uses the first level of factors as the typical value by default (unless fun = "mode"
is specified ...)
To be honest, I'm not sure if there's going to be an easy way to adjust the labeling without getting way into the weeds ... the bug-fix is good enough for me. Maybe (??) it's worth adding a very brief section to the "introduction_randomeffects" vignette to mention that things work slightly differently in the gam(... + s(..., bs = "re"))
case?
(Apologies in advance that this question is not more boiled down/minimal: I think it's mostly likely a documentation issue, but I'm having trouble figuring it out. Happy to close/repost elsewhere if there's a more appropriate venue.)
tl;dr for two equivalent models fitted respectively via (1)
lme4::glmer
with an intercept RE ((1|id)
) and (2)mgcv::gam
withs(id, bs = "re")
,ggpredict()
gives population-level predictions forglmer
and district-1-level predictions forgam
. I can see how to getpredict.gam()
to produce population-level predictions in this case (viaexclude
), but I'm having trouble passing those arguments throughggpredict()
.package load/model fits
comparisons of coefficients, raw predictions, etc.
The help page for
ggpredict
says that I can pass arguments through topredict()
:so I would have expected these to give equivalent answers:
... but I can see that the predictions are different, and furthermore that the
ggpredict
output prints "Adjusted for: district = 0 (population-level)" in the case of theglmer
prediction andAdjusted for: district = 1
in the case of thegam
prediction ...