The linear predictors used internally by log_lik() may differ from those derived from posterior_linpred().
Description:
While working on projpred's unit tests, I discovered an inconsistency between the linear predictors that are used internally by rstanarm:::log_lik.stanreg() and those calculated by rstanarm:::posterior_linpred.stanreg(). This might be due to some numerical instability within one of these two approaches. I stumbled across this while trying to reproduce the output of log_lik() manually (based on the linear predictors calculated by posterior_linpred()); that's why the reproducible steps shown below include the log likelihood values.
The point is that line all.equal(eta_from_linpred, eta_from_loglik_internals) shows differing linear predictors. The fact that the log likelihood values differ (line all.equal(ll_from_linpred, unname(ll))) is then clear because it's already the linear predictors which differ.
library(rstanarm)
rfit <- readRDS("rfit.rds")
# Via log_lik():
ll <- log_lik(rfit)
# Inside of rstanarm:::log_lik.stanreg(), the following takes place:
args <- rstanarm:::ll_args.stanreg(rfit, newdata = NULL, offset = NULL,
reloo_or_kfold = FALSE)
fun <- function(data_i, draws) {
val <- dbinom(data_i$y, size = data_i$trials,
prob = rstanarm:::.mu(data_i, draws), log = TRUE)
rstanarm:::.weighted(val, data_i$weights)
}
out <- vapply(seq_len(args$N), FUN = function(i) {
as.vector(fun(data_i = args$data[i, , drop = FALSE],
draws = args$draws))
}, FUN.VALUE = numeric(length = args$S))
ll_from_loglik_internals <- out
stopifnot(identical(ll_from_loglik_internals, unname(ll)))
# Inside of rstanarm:::.mu(), the following takes place:
eta_from_loglik_internals <- do.call(cbind, lapply(
seq_len(args$N),
function(i) {
data_i <- args$data[i, , drop = FALSE]
rstanarm:::linear_predictor.matrix(
args$draws$beta,
rstanarm:::.xdata(data_i),
data_i$offset
)
}
))
names(dimnames(eta_from_loglik_internals)) <- c("iterations", "")
# Via posterior_linpred():
eta_from_linpred <- posterior_linpred(
rfit, newdata = NULL, offset = rep(0, nrow(rfit$data))
)
all.equal(eta_from_linpred, eta_from_loglik_internals)
## --> [1] "Mean relative difference: 0.2869042"
mu_from_linpred <- binomial()$linkinv(eta_from_linpred)
ll_from_linpred <- t(array(dbinom(rfit$data$y_gamm_brnll, size = 1,
prob = t(mu_from_linpred), log = TRUE),
dim = rev(dim(eta_from_linpred))))
all.equal(ll_from_linpred, unname(ll))
## --> [1] "Mean relative difference: 0.2186712"
Summary:
The linear predictors used internally by
log_lik()
may differ from those derived fromposterior_linpred()
.Description:
While working on projpred's unit tests, I discovered an inconsistency between the linear predictors that are used internally by
rstanarm:::log_lik.stanreg()
and those calculated byrstanarm:::posterior_linpred.stanreg()
. This might be due to some numerical instability within one of these two approaches. I stumbled across this while trying to reproduce the output oflog_lik()
manually (based on the linear predictors calculated byposterior_linpred()
); that's why the reproducible steps shown below include the log likelihood values.Reproducible Steps:
File
rfit.rds
is provided in rfit.zip.The point is that line
all.equal(eta_from_linpred, eta_from_loglik_internals)
shows differing linear predictors. The fact that the log likelihood values differ (lineall.equal(ll_from_linpred, unname(ll))
) is then clear because it's already the linear predictors which differ.RStanARM Version:
2.26.1
R Version:
R version 4.3.1 (2023-06-16)
Operating System:
Ubuntu 22.04.3 LTS