stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Fix bootstrapping (`deltas = TRUE`) #474

Closed fweber144 closed 11 months ago

fweber144 commented 11 months ago

This fixes a bug affecting the bootstrapped quantities (the standard error and confidence interval of RMSE and AUC) when `deltas = TRUE`. In that case, `bootstrap()` has to be called only once instead of twice (making use of the ability of `bootstrap()`'s `fun` to handle a matrix), so that the bootstrap samples are the same for the submodel (or reference model) and the baseline model (which is usually the reference model). With the old behavior, the reference model's SE could be non-zero under `deltas = TRUE` when no seed was set via argument `seed` (which gets passed to `bootstrap()`), even though, with the reference model as the baseline, its difference from itself is exactly zero; see below.

Illustration:

data("df_binom", package = "projpred")
dat <- data.frame(y = df_binom$y, df_binom$x)
rfit <- rstanarm::stan_glm(y ~ X1 + X2 + X3,
                           family = binomial(),
                           data = dat,
                           chains = 1,
                           iter = 500,
                           seed = 1140350788,
                           refresh = 0)

devtools::load_all()

vs <- varsel(rfit, method = "L1", nclusters_pred = 2, seed = 123)

set.seed(234)
summary(vs, deltas = TRUE, stats = "rmse")
summary(vs, deltas = TRUE, stats = "auc")

Previously, rmse.se was 0.02171921 and auc.se was 0.07552023. With this PR, rmse.se and auc.se are both 0, as expected.
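
For readers who want to see the mechanism in isolation, here is a minimal sketch in base R. It is not projpred's code: `boot_stat()` is a hypothetical stand-in for the internal `bootstrap()`, and the prediction errors are simulated. It only illustrates why two independent bootstrap calls give a non-zero SE for the difference between two identical models, whereas a single call on a two-column matrix (shared resamples) gives exactly zero.

```r
# Hypothetical helper (NOT projpred's internal bootstrap()): resamples the rows
# of `x` (vector or matrix) B times and evaluates `fun` on each resample.
boot_stat <- function(x, fun, B = 200) {
  x <- as.matrix(x)
  n <- nrow(x)
  vapply(seq_len(B), function(b) {
    idx <- sample.int(n, size = n, replace = TRUE)
    fun(x[idx, , drop = FALSE])
  }, numeric(1))
}

rmse <- function(err) sqrt(mean(err^2))

set.seed(1)
err_ref <- rnorm(100)  # simulated prediction errors of the "reference model"
err_sub <- err_ref     # a "submodel" that is identical to the reference model

# Two separate calls: each call draws its own resamples, so the difference of
# the two bootstrap distributions is noisy although the models are identical.
se_unpaired <- sd(boot_stat(err_sub, rmse) - boot_stat(err_ref, rmse))

# One call on a two-column matrix: both columns share every resample, so the
# difference is exactly zero in each bootstrap replicate.
se_paired <- sd(boot_stat(cbind(err_sub, err_ref), function(m) {
  rmse(m[, 1]) - rmse(m[, 2])
}))

se_unpaired > 0
se_paired == 0
```

Setting a seed before each of the two separate calls would also align the resamples, which is consistent with the old behavior only showing up when `seed` was not set.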