stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

Parallelization of `kfold()` (across CV folds) on Windows #551

Open fweber144 opened 2 years ago


Summary:

On Windows, parallelizing kfold() across the CV folds can fail with an error.

Description:

The issue mentioned in "Summary" occurs if the stan_<...>() call refers to objects from the user's workspace in some of its arguments (see "Reproducible Steps" below). Perhaps the reason is that no objects are exported to the cluster workers after the following lines? https://github.com/stan-dev/rstanarm/blob/45e0707b0ab8c8e16d65e7f2af70a98bfaa93363/R/loo-kfold.R#L226-L227
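To illustrate the suspected mechanism outside of rstanarm, here is a minimal sketch using only the base parallel package. It assumes the Windows code path uses a socket (PSOCK) cluster, which the checkForRemoteErrors() call in the error suggests but which I have not verified line by line. The object tau0 gets a made-up value here, and the expression tau0 * 2 is only a stand-in for the model refit.

```r
library(parallel)

# Made-up value standing in for the user's workspace object.
tau0 <- 0.05

# Forking is not available on Windows, so a socket cluster is used there.
# Each PSOCK worker is a fresh R session without the master's workspace.
cl <- makePSOCKcluster(2)

# Evaluated on each worker, where 'tau0' has not been defined.
res_before <- tryCatch(
  clusterEvalQ(cl, tau0 * 2),
  error = function(e) conditionMessage(e)
)

# Copy the object to the workers explicitly, then retry.
clusterExport(cl, "tau0", envir = environment())
res_after <- clusterEvalQ(cl, tau0 * 2)

stopCluster(cl)

print(res_before)        # an error message mentioning 'tau0'
print(unlist(res_after)) # one numeric result per worker
```

If this is indeed what happens inside kfold(), then a clusterExport() call (or some equivalent way of shipping the required objects to the workers) would be needed before the folds are dispatched.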

Reproducible Steps:

data("df_gaussian", package = "projpred")
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
D <- sum(grepl("^X", names(dat_gauss)))
p0 <- 5
N <- nrow(dat_gauss)
( tau0 <- p0 / (D - p0) * 1 / sqrt(N) )
library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE)) # gives 4 on my machine
rfit <- stan_glm(y ~ .,
                 data = dat_gauss,
                 prior = hs(global_scale = tau0),
                 QR = TRUE,
                 seed = 1669262042)

Now inner parallelization via kfold.stanreg()'s internal object stan_cores works (doesn't throw an error) and gives the (correct) messages Fitting model 1 out of 4, ..., Fitting model 4 out of 4:

rkfold_one_kfold_core <- kfold(rfit, K = 4, cores = 1)

However, outer parallelization across CV folds via kfold.stanreg()'s internal object kfold_cores doesn't work:

rkfold <- kfold(rfit, K = 4)

That last line prints the (correct) message Fitting K = 4 models distributed over 4 cores, but then throws the following error:

Error in checkForRemoteErrors(val) :
  4 nodes produced errors; first error: object 'tau0' not found
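Apart from setting cores = 1 as above, one possible workaround (untested, and resting on the assumption that kfold() re-evaluates the stored model call on the workers) might be to inline the value of tau0 into the call, so that the call no longer refers to a workspace object. The following base R sketch only shows the difference between the two calls. hs() is quoted but never evaluated here, and the value of tau0 is made up.

```r
# Made-up value standing in for the user's workspace object.
tau0 <- 0.05

# A call that refers to 'tau0' by name; a worker would have to look it up.
call_with_symbol <- quote(hs(global_scale = tau0))

# bquote() substitutes the current value for .(tau0), so the resulting
# call carries the number itself rather than a symbol.
call_with_value <- bquote(hs(global_scale = .(tau0)))

print(all.vars(call_with_symbol)) # variables the call still depends on
print(all.vars(call_with_value))  # no remaining variable names
print(call_with_value)
```

Applied to the example above, this would presumably mean wrapping the stan_glm() call in eval(bquote(...)) and writing hs(global_scale = .(tau0)). I have not checked whether that actually avoids the error.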

RStanARM Version:

2.21.2 (from https://mc-stan.org/r-packages/)

R Version:

4.1.1

Operating System:

Windows 10 x64