stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

Parallelization of `kfold()` (across CV folds) on Windows #551

Open fweber144 opened 3 years ago

fweber144 commented 3 years ago

Summary:

On Windows, parallelizing kfold() across the CV folds does not always work: in some cases it throws an error.

Description:

The issue described in "Summary" occurs when the stan_<...>() call refers to objects (variables) in some of its arguments (see "Reproducible Steps" below). A possible cause is that no objects are exported to the cluster workers after the following lines: https://github.com/stan-dev/rstanarm/blob/45e0707b0ab8c8e16d65e7f2af70a98bfaa93363/R/loo-kfold.R#L226-L227

Reproducible Steps:

data("df_gaussian", package = "projpred")
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
D <- sum(grepl("^X", names(dat_gauss)))
p0 <- 5
N <- nrow(dat_gauss)
( tau0 <- p0 / (D - p0) * 1 / sqrt(N) )
library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE)) # gives 4 on my machine
rfit <- stan_glm(y ~ .,
                 data = dat_gauss,
                 prior = hs(global_scale = tau0),
                 QR = TRUE,
                 seed = 1669262042)

Now inner parallelization via kfold.stanreg()'s internal object stan_cores works (doesn't throw an error) and gives the (correct) messages Fitting model 1 out of 4, ..., Fitting model 4 out of 4:

rkfold_one_kfold_core <- kfold(rfit, K = 4, cores = 1)

However, outer parallelization across CV folds via kfold.stanreg()'s internal object kfold_cores doesn't work:

rkfold <- kfold(rfit, K = 4)

That last line prints the (correct) message Fitting K = 4 models distributed over 4 cores, but then throws the error:

Error in checkForRemoteErrors(val) :
  4 nodes produced errors; first error: object 'tau0' not found
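
The error message is consistent with how socket (PSOCK) clusters from the parallel package behave: each worker is a fresh R process that does not see objects from the main session unless they are exported. The following is a minimal sketch of that behavior using only the parallel package, not rstanarm. The function use_tau0() and the value of tau0 are hypothetical stand-ins for the model refit and the prior scale, and the sketch assumes (without having confirmed it in the rstanarm source) that kfold() runs on such a cluster on Windows.

```r
library(parallel)

tau0 <- 0.05  # hypothetical stand-in for the global used in the prior

cl <- makePSOCKcluster(2)

# Stand-in for refitting the model on a worker. Its environment is set to
# globalenv() so that the free variable tau0 is looked up in the worker's
# own (initially empty) global environment.
use_tau0 <- function(i) tau0 * i
environment(use_tau0) <- globalenv()

# Without an export, the workers cannot find tau0.
res_fail <- tryCatch(
  parLapply(cl, 1:2, use_tau0),
  error = function(e) conditionMessage(e)
)

# After exporting tau0 to the workers, the same call succeeds.
clusterExport(cl, "tau0", envir = environment())
res_ok <- parLapply(cl, 1:2, use_tau0)

stopCluster(cl)
```

Here res_fail holds an error message that mentions tau0, while res_ok holds the computed values. On Unix-alikes, fork-based parallelism copies the main session's memory into each worker, which may explain why the problem shows up only on Windows.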

RStanARM Version:

2.21.2 (from https://mc-stan.org/r-packages/)

R Version:

4.1.1

Operating System:

Windows 10 x64

jgabry commented 3 days ago

Following up on this based on https://discourse.mc-stan.org/t/projpred-error/37314/3. Sorry I never saw this previously. As a non-Windows user, I'm not quite sure how to fix this. It seems the issue is that tau0 would need to be passed to clusterExport(), but tau0 exists only in the user's global environment. How would we know which variables in the global environment to export? (Presumably we wouldn't want to export everything, since the global environment could contain large, irrelevant objects.)
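
One possible approach to the question above, offered as a rough sketch rather than a tested fix for rstanarm: extract the variable names that appear in the stored model call with all.vars() and export only those that actually exist in the environment the call was made from. In the sketch below, find_exports(), call_env, and the quoted model_call are hypothetical illustrations. How kfold() would obtain the right environment (for example the formula's environment or the caller's frame) is an open assumption.

```r
# Hypothetical stand-in for the user's calling environment.
call_env <- new.env()
assign("tau0", 0.05, envir = call_env)
assign("dat_gauss", data.frame(y = rnorm(5), X1 = rnorm(5)), envir = call_env)

# The model call from the reproducible example, quoted (not evaluated).
model_call <- quote(
  stan_glm(y ~ ., data = dat_gauss, prior = hs(global_scale = tau0),
           QR = TRUE, seed = 1669262042)
)

# Return the names of variables used in `call` that exist in `envir`.
# all.vars() skips function names, so stan_glm and hs are not included.
find_exports <- function(call, envir) {
  candidates <- all.vars(call)
  Filter(function(nm) exists(nm, envir = envir, inherits = FALSE), candidates)
}

to_export <- find_exports(model_call, call_env)
```

The result could then be passed on as parallel::clusterExport(cl, to_export, envir = call_env). This only finds names that appear literally in the call. It would miss objects referenced indirectly (for example inside a user-defined function used in the formula) and user-defined functions themselves, so a more thorough static analysis such as codetools::findGlobals() might be needed for those cases.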