Open khughitt opened 4 years ago
I also see a difference, but my times aren't nearly so bad (at least on my setup, see below). I would stay away from benchmarking with r-devel unless it is very close to release.
Perhaps @HenrikBengtsson might have an idea.
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
#> Loading required package: globals
#> Loading required package: future
#>
#> Attaching package: 'future'
#> The following object is masked from 'package:caret':
#>
#> cluster
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
#> Warning: [ONE-TIME WARNING] Forked processing ('multicore') is disabled
#> in future (>= 1.13.0) when running R from RStudio, because it is
#> considered unstable. Because of this, plan("multicore") will fall
#> back to plan("sequential"), and plan("multiprocess") will fall back to
#> plan("multisession") - not plan("multicore") as in the past. For more details,
#> how to control forked processing or not, and how to silence this warning in
#> future R sessions, see ?future::supportsMulticore
system.time({
set.seed(252)
mod <- train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 1,
nthread = 1,
trControl = trainControl(search = "random")
)
})
#> user system elapsed
#> 0.739 0.050 27.367
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> user system elapsed
#> 0.905 0.040 2.196
Created on 2020-01-01 by the reprex package (v0.3.0)
and
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
#> Loading required package: globals
#> Loading required package: future
#>
#> Attaching package: 'future'
#> The following object is masked from 'package:caret':
#>
#> cluster
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
#> Warning: [ONE-TIME WARNING] Forked processing ('multicore') is disabled
#> in future (>= 1.13.0) when running R from RStudio, because it is
#> considered unstable. Because of this, plan("multicore") will fall
#> back to plan("sequential"), and plan("multiprocess") will fall back to
#> plan("multisession") - not plan("multicore") as in the past. For more details,
#> how to control forked processing or not, and how to silence this warning in
#> future R sessions, see ?future::supportsMulticore
# system.time({
# set.seed(252)
# mod <- train(
# Class ~ .,
# data = test_dat,
# method = "xgbTree",
# tuneLength = 1,
# nthread = 1,
# trControl = trainControl(search = "random")
# )
# })
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> user system elapsed
#> 0.782 0.047 27.792
Created on 2020-01-01 by the reprex package (v0.3.0)
Here are the results when run in the terminal instead of the RStudio IDE (where I see no difference):
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
system.time({
set.seed(252)
mod <- train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 1,
nthread = 1,
trControl = trainControl(search = "random")
)
})
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
> set.seed(1)
>
> # create fake data
> test_dat <- twoClassSim(20)
>
> library("doFuture")
Loading required package: globals
Loading required package: future
Attaching package: ‘future’
The following object is masked from ‘package:caret’:
cluster
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> registerDoFuture()
> plan(multiprocess, workers = parallel::detectCores() - 1)
>
> system.time({
+ set.seed(252)
+ mod <- train(
+ Class ~ .,
+ data = test_dat,
+ method = "xgbTree",
+ tuneLength = 1,
+ nthread = 1,
+ trControl = trainControl(search = "random")
+ )
+ })
user system elapsed
3.595 1.820 1.202
>
> train_control <- trainControl(search = 'random',
+ classProbs = TRUE,
+ summaryFunction = twoClassSummary)
>
> system.time({
+ set.seed(252)
+ mod <-
+ train(
+ Class ~ .,
+ data = test_dat,
+ method = "xgbTree",
+ tuneLength = 10,
+ nthreads = 1,
+ trControl = train_control,
+ metric = 'ROC'
+ )
+ })
user system elapsed
17.026 5.149 2.237
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
and
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
# system.time({
# set.seed(252)
# mod <- train(
# Class ~ .,
# data = test_dat,
# method = "xgbTree",
# tuneLength = 1,
# nthread = 1,
# trControl = trainControl(search = "random")
# )
# })
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
> set.seed(1)
>
> # create fake data
> test_dat <- twoClassSim(20)
>
> library("doFuture")
Loading required package: globals
Loading required package: future
Attaching package: ‘future’
The following object is masked from ‘package:caret’:
cluster
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> registerDoFuture()
> plan(multiprocess, workers = parallel::detectCores() - 1)
>
> # system.time({
> # set.seed(252)
> # mod <- train(
> # Class ~ .,
> # data = test_dat,
> # method = "xgbTree",
> # tuneLength = 1,
> # nthread = 1,
> # trControl = trainControl(search = "random")
> # )
> # })
>
> train_control <- trainControl(search = 'random',
+ classProbs = TRUE,
+ summaryFunction = twoClassSummary)
>
> system.time({
+ set.seed(252)
+ mod <-
+ train(
+ Class ~ .,
+ data = test_dat,
+ method = "xgbTree",
+ tuneLength = 10,
+ nthreads = 1,
+ trControl = train_control,
+ metric = 'ROC'
+ )
+ })
user system elapsed
17.392 5.128 2.261
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Hi, there could be multiple problems going on here. First, because 'xgboost' uses multi-threading internally, I suspect it is not safe to use forked parallel processing. Second, it looks like there are some non-exportable objects/globals needed in the parallel processing that corrupt the R state.
For troubleshooting, it's useful to understand which foreach backends end up using forked processing (aka "multicore" processing). All of the following end up using the same forked-process framework of the 'parallel' package, which is also what parallel::mclapply() uses:
doFuture::registerDoFuture(); future::plan("multicore", workers = 2L)
doParallel::registerDoParallel(cores = 2L)
doMC::registerDoMC(cores = 2L)
Comment: Note that plan(multiprocess) equals plan(multicore) on setups where multicore processing is supported (Linux and macOS, but not MS Windows and not in the RStudio console); otherwise it equals plan(multisession), which uses PSOCK clusters. So, for troubleshooting it is always better to use an explicit plan(multicore) or plan(multisession).
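As a quick sanity check (a small sketch, not part of the original report; it uses helpers from foreach and future), one can confirm which backend is actually registered and whether forked processing is supported on the current setup:
doFuture::registerDoFuture()
future::plan("multicore", workers = 2L)
foreach::getDoParName()       # name of the registered foreach backend, e.g. "doFuture"
foreach::getDoParWorkers()    # number of workers the backend reports
future::supportsMulticore()   # FALSE on MS Windows and in the RStudio console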
For simplicity, let's use:
doFuture::registerDoFuture()
future::plan("multicore", workers = 2L)
Using this, I can reproduce the original problem: calling train() takes forever on R 3.6.1 on Linux (Ubuntu 18.04.3). It almost appears to stall completely, but using strace Rscript caret_issue_1106.R reveals that it is not completely stalled; instead it appears to stall for some time, move forward very slightly, then stall again, and so on. I haven't waited for it to complete but, yes, I can imagine it will take hours and hours to finish.
Since combining forked processing and multi-threading should be avoided in R, a first attempt is to disable the (OpenMP) multi-threading used by xgboost. This can be done by setting:
OMP_NUM_THREADS=1
before R starts, e.g. by setting it in an .Renviron
file. An alternative way to disable multi-threading is to call:
RhpcBLASctl::omp_set_num_threads(threads = 1L)
in R before calling train().
The above solves the problem; the script no longer stalls.
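For completeness, a minimal sketch of this workaround (a sketch only, assuming the same reprex setup as earlier in the thread):
RhpcBLASctl::omp_set_num_threads(threads = 1L)  # or set OMP_NUM_THREADS=1 before R starts, e.g. in ~/.Renviron
doFuture::registerDoFuture()
future::plan("multicore", workers = 2L)
## ... then call caret::train(..., method = "xgbTree", ...) as in the reprex above;
## with OpenMP multi-threading disabled, the script no longer stalls.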
At this point, I thought the combination of forked processing and multi-threading was the sole reason. However, it looks like there's more to it because it also stalls when using PSOCK clusters;
doFuture::registerDoFuture()
future::plan("multisession", workers = 2L)
Since the workers in a PSOCK cluster, contrary to forked workers, are R processes independent of the main R session, I would not expect multi-threading to be an issue. In other words, there's something else going on too. BTW, note that all of the following create PSOCK clusters:
doFuture::registerDoFuture(); future::plan("multisession", workers = 2L)
doFuture::registerDoFuture(); cl <- parallel::makeCluster(2L); future::plan("cluster", workers = cl)
cl <- parallel::makeCluster(2L); doParallel::registerDoParallel(cl = cl)
What surprises me is that disabling multi-threading (as above) solves the problem.
Thus, there is something that requires xgboost to run in single-threaded mode when used in parallel code. This could be specific to caret or could apply to all parallel setups - I don't know.
Finally, by setting:
options(future.globals.onReference = "error")
before calling train()
we tell the future framework to look for external references/pointers among the global objects exported to the parallel workers and to produce an error if one is detected. Sure enough, doing so when using doFuture::registerDoFuture() will report:
Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression
which suggests that a non-exportable object is used by the workers and this could cause a race condition. The traceback for the above error is:
1: source("caret_issue_1106.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: caret_issue_1106.R#81: system.time({
mod <- train(Class ~ ., data = test_dat, method = "xgbTree", tuneLength = 10, nthreads = 1,
6: caret_issue_1106.R#82: train(Class ~ ., data = test_dat, method = "xgbTree", tuneLength = 10, nthreads = 1, trControl = train_control
7: train.formula(Class ~ ., data = test_dat, method = "xgbTree", tuneLength = 10, nthreads = 1, trControl = train_control, metric = "ROC
8: train(x, y, weights = w, ...)
9: train.default(x, y, weights = w, ...)
10: nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, method = models, ppOpts = preProcess, ctrl = trControl, lev = cla
11: foreach(iter = seq(along = resampleIndex), .combine = "c", .verbose = FALSE, .export = export, .packages = "caret") %:% foreach(parm
12: e$fun(obj, substitute(ex), parent.frame(), e$data)
13: getGlobalsAndPackages_doFuture(expr, envir = envir, export = obj$export, noexport = c(obj$noexport, argnames), packages = obj$package
14: getGlobalsAndPackages(expr, envir = globals_envir, globals = TRUE)
15: system.time({
assert_no_references(globals, action = action)
}, gcFirst = FALSE)
16: assert_no_references(globals, action = action)
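As a standalone illustration (a sketch independent of caret; the 'externalptr' here comes from an xgb.DMatrix handle, which is not necessarily the exact object caught in this issue), the same check can be triggered directly:
library(future)
options(future.globals.onReference = "error")
plan(multisession, workers = 2L)
x <- matrix(rnorm(40), nrow = 20)
dtrain <- xgboost::xgb.DMatrix(data = x, label = rep(0:1, 10))  # xgb.DMatrix wraps an external pointer
f <- future(xgboost::xgb.train(params = list(nthread = 1L), data = dtrain, nrounds = 1L))
## Expected: "Error: Detected a non-exportable reference ('externalptr') in one of the globals ..."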
If one runs with options(error = utils::recover) and sends SIGINT (Ctrl-C) when train() stalls, we'll see that "multicore" seems to be stuck at:
11: foreach(iter = seq(along = resampleIndex), .combine = "c", .verbose = FALSE, .export = export, .packages = "car
12: e$fun(obj, substitute(ex), parent.frame(), e$data)
13: mclapply(argsList, FUN, mc.preschedule = preschedule, mc.set.seed = set.seed, mc.silent = silent, mc.cores = co
14: selectChildren(ac[!fin], -1)
and if we use a PSOCK cluster, it is stuck at:
11: foreach(iter = seq(along = resampleIndex), .combine = "c", .verbose = FALSE, .export = export, .packages = "car
12: e$fun(obj, substitute(ex), parent.frame(), e$data)
13: clusterApplyLB(cl, argsList, evalWrapper)
14: dynamicClusterApply(cl, fun, length(x), argfun)
15: recvOneResult(cl)
16: recvOneData(cl)
17: recvOneData.SOCKcluster(cl)
18: socketSelect(socklist)
Using strace shows that both of these are stalled in the kernel system call:
select(8, [6 7], [], NULL, {tv_sec=60, tv_usec=0}
I'm sure someone can dig into all the code and show that there are some low-level multi-threading race conditions going on here.
FWIW, it's been on my todo list for quite a while to automagically protect against using multi-threading and forked processing at the same time. A few days ago I decided to go ahead and implement this in the future framework. Most likely it'll automatically set RhpcBLASctl::omp_set_num_threads(1L)
for "multicore" futures, cf. https://github.com/HenrikBengtsson/future/issues/355. But again, that would only solve half of the problem going on here.
Thanks @HenrikBengtsson; this is extremely helpful and I would never have figured this out.
I suspect that this would also apply to ranger
(as well as a bunch of others). I'm surprised that this has not come up previously.
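For ranger, the analogous precaution would presumably be to cap its internal thread count, e.g. (a rough sketch, assuming train() forwards the argument to ranger):
mod <- train(Class ~ ., data = test_dat, method = "ranger",
             num.threads = 1,                # ranger's own thread count
             trControl = train_control, metric = "ROC")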
You're welcome - I'm happy to help. Yeah, I think this happens for other packages too, but I don't know enough to tell exactly when multi-threading works and doesn't work with forked processing. It's on my way-too-long wishlist to read up on this and dive into the details, so this would be less of a "guessing game" (at least for me). I'm sure there are folks in the R community who have a better sense of what's going on and would be able to put together a list of dos and don'ts.
Overview
When using caret to train an XGBoost model with parallelization at the resample level, execution continues indefinitely (or at least for more than several hours in my tests) when the first summaryFunction used is of type twoClassSummary.
For the same code, however, if a call is first made to train() on any arbitrary data, execution proceeds as expected and, in the case of the example below, finishes in seconds.
This effect does not appear to depend on the parallelization back-end used; tested with:
It also occurs if twoClassSummary is used alongside other summary functions as part of a yardstick metric_set().
Based on this behavior, I'm guessing there is some kind of initialization of the interface with the parallelization back-end that normally occurs during the first invocation of train() and that is not taking place as expected in the case below.
Minimal, reproducible example:
To see the behavior in action, first run the code below as-is; at least in the Linux environment I'm testing in, it will likely continue running for a long time.
Next, uncomment the commented first call to train() and re-run the script. It will execute in parallel and finish quickly as expected.
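A condensed sketch of the reprex (based on the code posted earlier in this thread):
library(caret)
library(doFuture)
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
set.seed(1)
test_dat <- twoClassSim(20)   # small fake two-class data set
## Warm-up call: with this commented out, the next train() call stalls;
## with it uncommented, everything finishes in seconds.
# set.seed(252)
# mod <- train(Class ~ ., data = test_dat, method = "xgbTree",
#              tuneLength = 1, nthread = 1,
#              trControl = trainControl(search = "random"))
train_control <- trainControl(search = "random", classProbs = TRUE,
                              summaryFunction = twoClassSummary)
set.seed(252)
mod <- train(Class ~ ., data = test_dat, method = "xgbTree",
             tuneLength = 10, nthread = 1,
             trControl = train_control, metric = "ROC")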
Session Info:
Tested with two different versions of R/caret on the same Linux machine:
R 3.6.1, caret 6.0-84
R nightly (2019-12-10)