mlr-org / parallelMap

R package to interface some popular parallelization backends with a unified interface
https://parallelmap.mlr-org.com

parallelMap causes crash when switching between different levels of resampling #71

Closed: annette987 closed this issue 5 years ago

annette987 commented 5 years ago

I am using mlr to benchmark several different learners and want to parallelize the run. If I set the level in parallelStart() to either mlr.resample or mlr.tuneParams and pass benchmark() a learner with only one level of resampling, followed by a learner that uses nested resampling (e.g. via makeTuneWrapper()), the program silently crashes after it starts benchmarking the second learner.

If I pass benchmark() the nested-resampling learner first, followed by the learner without nesting, everything is fine. Likewise, either learner on its own runs fine.

Here is a minimal example that causes a crash:

library(survival)
library(mlr)
library(parallelMap)

data(veteran)
set.seed(24601)
mas.task <- makeSurvTask(id = "TEST", data = veteran, target = c("time", "status"))
mas.task <- createDummyFeatures(mas.task)

inner = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Tuning
outer = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Benchmarking

cox.lrn <- makeLearner(cl="surv.coxph", id = "coxph", predict.type="response")
rfsrc.lrn <- makeLearner(cl="surv.randomForestSRC", id = "rfsrc", predict.type="response")
rfsrc_params <- makeParamSet(
  makeIntegerParam("mtry", lower = 3, upper = 5),
  makeDiscreteParam("nodesize", values = c(5, 10, 15, 20, 25)),
  makeDiscreteParam("nodedepth", values = c(5, 10, 15)),
  makeDiscreteParam("ntree", values=c(2000))
)
rfsrc_ctrl <- makeTuneControlRandom(maxit=2L)
rfsrc.tune.lrn = makeTuneWrapper(rfsrc.lrn, resampling = inner, par.set = rfsrc_params, control = rfsrc_ctrl, show.info = FALSE)

parallelStart(mode="multicore", cpus=12, level="mlr.resampe", show.info = TRUE, logging=TRUE)
learners = list(cox.lrn, rfsrc.tune.lrn)
bmr = benchmark(learners=learners, tasks=mas.task, resamplings=outer, measures=list(cindex), show.info = TRUE)
parallelStop()

and here is the output:

Loading required package: ParamHelpers
Starting parallelization in mode=multicore with cpus=12.
Deleting 2 log dirs in storage dir.
Task: TEST, Learner: coxph
Resampling: cross-validation
Measures:             cindex    
[Resample] iter 1:    0.7173516 
[Resample] iter 2:    0.6825470 

Aggregated Result: cindex.test.mean=0.6999493

Task: TEST, Learner: rfsrc.tuned
Resampling: cross-validation
Measures:             cindex    
Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 12; elements = 2.

If I simply swap the order in which the learners are passed to benchmark(), the program completes and I get the following output:

Loading required package: ParamHelpers
Starting parallelization in mode=multicore with cpus=12.
Deleting 2 log dirs in storage dir.
Task: TEST, Learner: rfsrc.tuned
Resampling: cross-validation
Measures:             cindex    
Mapping in parallel: mode = multicore; level = mlr.resample; cpus = 12; elements = 2.

Aggregated Result: cindex.test.mean=0.7072699

Task: TEST, Learner: coxph
Resampling: cross-validation
Measures:             cindex    
Mapping in parallel: mode = multicore; level = mlr.resample; cpus = 12; elements = 2.

Aggregated Result: cindex.test.mean=0.6999493

Stopped parallelization. All cleaned up.

EDIT: I just tried passing two tuned learners to benchmark() and it runs the first one but crashes on the second. So perhaps it is not about switching levels of resampling after all.
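For reference, a minimal sketch of that variant, reusing the objects from the example above (the second tuned wrapper, rfsrc.tune.lrn2, is hypothetical, built the same way as rfsrc.tune.lrn):

rfsrc.tune.lrn2 = makeTuneWrapper(rfsrc.lrn, resampling = inner, par.set = rfsrc_params, control = rfsrc_ctrl, show.info = FALSE)

parallelStart(mode = "multicore", cpus = 12, level = "mlr.resample", show.info = TRUE, logging = TRUE)
# The first tuned learner completes; the run stops on the second one.
bmr = benchmark(learners = list(rfsrc.tune.lrn, rfsrc.tune.lrn2), tasks = mas.task, resamplings = outer, measures = list(cindex), show.info = TRUE)
parallelStop()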

berndbischl commented 5 years ago

1) Please really check a submitted MRE directly before posting. The first code block is not valid as posted:

parallelStart(mode="multicore", cpus=12, level="mlr.resampe", ...)  # note the typo: "mlr.resampe" instead of "mlr.resample"

so I am guessing you at least edited your code before posting, without running it again.

2) The process does not seem to crash but to block (infinitely?); see the logging sketch after this list.

3) I could reproduce this exactly once on my machine, and that was before I updated all of my packages from CRAN. I cannot help if this is not reproducible.
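Regarding 2): if it is a block rather than a crash, the worker logs may show where it stalls. A sketch, reusing the objects from the example above (storagedir is a documented parallelStart() argument; the tempdir() path is just illustrative):

# Write per-job worker logs to an explicit storage dir, then inspect them after interrupting the hung run.
parallelStart(mode = "multicore", cpus = 12, level = "mlr.tuneParams", logging = TRUE, storagedir = tempdir())
bmr = benchmark(learners = learners, tasks = mas.task, resamplings = outer, measures = list(cindex))
parallelStop()
list.files(tempdir(), recursive = TRUE)  # one log dir per parallelMap() call, one file per job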

I can keep this open for a few days, but see above.

@annette987 does the error really still occur if you update all packages from CRAN?

@mllg @pat-s can you please check on your side?

berndbischl commented 5 years ago

@annette987 also, as documented by randomForestSRC, it uses internal parallelization with OpenMP. That might result in "hiccups" if parallelization in both parallelMap and rfSRC is switched on.

a) Use another simple (but tuned) learner in your example. Does the problem still occur?
b) Turn off the internal parallelization in rfSRC (you should do this anyway if you want to run a benchmark as above); see the sketch below.
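For b), a minimal sketch: randomForestSRC reads its core counts from R options (rf.cores for OpenMP, mc.cores for mclapply, per its documentation), so setting both to 1 before parallelStart() should leave parallelMap as the only source of parallelism:

# Disable randomForestSRC's internal parallelization so that only parallelMap forks workers.
options(rf.cores = 1L, mc.cores = 1L)
parallelStart(mode = "multicore", cpus = 12, level = "mlr.resample", show.info = TRUE)
bmr = benchmark(learners = list(cox.lrn, rfsrc.tune.lrn), tasks = mas.task, resamplings = outer, measures = list(cindex), show.info = TRUE)
parallelStop()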

annette987 commented 5 years ago

Thank you very much for all your suggestions. After updating all packages from CRAN, the problem no longer occurs. Sorry, I did not think to do this before contacting you.
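For anyone hitting the same issue, a standard way to update all installed CRAN packages:

# Update every outdated CRAN package; checkBuilt = TRUE also reinstalls packages built under an older R version.
update.packages(ask = FALSE, checkBuilt = TRUE)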