Closed by mb706 4 years ago.
A current workaround is to load the learner from a saved file. For example, if the learner is restored from the .RData workspace file when R starts, resampling with multicore works:
> library("parallelMap")
> parallelStartMulticore(2)
Starting parallelization in mode=multicore with cpus=2.
> library("mlr")
Loading required package: ParamHelpers
> lrn = makeLearner("classif.IBk")
> resample(lrn, pid.task, cv5)
Mapping in parallel: mode = multicore; cpus = 2; elements = 5.
[Resample] cross-validation iter 1: [Resample] cross-validation iter 2: ^C^C^C^C^C
> q("yes")
$ R
R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
[...]
> library("mlr")
Loading required package: ParamHelpers
> library("parallelMap")
> parallelStartMulticore(2)
Starting parallelization in mode=multicore with cpus=2.
> resample(lrn, pid.task, cv5)
Mapping in parallel: mode = multicore; cpus = 2; elements = 5.
[Resample] cross-validation iter 1: [Resample] cross-validation iter 2: mmce.test.mean=0.266
mmce.test.mean=0.318
# no hang
1) We did have that issue before, but not with the insights you presented here. Isn't it also more of a parallelMap issue?
2) So the problem is that we load RWeka on the master during learner construction, and that is what makes the bug appear?
trainLearner function to prevent hanging.
I ran my own rJava-based custom learner. It works fine single-threaded; however, with parallelStartSocket() I got a session timeout like this:
Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 20; elements = 1.
Error in stopWithJobErrorMessages(inds, vcapply(result.list[inds], as.character)) :
  Errors occurred in 1 slave jobs, displaying at most 10 of them:
Is this caused by the same incompatibility between mclapply (used by parallelMap) and the JVM that you described here?
parallelStartSocket is not based on mclapply and should not call it, so I am pretty sure the timeout is not caused by this issue.
(Note that parallelMap in "socket" mode behaves slightly differently from "multicore" mode: the worker jobs are executed in a (kind of) vanilla environment, so with "socket" you might have to call parallelExport and parallelLibrary where you wouldn't need to with "multicore".)
Hi, I confirmed that the timeout is caused by something other than this issue, although the single-threaded run did not take that long. Still, parallelStartSocket is a good alternative to parallelStartMulticore. What are the drawbacks of socket mode compared to multicore? Only the overhead and the need to export objects and libraries?
Multicore uses the operating system's fork() to create child processes that have copy-on-write access to the parent process's memory. If you're working with a big dataset, this means you can potentially have many processes operating on this data while only using up memory for the dataset once. (I think R's garbage collection sometimes messes this up and more memory gets used than needed, but usually it works.) When you're using sockets, every individual worker process needs to load the data separately, so you have the overhead of (1) serialising the data in the main process and sending it to the worker processes and (2) keeping a separate copy of the data in memory for each process.
(I don't know parallelStartSocket that well however, so don't take my word for it.)
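The difference between the two modes can be seen with base R's parallel package. This is a minimal sketch that calls parallel directly rather than going through parallelMap; big_vector is a hypothetical stand-in for a large dataset. Forked workers see the master's objects without any export step, while socket workers start with an empty global environment and only see objects that were explicitly serialised and sent to them (clusterExport here, roughly the role parallelExport plays in parallelMap).

```r
library(parallel)

# Hypothetical "big" data living in the master process.
big_vector <- rep(1, 1e6)

# Fork-based (multicore): children are created with fork() and share the
# master's memory copy-on-write, so big_vector is visible without exporting.
# mclapply() cannot fork on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L
fork_sums <- mclapply(1:2, function(i) sum(big_vector) + i, mc.cores = n_cores)

# Socket-based: makeCluster() starts fresh R processes (PSOCK by default).
cl <- makeCluster(2)

# Without an export, the workers cannot find big_vector and the call fails.
no_export <- try(parLapply(cl, 1:2, function(i) sum(big_vector) + i),
                 silent = TRUE)

# clusterExport() serialises big_vector and sends a copy to every worker,
# so the data is now held in memory once per worker process.
clusterExport(cl, "big_vector")
socket_sums <- parLapply(cl, 1:2, function(i) sum(big_vector) + i)

stopCluster(cl)
```

Both approaches compute the same sums in the end; the socket variant just pays for the extra serialisation step and the per-worker copies.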
Thanks for answering such a general question. I understand that parallelStartSocket has significant overhead compared to parallelStartMulticore. In my case, the 40-core CPU cannot be used without multithreading or multiprocessing, and the multicore option cannot be used for my Java-based code (because of the original issue in this thread). Sockets seem to be the alternative when there is such an incompatibility or scalability problem, and the only option on Windows. By the way, I hope multi-level parallelization (e.g. benchmark * resample) will be supported.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is because fork(), which multicore mode is ultimately based on, and the Java VM don't play well together if Java is started before the forking happens. Loading Java-based packages, e.g. "RWeka", seems to start the Java VM, so if the package gets loaded outside of the parallelMap call, it fails. If, on the other hand, the fork happens before the Java VM is loaded, it works fine, as in the second console session above.
I therefore suggest adding a configureMlr option that defers loading a learner's packages until its train or predict function is called. Users would still need to be careful not to load "RWeka" themselves when they want to use multicore, but this would at least give them the option. When a learner is constructed, instead of loading the learner's package, mlr would simply check whether the requested package is installed.
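A minimal sketch of the proposed behaviour, assuming a learner only needs to record its package name at construction time. makeLazyLearner() and trainLazyLearner() are hypothetical names for illustration, not mlr functions, and the base-R package splines stands in for RWeka. Construction only checks that the package is installed (find.package() does not load it); the namespace is loaded inside the train step, which in multicore mode would run after the fork.

```r
# Hypothetical sketch: these functions are NOT part of mlr.

makeLazyLearner <- function(id, package) {
  # Check that the package is installed without loading it, so that
  # constructing e.g. an RWeka learner would not start the Java VM.
  if (length(find.package(package, quiet = TRUE)) == 0L) {
    stop(sprintf("Package '%s' is required but not installed.", package))
  }
  structure(list(id = id, package = package), class = "LazyLearner")
}

trainLazyLearner <- function(learner, train_fun, ...) {
  # Load the namespace only now; under multicore this would happen
  # inside the forked worker, i.e. after the fork.
  if (!requireNamespace(learner$package, quietly = TRUE)) {
    stop(sprintf("Could not load package '%s'.", learner$package))
  }
  train_fun(...)
}

# Usage, with splines standing in for a Java-based package:
lrn <- makeLazyLearner("demo", "splines")
loaded_after_construction <- isNamespaceLoaded("splines")
model <- trainLazyLearner(lrn, function(x) splines::bs(x, df = 3), x = 1:10)
loaded_after_training <- isNamespaceLoaded("splines")
```

The design choice is simply to move the side effect (loading the package, and with it the JVM) from construction time to the first call that actually needs it.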