mlr-org / parallelMap

R package to interface some popular parallelization backends with a unified interface
https://parallelmap.mlr-org.com

multicore - Parallelisation still doesn't work. #22

Closed mareichhoff closed 10 years ago

mareichhoff commented 10 years ago

Hello parallelMap-fans,

parallelStart(mode="multicore",....) still doesn't work under Linux. We had this problem before, and you used a trick to fix it. But the problem is back in the recent CRAN version of parallelMap: the mode is reported as "multicore", but the jobs are executed sequentially. (On Windows, "socket" works fine!)

Thank you very much for your help!

Best regards, Markus

mllg commented 10 years ago

Does it work on your system if you call mclapply directly?

N.B.: mcmapply was fixed in R 3.1.1.
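
One way to run this check is sketched below: it reports whether the running R is at least 3.1.1 and times a direct mclapply call from package parallel. The helper name `slow_job`, the 1-second sleep, and the choice of two cores are illustrative assumptions, not part of parallelMap. Forking is unavailable on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# mcmapply was fixed in R 3.1.1, so check the running version first
r_ok <- getRversion() >= "3.1.1"
cat("R version:", as.character(getRversion()), "- at least 3.1.1:", r_ok, "\n")

# Hypothetical dummy job: sleep for one second and return the input
slow_job <- function(i) {
  Sys.sleep(1)
  i
}

# Forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# With working forking, two 1-second jobs on two cores take about 1 second,
# whereas sequential execution takes about 2 seconds
timing <- system.time(res <- mclapply(1:2, slow_job, mc.cores = n_cores))
print(timing[["elapsed"]])
```

If the elapsed time is close to the sum of the individual sleeps, the jobs ran sequentially.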

berndbischl commented 10 years ago

Also please post a complete, minimal example that demonstrates this.

mareichhoff commented 10 years ago

Yes, Michel, you are right. I still have 3.0.x installed on the server. The question is: Is it possible to update some package containing "mcmapply"? I tried to update the "multicore" package. The update worked but it didn't change anything.

@Bernd: A minimal example is not necessary in this case. I can see, for example, that packages are only sourced on the master and not on the slaves when using "parallelSource". Messages like ".... is sourced on the slaves" are missing.

berndbischl commented 10 years ago

@MarkusMG

Please post an example! I need to see how you call the package and with how many jobs. Please also post your sessionInfo() output and the number of cores on that machine, as well as the options you set in your R profile for parallelMap, if you do that.
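
A minimal sketch of how that information could be collected in one go, using only base R and package parallel. It assumes that parallelMap's options are stored under names starting with "parallelMap."; the variable names `n_cores` and `pm_opts` are made up for illustration.

```r
library(parallel)

# R version, platform and attached packages
print(sessionInfo())

# Number of cores R detects on this machine (may be NA if unknown)
n_cores <- detectCores()
cat("Detected cores:", n_cores, "\n")

# Assumption: parallelMap options use the prefix "parallelMap."
# (for example when set in an R profile); the list is empty if none are set
all_opts <- options()
pm_opts <- all_opts[grepl("^parallelMap\\.", names(all_opts))]
print(pm_opts)
```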

mareichhoff commented 10 years ago

R> library(mlr)
Loading required package: ParamHelpers
Loading required package: BBmisc
R> library(parallelMap)

R> parallelStart(mode="multicore", cpus=8)
Starting parallelization in mode=multicore with cpus=8.
Loading required package: parallel

R> a=5
R> parallelExport("a")
Exporting objects to package env on master for mode: multicore

R> parallelSource("/home/markuse/R_scripts/LinienFit.R")
Sourcing files on master: /home/markuse/R_scripts/LinienFit.R

Number of cores: maximum 8. If I repeat the above with cpus=4, nothing changes.

Session Info:

R> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices datasets utils methods base

other attached packages:
[1] parallelMap_1.1 mlr_2.1 BBmisc_1.7 ParamHelpers_1.3

loaded via a namespace (and not attached):
[1] checkmate_1.2 codetools_0.2-8 splines_3.0.2 survival_2.37-7

mareichhoff commented 10 years ago

I can't give a runnable example for "parallelSource(....)" because you don't have the R script. But it doesn't matter: you can see that the source command is only executed on the master and not on the slaves.

If I execute this on a Windows machine with 4 cores, the message "Sourcing file on slaves" (or something like that) always appears.

mllg commented 10 years ago

> Yes, Michel, you are right. I still have 3.0.x installed on the server. The question is: Is it possible to update some package containing "mcmapply"? I tried to update the "multicore" package. The update worked but it didn't change anything.

You need to use mclapply implemented in the package parallel, not multicore.

> A minimal example in this case is not necessary. I see for example that packages are only sourced on the master and not on the slaves by using "parallelSource". Messages like ".... is sourced on the slaves" are missing.

Sourcing on the master is equivalent to sourcing on the slaves in multicore mode.
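
A minimal sketch of why this holds, using mclapply from package parallel directly: forked children are copies of the master process, so anything defined on the master before the fork is visible in the children. The names `line_fit` and `a` are hypothetical stand-ins for what a sourced script and parallelExport would provide. On Windows, where forking is unavailable, the sketch falls back to one core.

```r
library(parallel)

# Hypothetical stand-ins for objects a sourced script would define on the master
line_fit <- function(x) 2 * x + 1
a <- 5

# Forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# The children can use line_fit and a without any explicit export or
# sourcing step, because they inherit the master's workspace at fork time
res <- mclapply(1:3, function(i) line_fit(i) + a, mc.cores = n_cores)
print(unlist(res))
```

In socket mode the workers are separate R processes, which is why files have to be sourced on them explicitly there.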

Please report the output of the following chunk:

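library(parallel)
library(parallelMap)

# Dummy job: each call just sleeps for 10 seconds
foo = function(i) Sys.sleep(10)

# Baseline: call mclapply from package parallel directly
system.time(mclapply(1:4, foo))

# Same jobs via parallelMap in multicore mode
parallelStart(mode = "multicore")
system.time(parallelMap(foo, 1:4))

mareichhoff commented 10 years ago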

markuse@achlys:~$ r
R> library(parallel)
R> library(parallelMap)
R>
R> foo = function(i) Sys.sleep(10)
R>
R> system.time(mclapply(1:4, foo))
   user  system elapsed
   0.02    0.00   20.02
R>
R> parallelStart(mode = "multicore")
Autodetecting cpus: 8
Starting parallelization in mode=multicore with cpus=8.
R> system.time(parallelMap(foo, 1:4))
Mapping in parallel: mode = multicore; cpus = 8; elements = 4.
   user  system elapsed
   0.01    0.00   10.02
R>

mareichhoff commented 10 years ago

It seems to work, doesn't it? I didn't know that the message "sourcing ... on slaves" doesn't appear in multicore mode.

I can't find the "parallel" package on CRAN, so I can't update it. But if it works, that's not necessary.

mllg commented 10 years ago

Everything is alright.
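
The 20 s versus 10 s gap in the timings above is consistent with mclapply's default of mc.cores = getOption("mc.cores", 2L): four 10-second jobs on two cores run in two rounds, whereas parallelMap was started with 8 cpus. A minimal sketch of that effect, with the sleep shortened to 1 second so it finishes quickly; the variable names are made up for illustration, and on Windows (no forking) both runs fall back to one core.

```r
library(parallel)

# Shorter sleep than in the thread so the sketch finishes quickly
foo <- function(i) Sys.sleep(1)

# Forking is not available on Windows, so fall back to a single core there
on_windows <- .Platform$OS.type == "windows"
cores_two <- if (on_windows) 1L else 2L   # mimics mclapply's default of 2
cores_four <- if (on_windows) 1L else 4L

# Four 1-second jobs on two cores: two rounds, so about 2 seconds
t_two <- system.time(mclapply(1:4, foo, mc.cores = cores_two))[["elapsed"]]

# Four 1-second jobs on four cores: one round, so about 1 second
t_four <- system.time(mclapply(1:4, foo, mc.cores = cores_four))[["elapsed"]]

cat("2 cores:", t_two, "s; 4 cores:", t_four, "s\n")
```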

mareichhoff commented 10 years ago

I forgot the sessionInfo above:

R> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices datasets utils methods base

other attached packages:
[1] parallelMap_1.1

loaded via a namespace (and not attached):
[1] BBmisc_1.7 checkmate_1.2

mllg commented 10 years ago

I've improved the info message to avoid future confusion. Parallelization seems to work for you though.

Note that the Linux task scheduler sometimes moves multiple threads to a single core, especially if the system is under heavy load.