privefl / bigparallelr

Parallel tools in R
3 stars 2 forks source link

register_parallel() has some side effect #6

Open privefl opened 3 years ago

privefl commented 3 years ago

From @sritchie73

library(bigparallelr)

# Register a parallel backend in my script for 4 cores
cl <- makeCluster(4)
registerDoParallel(cl)

# Run parallel foreach loop:
res <- foreach(ii = 1:10) %dopar% { ii } # works fine, as expected

# Some function that uses bigparallelr::register_parallel to do some parallel work:
f <- function() {
  bigparallelr::register_parallel(4)
  foreach(ii = 1:10) %dopar% { ii }
}
res2 <- f() # works fine, as expected

# Now if we try to run a parallel foreach loop again, we get an error:
res3 <- foreach(ii = 1:10) %dopar% { ii }
# Error in summary.connection(connection) : invalid connection

# But we can fix this by re-registering a parallel backend:
registerDoParallel(cl)
res3 <- foreach(ii = 1:10) %dopar% { ii }

stopCluster(cl)
privefl commented 3 years ago

But it seems I could use {future} instead

f <- function() {
  # bigparallelr::register_parallel(4)
  save <- future::plan("multisession", workers = 4)
  print(save)
  doFuture::registerDoFuture()
  foreach(ii = 1:10) %dopar% { ii }
  future::plan(save)
}
mcanouil commented 2 years ago

Related to #3 instead of having to change (too much) the internal code, a simple note to show the users how to use {future} should be enough.

library("doFuture")
registerDoFuture()
plan(multisession)

In addition, all register* calls should be removed and only defined at top level by user, not internally, see "For package developers". This means changes in all bigparallelr reverse dependencies.

privefl commented 2 years ago

Yes, I have thought about doing that, but I am still not sure it is really worth it.

mcanouil commented 2 years ago

It gives more control to the user while completely externalising parallel configuration which leads to a smaller package.

7

privefl commented 2 years ago

One advantage of using {future} would be that it can use different types of parallelization (e.g. on many nodes on a HPC cluster); however I have found this takes a lot of time to retrieve the results, therefore I am not too enthusiastic about it anymore. Moreover, this can always be performed manually by the user using the easy-to-use {future.apply} package.

I would need strong arguments to proceed with these changes.

I don't think having a smaller package is a very strong one, since I need only one line of code to register the parallel backend (cf. e.g. https://github.com/privefl/bigparallelr/blob/master/R/split-parapply.R#L58).

mcanouil commented 2 years ago

however I have found this takes a lot of time to retrieve the results, therefore I am not too enthusiastic about it anymore.

It mostly depends on the platform and the backend used.
I rather set myself how parallel computing is done than only setting a number of cores, especially for the reason above where dependencies, R object, etc. might get copied instead of shared.

I don't think having a smaller package is a very strong one

It is not even an argument ^^ Not setting "register" internally allows for more scalability on any platform (or almost) and allows the use of future. Here, the ability to use future is only a by-product of the changes. Also, letting the user setting up the parallel fix the issue described here (#6)

To note: the register functions of any do* packages were not intended to be set internally, instead by the user. https://github.com/privefl/bigparallelr/blob/714ed12517346362d2aeda1bf4f59596c14e27bb/R/register-parallel.R#L26-L44

privefl commented 2 years ago

Yes, I know that is was not intended to be set internally, but this is so much more convenient to my packages' users to set ncores = nb_cores() instead of having to register (and unregister) their own parallel backends.

And it makes it consistent with other functions relying on other types of parallelism (e.g. OpenMP).

mcanouil commented 2 years ago

This is a tradeoff^^

To note, future has a beta option for OpenMP: https://future.futureverse.org/reference/future.options.html#options-for-configuring-low-level-system-behaviors-1