zauster closed this issue 2 years ago
Thanks a lot for the report.
I can't look into this right away, but will do so as soon as I can.
Note to self: Bengtsson presentation about profiling future parallel code https://www.jottr.org/2022/06/23/future-user2022-slides/
There seems to be a lot of “send data to nodes” overhead, which I think I was able to limit a bit. But beyond that, I still don’t get it.
As an experiment, I created a new branch which accepts an mc.cores argument and parallelizes using parallel::mclapply(). It is much faster than single core.
remotes::install_github('vincentarelbundock/modelsummary@parallel')
Relevant code here: https://github.com/vincentarelbundock/modelsummary/blob/parallel/R/modelsummary.R
When I use future, things are very slow the first time around, but they are fast the second time I call the function.
Any ideas what could explain this?
library(tictoc)
library(insight)
library(modelsummary)
library(future.apply)
mod <- list(
download_model("brms_mixed_2"),
download_model("brms_mixed_3"),
download_model("brms_mixed_5"))
mod[[4]] <- mod[[5]] <- mod[[6]] <- mod[[7]] <- mod[[1]]
# sequential is slow
tic()
tab <- modelsummary(
mod,
output = "data.frame",
statistic = "conf.int")
toc()
#> 53.589 sec elapsed
# {parallel} is fast
tic()
tab <- modelsummary(
mod,
output = "data.frame",
statistic = "conf.int",
mc.cores = 7)
toc()
#> 17.353 sec elapsed
# {future}: 1st time is slow
plan(multisession)
tic()
tab <- modelsummary(
mod,
output = "data.frame",
statistic = "conf.int")
toc()
#> 45.663 sec elapsed
# {future}: 2nd time is fast
tic()
tab <- modelsummary(
mod,
output = "data.frame",
statistic = "conf.int")
toc()
#> 18.158 sec elapsed
Alright, I just merged the parallel branch into "main". You can now add mc.cores=7 (or whatever number of cores), and modelsummary will invoke parallel::mclapply() under the hood. On my Linux machine, this scales not-quite-linearly. It should also work on macOS; on Windows, mclapply() falls back to sequential execution.
future support is still there, but there may be some just-in-time compilation going on that slows down the first call.
I tried your example on my recent laptop and it went from roughly 6 to 4 to 2 seconds. Let me know if you see the same thing with the current version from Github.
Closing issues to get a better sense of what TODOs are left on my plate, but feel free to re-open or keep commenting if this doesn't work for you.
plan(multicore) uses the same forked parallelization framework as mclapply(). What do you get if you use plan(multicore, workers = 7) in your comparison?
PS. Forked processing is not available on MS Windows, so such users will end up running in sequential mode.
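Concretely, the suggested comparison would look something like this, a sketch reusing the mod list and the tictoc timing pattern from the reprex above:

```r
library(tictoc)
library(modelsummary)
library(future.apply)

# Forked workers, same mechanism as parallel::mclapply()
# (falls back to sequential on MS Windows)
plan(multicore, workers = 7)

tic()
tab <- modelsummary(
  mod,
  output = "data.frame",
  statistic = "conf.int")
toc()
```

With forking, the workers inherit the parent session's loaded packages and data, so there is no first-call penalty for shipping packages to fresh R sessions.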
It works!
The first time I call my future-parallel function with "multicore" is as fast as parallel, and as fast as the 2nd time I run the function.
I'm still kind of curious why the 1st and 2nd run had such different timings with "multisession"...
FWIW, I'm running Ubuntu inside a WSL2-Windows.
Thanks for looking!
I guess (but I could be wrong) that maybe some data is transferred to the "worker" R sessions the first time, which need not be done the second time?
Also, since/if there is no easy, clear solution to this, why not simply remove all duplicates from the model list, tell the user about it, and continue? Do you know any real world examples where one would want to include one model twice (in one table)?
> Do you know any real world examples where one would want to include one model twice (in one table)?
Showing the same model with different types of sandwich standard errors would be one example.
But my sense is this "problem" is not actually related to duplicated models. It is just slow on the first run regardless of what the models list contains.
It's most likely because the workers need to load some packages during the first call. Since 'multisession' uses persistent PSOCK workers, the packages are already loaded in succeeding calls. If you preload some of the heavy packages up-front, e.g.
void <- future_lapply(1:nbrOfWorkers(), FUN = function(x) {
loadNamespace("brms")
loadNamespace("rstan")
})
you'll see a smaller difference between the first and the second call.
BTW, you're not declaring the use of RNG in your parallelization. That is, specify future.seed = TRUE to get rid of the RNG warnings:
1: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_lapply-1’) unexpectedly generated
random numbers without declaring so. There is a risk that those random numbers are not statistically
sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures
that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable
this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore".
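As an illustration (not modelsummary's actual internals), the argument is passed directly to future_lapply(); here insight::get_parameters() stands in for whatever is computed per model:

```r
library(future.apply)

plan(multisession, workers = 7)

# future.seed = TRUE gives each worker its own parallel-safe
# L'Ecuyer-CMRG RNG stream and silences the UNRELIABLE VALUE warning
results <- future_lapply(
  mod,
  FUN = function(m) insight::get_parameters(m),
  future.seed = TRUE)
```

This matters here because summarizing Bayesian models can involve draws from the posterior, i.e. random number generation inside the workers.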
Thanks a lot for these insights. Things are much clearer for me now, and I have updated the code and documentation to reflect them. I have also set future.seed = TRUE as suggested.
Thanks!
Hey,
if I include one model twice in the list of models to be summarized and try to parallelize the computation (using plan(multisession)), it takes several minutes until the output is printed.
REPREX:
sessionInfo: