mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0
146 stars 27 forks

run asynchronously #23

Open slowkow opened 7 years ago

slowkow commented 7 years ago

Thanks for this package! It's very easy to use.

I'd like to ask if it's possible to run a job asynchronously, without waiting for the results.

For example, when I run:

job <- Q(fx, x=1:3, n_jobs=1)

I get:

Submitting 1 worker jobs for 3 function calls (ID: 6642) ...
  |======================================================================| 100%
Running calculations (1 calls/chunk) ...
  |                                                                      |   0%
  |======================================================================| 100%
Master: [19.0s 0.0% CPU]; Worker average: [11.7% CPU]

While this message is being printed, I'm unable to continue executing commands. I have to wait for the submitted job to complete before I can continue working.

Is it possible to let R wait in the background, so I can continue working?

mschubert commented 7 years ago

Thank you! Always good to hear that my utility code is useful :+1:

I'm currently not planning to add this to the package because I think asynchronous computation is a separate problem that is outside the scope of this package's functionality.

However, you could easily do something like:

fx = function(n) {
    Sys.sleep(n)
    n * 2
}
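# run Q() in a forked background process (fork-based, so not available on Windows)
p = parallel::mcparallel(clustermq::Q(fx, n=1:5, n_jobs=1))
# do your other work ...
# block until the background process has finished, then extract its result
parallel::mccollect(p)[[1]]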

You may additionally want to suppress the progress bar.
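
If blocking in mccollect() at the end is also undesirable, the same idea can be turned into a polling loop. The following is only a minimal sketch: run_calls() is a hypothetical stand-in that uses lapply() so the example runs without a scheduler (swap in the clustermq::Q() call from above), and the polls counter is just a placeholder for your other work. It relies on mccollect(wait=FALSE) returning NULL while no result is available yet.

```r
fx = function(n) {
    Sys.sleep(n / 10)
    n * 2
}

# Hypothetical stand-in for: clustermq::Q(fx, n=1:3, n_jobs=1)
run_calls = function() lapply(1:3, fx)

# start the calls in a forked background process (not available on Windows)
p = parallel::mcparallel(run_calls())

polls = 0
repeat {
    # check for a result for at most 1 second, then hand control back
    res = parallel::mccollect(p, wait=FALSE, timeout=1)
    if (!is.null(res))
        break
    polls = polls + 1  # placeholder: do your other work here
}

# mccollect() returns a list with one element per job
result = res[[1]]
```

Here result holds whatever the background expression returned, so with Q() it would be the usual list of per-call results.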

wlandau commented 6 years ago

Despite the discussions in #86 and https://github.com/HenrikBengtsson/future/issues/204, I am still interested in an asynchronous Q(). Yes, asynchronicity is a separate problem, and I understand the need to set clear boundaries for the package's scope. But speaking generally, I think the need for asynchronicity arises frequently enough that the major alternatives to clustermq support it:

I am also curious about what it would take. Do we need different socket types? How much would we accomplish if we

  1. Set dont.wait to TRUE in worker(), and
  2. Expose a non-blocking collector in the QSys class?

wlandau commented 6 years ago

And while a potential clustermq backend for future may give us asynchronicity, I really like the API you have designed natively, both for Q() and the R6 wrapper around reusable workers.

wlandau commented 6 years ago

I think this is fixed on the develop branch via #86. (But maybe it needs documentation.) Example: https://github.com/ropensci/drake/blob/master/R/clustermq.R#L32-L48.

mschubert commented 6 years ago

No, these are unrelated: this issue is about running Q() in the background, whereas #86 is about interfacing with the workers directly.

mschubert commented 5 years ago

Note: one option would be to return a promise object that waits for results only when it is explicitly accessed; this could even mean that result[1:5] waits only for the first 5 results.
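
To make the idea concrete, here is a rough sketch of what such an object could look like. Nothing in it is clustermq API: lazy_map() and the lazy_result class are hypothetical names, and forked processes from the parallel package stand in for cluster workers (so it will not run on Windows). Each call gets its own background job, and subsetting waits only for the jobs that were asked for.

```r
# Hypothetical helper (not part of clustermq): start one forked job per input
lazy_map = function(fun, xs) {
    jobs = lapply(xs, function(x) parallel::mcparallel(fun(x)))
    structure(list(jobs=jobs, cache=new.env()), class="lazy_result")
}

# Subsetting blocks only on the requested elements and caches what it collects
`[.lazy_result` = function(x, i) {
    idx = seq_along(x$jobs)[i]
    for (k in idx) {
        key = as.character(k)
        if (!exists(key, envir=x$cache, inherits=FALSE)) {
            # wait for this one job and store its return value
            value = parallel::mccollect(x$jobs[[k]])[[1]]
            assign(key, value, envir=x$cache)
        }
    }
    lapply(idx, function(k) get(as.character(k), envir=x$cache))
}

fx = function(n) {
    Sys.sleep(n / 10)
    n * 2
}

result = lazy_map(fx, 1:6)  # returns immediately
first5 = result[1:5]        # waits for calls 1 to 5 only
```

The cache makes repeated access cheap, since a job that has already been collected cannot be collected a second time. A real implementation would presumably collect from the workers instead of forking one process per call.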