wlandau closed this issue 6 years ago
Because of #289, I actually just started a package called `rsched`. Unfortunately, since it's a new repo, the actual code needs to stay closed-source until there is at least minimal proof-of-concept functionality, but I will open-source it ASAP. For now, we can write a design spec somewhere in the `drake` repo.
I plan to create a `bookdown` document in a separate `drake` branch for the design specification of a `drake` scheduler. Will post updates on this thread.
As discussed in #283, we should search the literature for good scheduling designs and algorithms. For the new package (maybe named `rsched`), I have a stub of a design spec at https://github.com/ropensci/drake/tree/scheduler. We should plan ahead.
I have a private repo for a package called `crew` (Coordinated R Ensembles of Workers), and I am really excited to share this preliminary work. The proof of concept for persistent workers is fully written but does not work yet. I will open-source it ASAP. The main bottleneck is fixing the existing functionality: the master process (launched with `callr::r_bg()`) currently hangs instead of posting jobs for the workers, and I am struggling to fix it.
Edit: changing the name to `workers` since it is actually available. Fixed the issues with `callr` and deadlock. Will open-source it as soon as I get permission.
`crew()` seems to be available on CRAN, and I think it's a lovely name:
available::available("crew")
#> ── crew ─────────────────────────────────────────────────────────────────────────────────────────────────────────
#> Name valid: ✔
#> Available on CRAN: ✔
#> Available on Bioconductor: ✔
#> Available on GitHub: ✔
#> Bad Words: ✔
#> Abbreviations: http://www.abbreviations.com/crew
#> Wikipedia: https://en.wikipedia.org/wiki/crew
#> Wiktionary: https://en.wiktionary.org/wiki/crew
#> Urban Dictionary:
#> the sport of gods, requires constant physical exertion, perfect poise, balance, timing, awareness, brute force, and a sensitive touch.
#> Tags: gang rowing group posse crews friends coxswain homies clique krew
#> http://crew.urbanup.com/896326
#> Sentiment:???
Created on 2018-03-04 by the reprex package (v0.2.0).
Thanks, Kirill! But on second thought, I think the name "workers" is better.
FYI: the `workers` package is now out in the open: https://github.com/wlandau/workers. The current code is just a proof of concept. We should write a full design spec before any more serious work on the implementation.
FYI: I just drafted an initial design spec for `workers`. I think it will help us either (1) develop the package, or (2) figure out if we should abandon it in favor of a solution already in progress. @krlmlr, you mentioned that @gaborcsardi and @lionel- might already be working on the problem. Maybe @HenrikBengtsson also has plans, I do not know.
I think I see a way to move forward with the `workers` package: the custom message queue in #408. I plan to externalize this minimalist queue as a separate package and then build on top of it. Whether we offload `drake` functionality to these packages depends on how well they mature.
`drake` has a ridiculous amount of code, but it also depends on a ridiculous number of packages. Decisions about offloading could shift the scales, but `drake` will still be an enormous package either way. My opinions about the relevant tradeoffs are not as strong as they once were.
I'm having new doubts about this one because of the special precautions `drake` needs to take in order to account for the latency of sending newly-built targets over a network. Whenever a remote persistent worker finishes a target, it sends a checksum to the master so the master can wait for the right data to arrive. This is only practical because `drake` hashes all the targets already. For a general-purpose dependency-aware job scheduler for R, imposing this hashing for its own sake may be unreliable and cause too much of a delay. Plus, as I learn about how thoroughly solved this problem already is in tools like `dask`, I am starting to think the next step is to try writing an R front-end to an established tool (which could in turn become another `drake` HPC backend). Ref: #417.
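For readers unfamiliar with the checksum handshake described above, here is a minimal sketch of the idea in base R. It is not `drake`'s actual implementation: the helper name `wait_for_checksum()`, its arguments, and the use of MD5 via `tools::md5sum()` are all assumptions made for illustration, and a temporary file stands in for storage shared over a network.

```r
# Hypothetical helper (not part of drake): poll until the file at `path`
# has the checksum the worker reported, or give up after `timeout` seconds.
wait_for_checksum <- function(path, expected, timeout = 5, interval = 0.1) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    if (file.exists(path)) {
      # tools::md5sum() returns a named character vector; drop the name.
      actual <- unname(tools::md5sum(path))
      if (identical(actual, expected)) {
        return(TRUE)
      }
    }
    Sys.sleep(interval)
  }
  FALSE
}

# Worker side: write the target, then report its checksum to the master.
target_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, target_file)
reported <- unname(tools::md5sum(target_file))

# Master side: proceed only once the data on disk matches the report.
arrived <- wait_for_checksum(target_file, reported)

# A checksum that never matches makes the master time out instead.
stale <- wait_for_checksum(target_file, "not-the-right-hash", timeout = 0.3)
```

The sketch also shows the cost mentioned above: every target has to be hashed twice (once by the worker, at least once by the master), which is overhead a general-purpose scheduler would be paying only for synchronization.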
Some users have requested the option to have `drake` act like an ordinary job scheduler without worrying about reproducibility. And a separate package would be a great place to apply the knapsack problem and group jobs into workers.
EDIT: 2018-03-03
I have started a private repo for a package called `crew` (Coordinated R Ensembles of Workers). See the comments later in the thread for details. I am really excited to work on this.
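As a rough illustration of what grouping jobs into workers could look like, the sketch below uses a greedy "longest job first" heuristic rather than an exact knapsack solver. The function name `group_jobs()` and the runtime estimates are made up for this example; nothing here comes from `drake` or the packages discussed in the thread.

```r
# Hypothetical helper: assign jobs to `n_workers` groups so that the total
# estimated runtime per worker is roughly balanced.
group_jobs <- function(runtimes, n_workers) {
  stopifnot(n_workers >= 1, length(runtimes) >= 1)
  load <- numeric(n_workers)               # running total per worker
  assignment <- integer(length(runtimes))  # worker index for each job
  # Place the longest jobs first, each on the currently least-loaded worker.
  for (job in order(runtimes, decreasing = TRUE)) {
    worker <- which.min(load)
    assignment[job] <- worker
    load[worker] <- load[worker] + runtimes[[job]]
  }
  list(assignment = assignment, load = load)
}

# Made-up runtime estimates (in seconds) for six jobs.
runtimes <- c(a = 8, b = 7, c = 6, d = 5, e = 4, f = 3)
groups <- group_jobs(runtimes, n_workers = 2)

# Job names grouped by the worker they were assigned to.
split(names(runtimes), groups$assignment)
```

A greedy pass like this is cheap and usually close to balanced, but it relies on runtime estimates being available up front and ignores dependencies between jobs, which a real scheduler would have to respect.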