wlandau / crew

A distributed worker launcher
https://wlandau.github.io/crew/

A dispatcher process for {crew} workers #107

Closed: wlandau closed this issue 1 year ago

wlandau commented 1 year ago

mirai has an active dispatcher process for tasks. I am thinking I could implement a separate one for crew workers in order to free up control in the parent process. @shikokuchuo, this may be even easier than the light-touch fused dispatcher we talked about before.

Worker dispatch in crew requires knowing how many workers are active (via mirai::status()) and how many tasks are unresolved. Is it possible to call mirai::status() from a different process than the one that called mirai::daemons()? And what would be the best way to count unresolved tasks from a crew worker dispatcher process?
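To make those two inputs concrete, here is a minimal sketch of how a launcher might turn them into a launch decision. It is written in Python purely for illustration and is not crew's implementation: the function `workers_to_launch()` and its rule (one worker per unresolved task, capped at a maximum pool size) are hypothetical assumptions.

```python
"""Hypothetical sketch of a launcher's scaling decision.

Not crew's actual code: the function name and the rule are illustrative
assumptions. The two inputs mirror the quantities discussed above: the
number of active workers and the number of unresolved tasks.
"""


def workers_to_launch(active_workers: int, unresolved_tasks: int, max_workers: int) -> int:
    """Return how many new workers to launch under a hypothetical rule."""
    if min(active_workers, unresolved_tasks, max_workers) < 0:
        raise ValueError("counts must be non-negative")
    # Demand: one worker per unresolved task, capped by the pool size.
    wanted = min(unresolved_tasks, max_workers)
    # Launch only the shortfall relative to workers already running.
    return max(0, wanted - active_workers)


if __name__ == "__main__":
    # 5 unresolved tasks, pool capped at 4, 2 already running -> launch 2.
    print(workers_to_launch(active_workers=2, unresolved_tasks=5, max_workers=4))
```

Both inputs must be current for the rule to be correct, which is why the question of reading them from a different process matters.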

shikokuchuo commented 1 year ago

Sorry, I haven't had the bandwidth to respond to your previous questions yet. I will get to them.

shikokuchuo commented 1 year ago

Once I have confirmation that shikokuchuo/mirai#69 is fixed, I'll put something together to show you how easy it is (hopefully). I do want to prioritise fixing any remaining bugs before risking new ones through further development.

shikokuchuo commented 1 year ago

Using another dispatcher process is fine, but syncing the variables does present a problem. Back when we were using only 'bus' sockets, that might have been more feasible, because bus sockets are multicast (a message reaches every directly connected peer). However, we have seen that only req/rep will work.

Instead, it would be much easier (if it works) to run your functions on the dispatcher itself, so they are evaluated directly in its context. If you need another socket to send information back to the host, you can still prototype that with the current limited interface, and we can add it as a setup step for efficiency later if needed.

Using the same process also solves your other issue #108 about how to sync with dispatcher events.

I have pushed a 'dispatcher' branch to mirai that implements a simple interface for dispatcher extensions.

It includes the option of supplying either a function or a call. I suspect a function will be easier, since it avoids the extra frame introduced by an eval() call. If something more complicated is required, though, additional arguments to eval() will probably need to be exposed as well.

I tested it briefly by using cat() to write dispatcher variables to a file (since stdout is redirected), and it seems to work as expected.

However, let me know if anything is missing structurally.

Also, note that this feature will definitely come after the next CRAN release of mirai, so you have plenty of time to explore.

wlandau commented 1 year ago

That's fantastic, @shikokuchuo! I can't wait to try this, and I will as soon as I emerge from my current development sprint on cloud storage in targets.

wlandau commented 1 year ago

Given https://github.com/shikokuchuo/mirai/discussions/78#discussioncomment-7164289, I am not sure the extended R dispatcher would work for crew, because of the unusual kinds of launch failures crew has to handle. I doubt crew can get away from polling workers (fortunately, polling workers is much gentler than polling tasks would have to be).

The obvious alternatives range from challenging to infeasible:

  1. If I created a separate dispatcher process just for the launcher, it would need saisei() and nextget("cv"), and I am not sure mirai can support two copies of the same host at once. (In the past, while testing contrived edge cases, I have seen NNG-level errors warning that forking in NNG is unsafe.)
  2. #105 seems messy and brittle, with a lot of extra work to manage communication with the remote controller. It would also add another network hop for data, which would be inefficient.

Fortunately, though, nextget("cv") will make polling-based crew much nicer; in particular, it will remove the need for throttling.
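The difference between throttled polling and waiting on a condition variable can be illustrated with a small Python analogy built on `threading.Condition`. This is a sketch under stated assumptions, not mirai or crew code: `CompletionSignal` and `demo()` are hypothetical names, and the class assumes that the dispatcher signals the condition variable whenever a task completes, so a waiting loop wakes promptly instead of sleeping on a fixed schedule.

```python
"""Illustrative analogy of waiting on a condition variable instead of
polling on a fixed throttle interval.

Assumption: this only mimics the idea behind mirai's nextget("cv");
it does not use mirai, nanonext, or crew.
"""
import threading
import time


class CompletionSignal:
    """Counts completed tasks and wakes waiters on each completion (hypothetical)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._completed = 0

    def notify_completion(self) -> None:
        # Called by the (hypothetical) dispatcher when a task resolves.
        with self._cond:
            self._completed += 1
            self._cond.notify_all()

    def wait_for_change(self, seen: int, timeout: float) -> int:
        # Block until the count exceeds `seen` or the timeout expires,
        # then return the current count. No fixed-interval sleep loop.
        with self._cond:
            self._cond.wait_for(lambda: self._completed > seen, timeout=timeout)
            return self._completed


def demo() -> int:
    signal = CompletionSignal()
    # Simulate a task finishing shortly after the waiter starts blocking.
    finisher = threading.Timer(0.05, signal.notify_completion)
    finisher.start()
    start = time.monotonic()
    count = signal.wait_for_change(seen=0, timeout=5.0)
    elapsed = time.monotonic() - start
    finisher.join()
    # The waiter woke on the signal, well before the 5 s timeout.
    assert elapsed < 5.0
    return count


if __name__ == "__main__":
    print(demo())
```

The timeout keeps a bounded upper limit on each wait, so a loop like this still checks worker state periodically (the polling crew keeps), but it no longer needs a throttle to avoid spinning between completions.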

wlandau commented 1 year ago

If I really need to, I will come back to https://github.com/wlandau/crew/issues/105 for this. But it is a huge undertaking to implement, especially compared to how simple crew currently is, and I am not sure we strictly need it.