wlandau / crew

A distributed worker launcher
https://wlandau.github.io/crew/

Check for running workers using the sockets instead of the worker-specific API #31

Closed wlandau closed 1 year ago

wlandau commented 1 year ago

https://github.com/shikokuchuo/mirai/issues/33#issuecomment-1455133613

wlandau commented 1 year ago

Notes to self:

wlandau commented 1 year ago

Re https://github.com/wlandau/crew/issues/31#issuecomment-1464345858,

> First use the 'bus' protocol as that is the lightest:

I am actually thinking of using these custom sockets to also send common data for #33. Would I use the push/pull protocol for that? Would the client use a send() and the server use a blocking recv()?

shikokuchuo commented 1 year ago

> Re #31 (comment),
>
> > First use the 'bus' protocol as that is the lightest:
>
> I am actually thinking of using these custom sockets to also send common data for #33. Would I use the push/pull protocol for that? Would the client use a send() and the server use a blocking recv()?

You wouldn't use a push/pull unless you had a very specific need to ensure flow is only one way. The fewer semantics the better (unless you need guaranteed delivery in which case use req/rep). If it is one-to-one I would just use 'bus', or 'pair' if you want to be sure it is one-to-one (this won't allow 2 processes dialing into one side for example).

I'm not sure exactly how you plan to implement #33, but a typical pattern that works well is for one party to request an asynchronous receive with recv_aio(). The other party can then send a message at whatever time, and the 'recvAio' will automatically resolve when the message is received.
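
A minimal sketch of the asynchronous-receive pattern described above, using nanonext. It assumes a 'pair' socket (one of the two one-to-one options mentioned; 'bus' would be the lighter, best-effort alternative) and an in-process URL picked purely for illustration. The variable names and the 5-second timeouts are assumptions, not anything taken from crew or mirai.

```r
library(nanonext)

# In-process URL chosen for illustration; separate processes would use
# something like tcp:// or ipc:// instead.
url <- "inproc://recv-aio-demo"

receiver <- socket("pair", listen = url)
sender <- socket("pair", dial = url)

# The receiver requests a message before one exists. recv_aio() returns
# immediately with an unresolved 'recvAio'. The timeout (milliseconds)
# guards against waiting forever if nothing arrives.
aio <- recv_aio(receiver, mode = "serial", timeout = 5000L)

# The sender transmits whenever it is ready, waiting up to 5 seconds for
# the connection to be established.
payload <- list(x = 1:3, label = "common data")
send(sender, payload, mode = "serial", block = 5000L)

# call_aio() waits for the 'recvAio' to resolve; the value is then in $data.
received <- call_aio(aio)$data

close(sender)
close(receiver)
```

The receiver never has to poll here: it can do other work between recv_aio() and call_aio(), or check unresolved(aio) if it prefers not to block.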

shikokuchuo commented 1 year ago

On this point: I have not been following what this automatic updating is about. But simply sending .GlobalEnv or its contents should be trivial using the current mirai interface, right, if you wanted to do so?

wlandau commented 1 year ago

True, but would it be fast? I am thinking of a persistent workers scenario where all tasks operate on a shared common large in-memory object that is slow to serialize. I would rather send that large object once per persistent worker (infrequently) rather than once per task (much more frequently).

wlandau commented 1 year ago

If, e.g., daemons(n = 8, common_data = list(...)) were to send the common data for servers to pick up, then I could definitely rely on mirai for this.

shikokuchuo commented 1 year ago

> True, but would it be fast? I am thinking of a persistent workers scenario where all tasks operate on a shared common large in-memory object that is slow to serialize. I would rather send that large object once per persistent worker (infrequently) rather than once per task (much more frequently).

How large is large to give me some idea?

wlandau commented 1 year ago

In the general case for a targets pipeline, it could be a large fraction of the allowable R session memory, e.g. 2-4 GB. Small enough to all be in the global env, but slow to serialize and send over the local network.

shikokuchuo commented 1 year ago

> In the general case for a targets pipeline, it could be a large fraction of the allowable R session memory, e.g. 2-4 GB. Small enough to all be in the global env, but slow to serialize and send over the local network.

Right, I would estimate that would take a couple of seconds, a few seconds perhaps. Not an obstacle I'd say, in the absence of a better alternative.

shikokuchuo commented 1 year ago

If you send this common object, is it meant to be immutable then, and if so how do you ensure that? Maybe you know what the right answer is already, but to me there are many potential pitfalls!

wlandau commented 1 year ago

Yes, it is meant to be immutable. Although immutability is hard to strictly enforce here, targets (and clustermq) have lightweight protections that work well enough for all but the most extreme cases.

In targets, the user assigns shared globals to the tar_option_get("envir") environment, which is usually .GlobalEnv. These globals are passed to the export field of clustermq and treated as common data (sent only once per worker). On top of that, each target has its own set of objects in its own temporary target-specific non-global environment which inherits from .GlobalEnv. This temporary target-specific non-global environment is the one supplied to eval(), so except for extreme anti-patterns like attach(), the target has access to global objects while minimizing the risk of accidentally modifying them.

mirai aims to be as fast and efficient as possible, so common data seems like it would be a good fit for your package, both inside and outside of crew.
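
The environment layering described above can be sketched in base R. This is only an illustration of the idea, not the code targets actually runs: `shared_env` stands in for the environment holding shared globals (usually .GlobalEnv), and `run_target()` is a hypothetical helper.

```r
# Stand-in for the environment that holds shared globals. Its parent is
# baseenv() so that base functions such as sum() are still found.
shared_env <- new.env(parent = baseenv())
assign("big_data", c(1, 2, 3), envir = shared_env)

# Hypothetical helper: evaluate a command in a fresh, temporary environment
# that holds the target's own objects and inherits from shared_env.
run_target <- function(command, objects = list()) {
  target_env <- list2env(objects, parent = shared_env)
  eval(command, envir = target_env)
}

# The command reads big_data through inheritance and scale locally.
result <- run_target(quote(sum(big_data) * scale), objects = list(scale = 2))

# An ordinary assignment only creates a local binding in the temporary
# environment; the shared object is left alone.
run_target(quote({
  big_data <- NULL
  is.null(big_data)
}))

unchanged <- identical(get("big_data", envir = shared_env), c(1, 2, 3))
```

This protection is lightweight rather than strict: a command that deliberately reaches into the parent environment, for example with `<<-` or `assign()`, can still modify the shared object.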

shikokuchuo commented 1 year ago

It sounds like you have found a way that works which is good. I put a heavy premium on correctness so it would take a lot to convince me any of this is worthwhile. I would willingly spend the extra 5s per task.

wlandau commented 1 year ago

Fair enough, I will go ahead with my original plan to implement common data in crew using bus sockets.

shikokuchuo commented 1 year ago

> Fair enough, I will go ahead with my original plan to implement common data in crew using bus sockets.

Is your plan to put this common data into the global environment of all the servers?

wlandau commented 1 year ago

Pretty much. I am writing a wrapper that will recv() the data, assign it to the global env, then call mirai::server().

Looking back at your comments, maybe I do need guaranteed delivery for this (is it exactly what it sounds like?), which puts me with rep/req. Should I use rep/listen for the client and req/dial for the servers?
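
The wrapper described above might look roughly like the following sketch. Everything here is an assumption made for illustration: `run_worker()`, its arguments, the 'pair' protocol, and the in-process URL are hypothetical and are not crew's implementation. The `start_server` argument is a placeholder for whatever launches the worker (for example, a function that calls mirai::server()), and the demonstration substitutes a trivial function so that the example is self-contained.

```r
library(nanonext)

# Hypothetical wrapper: receive the common data once, assign it to the
# global environment, then hand control to the server loop.
run_worker <- function(data_url, start_server, timeout = 5000L) {
  sock <- socket("pair", dial = data_url)
  # Blocking receive, bounded by a timeout in milliseconds.
  common <- recv(sock, mode = "serial", block = timeout)
  close(sock)
  if (is_error_value(common)) {
    stop("common data was not received")
  }
  # Assign each element of the named list into the global environment.
  list2env(common, envir = globalenv())
  start_server()
}

# In-process demonstration. The client queues the data asynchronously so
# that the worker can pick it up once it connects.
url <- "inproc://common-data-demo"
client <- socket("pair", listen = url)
sent <- send_aio(client, list(big_object = 1:10), mode = "serial", timeout = 5000L)

# Stand-in for the real server loop: a function that uses the common data.
out <- run_worker(url, start_server = function() sum(big_object))
close(client)
```

A bounded blocking recv() only detects a missing message; it does not make delivery guaranteed. If guaranteed delivery is required, the earlier suggestion in this thread was to use req/rep instead.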

wlandau commented 1 year ago

Closing this thread because it digressed to #33, and before that, @shikokuchuo solved it with https://github.com/wlandau/crew/issues/31#issuecomment-1464345858.