wlandau closed this issue 1 year ago
Notes to self:
crew_port_set(). (Also create crew_port_get().)
Re https://github.com/wlandau/crew/issues/31#issuecomment-1464345858,
First use the 'bus' protocol as that is the lightest:
I am actually thinking of using these custom sockets to also send common data for #33. Would I use the push/pull protocol for that? Would the client use a send() and the server use a blocking recv()?
You wouldn't use a push/pull unless you had a very specific need to ensure flow is only one way. The fewer semantics the better (unless you need guaranteed delivery in which case use req/rep). If it is one-to-one I would just use 'bus', or 'pair' if you want to be sure it is one-to-one (this won't allow 2 processes dialing into one side for example).
I'm not sure exactly how you plan to implement #33, but a typical pattern that works well is for one party to request an asynchronous receive with recv_aio(). The other party can then send a message at any time, and the 'recvAio' will automatically resolve when the message is received.
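As a rough illustration of the asynchronous receive pattern described above, here is a minimal sketch using nanonext within a single R session. The inproc URL, the object names, and the timeout values are assumptions chosen for illustration, and 'pair' is used here only because the exchange is one-to-one; a 'bus' socket would be set up the same way.

```r
library(nanonext)

# Hypothetical in-process address, used so the sketch runs in one R session.
url <- "inproc://common-data-sketch"

# 'pair' enforces a one-to-one connection.
receiver <- socket("pair", listen = url)
sender <- socket("pair", dial = url)

# The receiving party requests an asynchronous receive up front.
# The 5000 ms timeout is an assumption that keeps the sketch from hanging.
aio <- recv_aio(receiver, timeout = 5000)

# The sending party sends whenever it is ready (waiting up to 1000 ms).
common_data <- list(x = 1:3, label = "shared")
send(sender, common_data, block = 1000)

# Block until the receive resolves, then read the payload.
call_aio(aio)
received <- aio$data

close(sender)
close(receiver)
```

In a real deployment the two sockets would live in different processes and the URL would be a tcp:// or ipc:// address instead.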
On this point: I have not been following what this automatic updating is about. But simply sending the .GlobalEnv or its contents should be trivial using the current mirai interface, right, if you wanted to do so?
True, but would it be fast? I am thinking of a persistent workers scenario where all tasks operate on a shared common large in-memory object that is slow to serialize. I would rather send that large object once per persistent worker (infrequently) rather than once per task (much more frequently).
If e.g. daemons(n = 8, common_data = list(...)) were to send the common data for servers to pick up, then I could definitely rely on mirai for this.
How large is large to give me some idea?
In the general case for a targets pipeline, it could be a large fraction of the allowable R session memory, e.g. 2-4 GB. Small enough to all be in the global env, but slow to serialize and send over the local network.
Right, I would estimate that would take a couple of seconds, a few seconds perhaps. Not an obstacle I'd say, in the absence of a better alternative.
If you send this common object, is it meant to be immutable then, and if so how do you ensure that? Maybe you know what the right answer is already, but to me there are many potential pitfalls!
Yes, it is meant to be immutable. Although immutability is hard to strictly enforce here, targets (and clustermq) have lightweight protections that work well enough for all but the most extreme cases.
In targets, the user assigns shared globals to the tar_option_get("envir") environment, which is usually .GlobalEnv. These globals are passed to the export field of clustermq and treated as common data (sent only once per worker). On top of that, each target has its own set of objects in its own temporary target-specific non-global environment which inherits from .GlobalEnv. This temporary target-specific non-global environment is the one supplied to eval(), so except for extreme anti-patterns like attach(), the target has access to global objects while minimizing the risk of accidentally modifying them.
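The environment-inheritance idea can be sketched in base R roughly as follows. This is not the targets implementation: common_env is a stand-in for the shared environment (usually .GlobalEnv), and run_task() and shared_data are hypothetical names used only for illustration.

```r
# Stand-in for the environment that holds shared globals (usually .GlobalEnv).
common_env <- new.env()
assign("shared_data", c(1, 2, 3), envir = common_env)

# Hypothetical helper: evaluate an expression in a temporary environment
# that inherits from the shared one.
run_task <- function(expr, shared = common_env) {
  task_env <- new.env(parent = shared)
  eval(expr, envir = task_env)
}

# The task can read the shared object through inheritance.
read_result <- run_task(quote(sum(shared_data)))

# Ordinary assignment inside the task creates a local copy in the
# temporary environment and leaves the shared object untouched.
write_result <- run_task(quote({
  shared_data <- shared_data * 10
  sum(shared_data)
}))

unchanged <- identical(get("shared_data", envir = common_env), c(1, 2, 3))
```

The protection is lightweight rather than strict: a deliberate superassignment with <<- inside the task would still reach the shared environment.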
mirai aims to be as fast and efficient as possible, so common data seems like it would be a good fit for your package, both inside and outside of crew.
It sounds like you have found a way that works which is good. I put a heavy premium on correctness so it would take a lot to convince me any of this is worthwhile. I would willingly spend the extra 5s per task.
Fair enough, I will go ahead with my original plan to implement common data in crew using bus sockets.
Is your plan to put this common data into the global environment of all the servers?
Pretty much. I am writing a wrapper that will recv() the data, assign it to the global env, then call mirai::server().
Looking back at your comments, maybe I do need guaranteed delivery for this (is it exactly what it sounds like?), which puts me with rep/req. Should I use rep/listen for the client and req/dial for the servers?
Closing this thread because it digressed to #33, and before that, @shikokuchuo solved it with https://github.com/wlandau/crew/issues/31#issuecomment-1464345858.
https://github.com/shikokuchuo/mirai/issues/33#issuecomment-1455133613