Open bilderbuchi opened 1 year ago
A few questions:
A few ideas / answers:
Should we interpret an empty message as acknowledgment? At least as a "message received and also heartbeat" acknowledgment. That would be good for a "reply to every message, at least with a heartbeat" heartbeat pattern, see #4.
Remember: zmq messages arrive either complete or not at all, so if we get the header frame, we know that we have all frames. Therefore, a partially received message cannot be misinterpreted as an acknowledgment.
* How do we mark an answer to, for example, Get? Previously I used Set as the answer to Get. Another option is to keep the name (and distinguish via conversation ID / reply reference), or to have distinct commands like Get and Get_Reply.
We could use the `DATA <name> <value>` message type, just over the control channel?
* Do we want to get / set just one value or a list of values?
More than one would certainly be useful! At least for SET it would be a dict. We just have one message over the network, the Actor asks the Driver for a number of updates, and sends back the reply.
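To make the "one message, several values" idea concrete, here is a minimal sketch. `FakeDriver`, `Actor` and the method names are hypothetical placeholders, not an agreed API: GET takes a list of names and replies with a dict, SET takes a dict.

```python
from typing import Any, Dict, List


class FakeDriver:
    """Hypothetical stand-in for an instrument driver."""

    def __init__(self) -> None:
        self.temperature = 21.5
        self.setpoint = 20.0


class Actor:
    """Hypothetical Actor that maps GET/SET requests onto driver attributes."""

    def __init__(self, driver: Any) -> None:
        self.driver = driver

    def get(self, names: List[str]) -> Dict[str, Any]:
        # One request, several values: read each name from the driver.
        return {name: getattr(self.driver, name) for name in names}

    def set(self, values: Dict[str, Any]) -> None:
        # One request, several values: the payload is a name -> value dict.
        for name, value in values.items():
            setattr(self.driver, name, value)


actor = Actor(FakeDriver())
actor.set({"setpoint": 25.0})
reply = actor.get(["temperature", "setpoint"])
print(reply)
```

Only one request and one reply cross the network, regardless of how many Parameters are involved.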
* do we want to accept args for get/set? For example for a forced update (with cached values). If we get/set just one variable, args are easy to implement, I'd say.
That might be useful, we'll have to find a good way to specify the format. This also depends on the API of Actor -- if we are dealing only with the one argument `from_cache` (or `force`, or whatever) it's probably easier to have an additional command `GET_FRESH` or somesuch.
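A rough sketch of the two variants side by side, assuming a hypothetical `CachingActor` that remembers the last value it read. Neither the class nor the `read` callback is part of any existing API; the point is only that `GET_FRESH` can be a thin wrapper around a `from_cache` argument.

```python
from typing import Any, Callable, Dict


class CachingActor:
    """Hypothetical Actor that remembers the last value read per Parameter."""

    def __init__(self, read: Callable[[str], Any]) -> None:
        self._read = read  # function that queries the device
        self._cache: Dict[str, Any] = {}

    def get(self, name: str, from_cache: bool = True) -> Any:
        # Variant 1: a single GET command with an optional argument.
        if from_cache and name in self._cache:
            return self._cache[name]
        value = self._read(name)
        self._cache[name] = value
        return value

    def get_fresh(self, name: str) -> Any:
        # Variant 2: a dedicated GET_FRESH command without arguments.
        return self.get(name, from_cache=False)


readings = iter([1.0, 2.0, 3.0])
actor = CachingActor(read=lambda name: next(readings))

first = actor.get("voltage")        # cache is empty, so the device is read
cached = actor.get("voltage")       # served from the cache
fresh = actor.get_fresh("voltage")  # forces a new device read
print(first, cached, fresh)
```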
* What do you intend with the log message? I think logging should use the data protocol, because a Component does not know where to send the logs specifically, and via the data protocol a logging facility could subscribe. I'll open a separate issue.
Yeah, logs should be emitted/published into some logging stream. (Bonus points, iiuc: if nobody's subscribed, no message is emitted.) Whether that's on the data protocol or a separate one, I can't decide now.
A few ideas / answers:
* Regarding Call return value: We should always return something, be it void/None...
Seems reasonable. An `ACK` might be in order. We'll probably also need a `NULL` value/indicator (maybe look in Avro, first).
* regarding data: the first frame (component ID) is used for filtering **and** is sent, therefore the payload does not need that information again.
Yeah, however we might also mutate that info in the header (add/remove Coordinator info), and it could be attractive to keep the payload alone meaningful on its own (i.e. without the message metadata) -- on their way through the system/code, at some point the metadata might be stripped off.
* regarding list of known components: we could do it with a get command. Similarly the status.
I'd keep this separated as this is for separate concerns. Why should we mix Parameter updates with housekeeping updates? What's the attraction of having one less message type/command, but "multiplexing/overloading" another?
* I like to keep names consistent, therefore I would start all commands that try to get something with get (GET_STATUS, GET_COMPONENTS...) instead of using REQ or something else.
yeah, why not.
> Should we interpret an empty message as acknowledgment? At least as a "message received and also heartbeat" acknowledgment.

I don't think so; we should have a distinct `ACK`-type message. Otherwise, if you send a command, and receive an empty message -- was that the regular Component heartbeat? An acknowledgement of a command? Or do you prefer to resolve this from message metadata (reply-reference/conversation-id)?
Added START/STOP_POLLING
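Here again some notable differences of the DataProtocol (PUB-SUB) in regard to the Control Protocol:
* The Data Protocol Coordinators (let's call them Proxy) do not do anything with the messages except passing them on (equally, they pass on subscription/unsubscription requests).
* We could invent new Proxies, which hand on messages altering their header, but I do not want to write that code if there is a really good and reliable solution (call `zmq.proxy()` with two sockets as parameters); besides, I do not see the benefit.
* Messages are only sent if someone subscribed (you understood correctly).
* Only the first frame is meaningful as some sending topic, the rest is payload.

Therefore, we have to treat the protocols differently (in the control protocol, we separate header from payload; in the data protocol, it is one message).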
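To illustrate the points about the first frame and about subscriptions, here is a pure-Python imitation of the forwarding behaviour. `TinyProxy` is a hypothetical toy for discussion only; a real setup would rely on ZeroMQ's own proxy. The one ZeroMQ fact it mirrors is that SUB filters match on a prefix of the first frame.

```python
from typing import Dict, List, Set


class TinyProxy:
    """Pure-Python imitation of PUB-SUB forwarding, not a ZeroMQ replacement."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Set[bytes]] = {}
        self.delivered: Dict[str, List[List[bytes]]] = {}

    def subscribe(self, subscriber: str, topic: bytes) -> None:
        self.subscriptions.setdefault(subscriber, set()).add(topic)
        self.delivered.setdefault(subscriber, [])

    def publish(self, frames: List[bytes]) -> int:
        # Only the first frame is inspected; the other frames are opaque payload.
        topic = frames[0]
        count = 0
        for subscriber, prefixes in self.subscriptions.items():
            if any(topic.startswith(prefix) for prefix in prefixes):
                self.delivered[subscriber].append(frames)  # passed on unchanged
                count += 1
        return count


proxy = TinyProxy()
proxy.subscribe("logger", b"N1.CompA")
sent = proxy.publish([b"N1.CompA", b'{"temperature": -5}'])
dropped = proxy.publish([b"N1.CompB", b'{"pressure": 1}'])
print(sent, dropped)
```

The second message reaches nobody because no subscription matches its topic frame, which is the "only sent if someone subscribed" behaviour.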
> I'd keep this separated as this is for separate concerns. Why should we mix Parameter updates with housekeeping updates? What's the attraction of having one less message type/command, but "multiplexing/overloading" another?

I see the Components list and the Status as a property of the Coordinator/Component. If I want a property, I call GET.
Null/None
Avro has "null"
> I don't think so; we should have a distinct ACK-type message. Otherwise, if you send a command, and receive an empty message -- was that the regular Component heartbeat? An acknowledgement of a command? Or do you prefer to resolve this from message metadata (reply-reference/conversation-id)?

The idea is that the "message received" answer is just a heartbeat (the other side knows you're still alive). A command or somesuch should be acknowledged by a specific message.
We could say that the Component may send a "ping" (content may be "null", not empty) message to the Coordinator, which responds with its typical answer (empty frame). That makes it easy: If a message does not contain content, do not respond. If it contains content, either respond with some answer to the question, or answer with an empty message. This prevents an infinite heartbeat chain.
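The reply rule can be written down in a few lines. This is only a sketch: the `respond` function, the `EMPTY` constant and the byte-string encoding are assumptions for illustration, not part of the protocol draft.

```python
from typing import Callable, Optional

EMPTY = b""  # assumed encoding of a content-free heartbeat/acknowledgement


def respond(content: bytes, handler: Callable[[bytes], Optional[bytes]]) -> Optional[bytes]:
    """Return the reply content, or None if no reply must be sent."""
    if content == EMPTY:
        # Empty message: it is a heartbeat, so stay silent.
        # Otherwise two peers would answer each other forever.
        return None
    answer = handler(content)
    # Non-empty message: reply with the answer, or with an empty message.
    return answer if answer is not None else EMPTY


def handler(content: bytes) -> Optional[bytes]:
    # Hypothetical handler that only knows one question.
    return b"-5" if content == b"GET temperature" else None


print(respond(b"", handler))
print(respond(b"null", handler))
print(respond(b"GET temperature", handler))
```

A "null" ping gets an empty reply, and that empty reply is never answered, so the exchange stops after two messages.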
> I see the Components list and the Status as a property of the Coordinator/Component. If I want a property, I call GET.

Well, but (so far), a `GET` was for a "Parameter" (as in "a property (in the English, not the Pythonic sense) of the Driver represented by an Actor"), not for any property/quantity. What if a Driver implements a property called `known_components`? Do we want to have to forbid all Parameter names that collide with our housekeeping fields?
These concerns are separate, and should stay separate. Right now I don't see an overriding advantage of munging these two together. Do you?
If you absolutely want to use `GET` to get housekeeping properties, we can do `GET_PARAM` for the driver's properties, but that's the same outcome as just using a specific command for the housekeeping stuff :shrug:
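A small sketch of the collision concern, with made-up data: a Driver Parameter and a housekeeping field share the name `known_components`. With separate commands, each lookup stays in its own namespace. The dictionaries and the `handle` function are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical Driver Parameters; one is named like a housekeeping field.
parameters: Dict[str, Any] = {"known_components": 3, "temperature": -5}
# Hypothetical housekeeping information kept by the Component itself.
housekeeping: Dict[str, Any] = {"known_components": ["N1.CompA", "N1.CompB"]}


def handle(command: str, argument: str = "") -> Any:
    # Separate commands look into separate namespaces, so names cannot clash.
    handlers: Dict[str, Callable[[str], Any]] = {
        "GET": lambda name: parameters[name],
        "LIST_KNOWN_COMPS": lambda _: housekeeping["known_components"],
    }
    return handlers[command](argument)


print(handle("GET", "known_components"))
print(handle("LIST_KNOWN_COMPS"))
```

If both lookups went through `GET`, one of the two names would have to be reserved or prefixed.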
Sorry, I missed all the "Actors" and always thought about any Component.
> A command or somesuch should be acknowledged by a specific message.

Why is that? Isn't the expected reply acknowledgement (enough) for a specific message?
> The idea is that the "message received" answer is just a heartbeat (the other side knows you're still alive). A command or somesuch should be acknowledged by a specific message.
> We could say that the Component may send a "ping" (content may be "null", not empty) message to the Coordinator, which responds with its typical answer (empty frame). That makes it easy: If a message does not contain content, do not respond. If it contains content, either respond with some answer to the question, or answer with an empty message. This prevents an infinite heartbeat chain.

So, to understand what you wrote correctly (and maybe a little in "devil's advocate" style):
Do we really need a ratio "normal content": acks of 1:1? A "standard" command exchange causes 4 messages? Are the response times so bad that we want our Components to have an ACK before they get the proper reply half a second or second or so later?
> Here again some notable differences of the DataProtocol (PUB-SUB) in regard to the Control Protocol:
> * The Data Protocol Coordinators (let's call them Proxy) do not do anything with the messages except passing them on (equally, they pass on subscription/unsubscription requests).
> * We could invent new Proxies, which hand on messages altering their header, but I do not want to write that code if there is a really good and reliable solution (call `zmq.proxy()` with two sockets as parameters); besides, I do not see the benefit.
> * Messages are only sent if someone subscribed (you understood correctly).
> * Only the first frame is meaningful as some sending topic, the rest is payload.
>
> Therefore, we have to treat the protocols differently (in the control protocol, we separate header from payload; in the data protocol, it is one message).

thanks for that, that's illuminating. I agree with the notion of not reinventing stuff! Could/should the Data Protocol Proxies live inside a Coordinator, or do you want to create a separate Component for that? The former seems reasonable from my point of view, fewer addresses/entities to keep track of, and the Coordinator could set up the details with the Component when establishing the control connection.
> Do we really need a ratio "normal content": acks of 1:1? A "standard" command exchange causes 4 messages?

Actually, it's more than 4 when the coordinators are involved! If I computed correctly, 8 messages with 1 coordinator, and 12 messages with 2 Coordinators (inter-Node), just to send "Hey, C2.CompA, give me the temperature" -> "It's -5 degrees". :confused:
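The counts follow from a simple rule, assuming every hop of the request and of the reply is acknowledged individually (that assumption is mine; the function name is just for illustration): with n Coordinators on the path there are n + 1 hops per direction.

```python
def messages_per_exchange(n_coordinators: int, ack_every_message: bool = True) -> int:
    """Count messages for one request/reply exchange along a chain of Coordinators."""
    hops = n_coordinators + 1    # Component -> Co -> ... -> Component
    payload_messages = 2 * hops  # the request travels there, the reply travels back
    # Assumption: each payload message is answered by one ack on the same hop.
    return payload_messages * (2 if ack_every_message else 1)


for n in range(3):
    print(n, messages_per_exchange(n), messages_per_exchange(n, ack_every_message=False))
```

This reproduces 4, 8 and 12 messages for 0, 1 and 2 Coordinators, and shows that dropping the per-message ack halves each figure.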
I guess we'll need the high water marks :sweat_smile:
That is the reason I went for the ping pong heartbeat: https://zguide.zeromq.org/docs/chapter4/#Heartbeating-for-Paranoid-Pirate
We could (to reduce data transfer) make these heartbeats without any frames (even without names!). Or we just send heartbeats, if explicitly requested. So an actor, which did not get any message in some time, contacts its Coordinator, asking, whether it is still alive.
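The "heartbeat only on request" variant could look roughly like this. `LivenessTracker`, its method names and the timeout value are hypothetical; the injectable clock is there only so the behaviour can be shown without waiting.

```python
import time
from typing import Callable


class LivenessTracker:
    """Hypothetical helper: decides when a Component should ping its Coordinator."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def message_received(self) -> None:
        # Any incoming message counts as a sign of life.
        self._last_seen = self._clock()

    def should_ping(self) -> bool:
        # Ask for a heartbeat only after a silent period.
        return self._clock() - self._last_seen >= self.timeout_s


now = [0.0]  # fake clock for the demonstration
tracker = LivenessTracker(timeout_s=5.0, clock=lambda: now[0])

now[0] = 3.0
quiet = tracker.should_ping()       # still within the timeout
now[0] = 6.0
silent = tracker.should_ping()      # nothing heard for 6 s
tracker.message_received()
after_reply = tracker.should_ping() # the reply resets the timer
print(quiet, silent, after_reply)
```

Regular traffic keeps resetting the timer, so on a busy connection no extra heartbeat messages are sent at all.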
> Could/should the Data Protocol Proxies live inside a Coordinator, or do you want to create a separate Component for that? The former seems reasonable from my point of view, fewer addresses/entities to keep track of, and the Coordinator could set up the details with the Component when establishing the control connection.

I thought to keep them separate, as you have to connect the Proxies differently than the Control Coordinators. Also, you need different addresses (at least ports) anyway.
> I thought to keep them separate, as you have to connect the Proxies differently than the Control Coordinators. Also, you need different addresses (at least ports) anyway.

Yeah, but that could be part of the `CONNECT` reply/handshaking/setup, no? "Here's the connection details to attach your Data and Log connections to my ports".
One Component will need multiple connections, anyway - do we want to centralise those in the Coordinator (so every Component has effectively 3 connections to a Coordinator), or have 3 separate "central" Components (that could make for some beautiful spiderwebs :D)?
As the Data protocol is inherently different (just one way), the connection between its Proxies has to be different from the Coordinator connection. Let's discuss it in a separate issue.
Whether these different parts end up in one piece of software or not, the protocols remain separate and the question does not slow down the protocol definition.
I propose to rename CONNECT / DISCONNECT to SIGNIN / SIGNOUT, in order to differentiate them better from the actual socket connection. That makes it easier in the documentation to differentiate between a connected and a signed-in Component.
For example, as a requirement for message handling, we can state that the Components must be signed in (which requires being connected as well). If we stated that they have to be connected, it could be misunderstood to mean that a socket connect is sufficient.
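The distinction between the two states can be sketched as follows. The `Coordinator` class and its method names are hypothetical and only illustrate that signing in presupposes a socket connection, and that message handling depends on the signed-in state.

```python
from typing import Set


class Coordinator:
    """Hypothetical Coordinator that tracks connected vs. signed-in Components."""

    def __init__(self) -> None:
        self.connected: Set[str] = set()  # socket-level connection exists
        self.signed_in: Set[str] = set()  # Component has registered itself

    def connect(self, name: str) -> None:
        self.connected.add(name)

    def sign_in(self, name: str) -> None:
        # Signing in requires an existing socket connection.
        if name not in self.connected:
            raise ConnectionError(f"{name} is not connected")
        self.signed_in.add(name)

    def sign_out(self, name: str) -> None:
        self.signed_in.discard(name)

    def accepts_messages_from(self, name: str) -> bool:
        # Being connected is not enough; the Component must be signed in.
        return name in self.signed_in


co = Coordinator()
co.connect("CompA")
before = co.accepts_messages_from("CompA")
co.sign_in("CompA")
after = co.accepts_messages_from("CompA")
print(before, after)
```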
> I propose to rename CONNECT / DISCONNECT to SIGNIN / SIGNOUT, in order to differentiate them better from the actual socket connection. That makes it easier in the documentation to differentiate between a connected and a signed-in Component.
> For example, as a requirement for message handling, we can state that the Components must be signed in (which requires being connected as well). If we stated that they have to be connected, it could be misunderstood to mean that a socket connect is sufficient.

Good call!
According to the latest state of the PR #38 (and my latest comments therein, e.g. this comment), I think we should boil down the Control Messages to
* `SIGN-IN`
* `SIGN-OUT`
* `SEND_DIRECTORY` - a request for the Directory of a Coordinator, subject to change of the message type signature; possibly we would prefer something like "GET" or "REQ" over "SEND" (please). What do you think? Or was this the `CO_TELL_ALL`?
* `CO_UPDATE` - a Coordinator sending its local and global Directory

In the comments to #38, relating to the discussion in #44, I proposed to simplify the messages so that Coordinators do not announce individual `SIGN_IN/OUT` actions, but simply send their Directory to all connected Coordinators when it changes.
`CO_UPDATE`
* When a `SEND_DIRECTORY` request is received by a Coordinator, they reply with their Directories in the `CO_UPDATE` message.
* If Co1 sends a `CO_UPDATE` message to Co3 which contains updates within the N2 Namespace which Co3 does not have yet, Co3 should, rather than trusting Co1 blindly, at least ask Co2 again for its Directory.
* When a Coordinator receives a `CO_UPDATE` message, it updates its Directories, global and local, and possibly starts to sign in to new remote Coordinators, as discussed.

Let's avoid discussing the propagation on sign-in/-out in the issue collecting message types -> #46.
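A sketch of the "do not trust second-hand news blindly" rule. The function name, the data layout (namespace mapped to component names) and the choice to accept first-hand entries immediately are my assumptions; the thread does not fix any of that.

```python
from typing import Dict, List, Set


def process_co_update(
    sender_namespace: str,
    update: Dict[str, List[str]],
    directory: Dict[str, Set[str]],
) -> List[str]:
    """Merge a CO_UPDATE; return namespaces to re-check with SEND_DIRECTORY."""
    to_verify: List[str] = []
    for namespace, components in update.items():
        if namespace == sender_namespace:
            # First-hand information about the sender's own Namespace.
            directory[namespace] = set(components)
        elif set(components) - directory.get(namespace, set()):
            # Second-hand news about another Namespace: ask its Coordinator.
            to_verify.append(namespace)
    return to_verify


# Co3 receives an update from Co1 (Namespace N1) that also mentions N2.
co3_directory: Dict[str, Set[str]] = {"N3": {"CompX"}}
pending = process_co_update("N1", {"N1": ["CompA"], "N2": ["CompB"]}, co3_directory)
print(co3_directory, pending)
```

Here Co3 takes over Co1's own entries, but only notes that it should send `SEND_DIRECTORY` to the Coordinator of N2 before adding anything for that Namespace.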
Regarding message types (and their encoding), we can take our bearings from CoAP: https://en.wikipedia.org/wiki/Constrained_Application_Protocol#Request/response_code_(8_bits) or from HTTP status codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
LOCK / UNLOCK a resource (or part of it), see #14
Let us start an informal list of "command verbs"/message types that we need within ECP. This way, we get a feeling about the set of commands we need and how they apply to different Components. I have here used the term "message type" to also refer to the various commands of the control protocol, as the boundaries can be diffuse, see e.g. `KNOWN_COMPS`. I'm open to restructuring that if needed.

This should not be considered formally specified yet, but serve as a basis for what we later add to the specification or to find out how well Avro fits our needs. Let's put the command verb/message type in CAPS, and any arguments between `<angle brackets>`, and specify if a return message is required (disregarding transport-layer acknowledgements). This is not the specific syntax yet, so don't get hung up on separator choices, etc.

Feel free to edit the comment as needed.

Housekeeping
* `REQ_STATUS`: Request current status of a Component. Reply: `STATUS <status description>`
* `ERROR <error level, error description>`: Message detailing an error, error categories TBD. Can also be a reply to a message.
* `LOG <log level, log message>`: Log message including log level (levels TBD).

Routing
* `SIGN_IN`: Announce the presence of the sending Component, and request registration with the targeted Coordinator. Reply: Outcome of the sign-in attempt.
* `SIGN_OUT`: Request the deregistration of the sending Component from the targeted Coordinator. Reply: If accepted, confirm deregistration. This will be the last message from the Coordinator to that Component in that connection's lifetime.
* `LIST_KNOWN_COMPS`: Instruct a Coordinator to send the list of Components it knows (both Node-local and distributed). Reply: see next.
* `KNOWN_COMPS <Coordinatorname> <list of component IDs>`: This might also be sent out upon Component connection, so it's not strictly a "command" or a direct reply to a command.

Control
* `ACKNOWLEDGE`: Acknowledge a received message or correct execution of a command.
* `GET <parametername>`: Request the named Parameter's most recent value from an Actor. Reply: The value (and possibly the name).
* `SET <parametername> <value>`: Set the named Parameter of an Actor to the passed value.
* `CALL <actionname> [args]`: Call the named action of an Actor, using 0 or more arguments. TBD how to deal with (lack of) return values.
* `LIST_PARAMS`/`LIST_ACTIONS` might not be necessary if we use Avro, as the "schema" of an entity is part of the connection handshake.
* `START_POLLING <interval_ms> [1+ parameter names]`: Command an Actor to fetch (and publish) fresh values for these parameters at the given interval. Interval given first as we have a variable number of arguments afterwards.
* `STOP_POLLING [1+ parameter names]`: Stop polling these parameters.

Data
* `DATA [Dict with 1+ parameter-value pairs]`: Published Parameter values. TBC if message payload should include Component ID.
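
To get a feeling for how this list could be handled in code, here is a minimal sketch. The whitespace-separated text format, the `MessageType` enum, `REPLY_REQUIRED` and `parse` are all assumptions for illustration, since the syntax is explicitly not fixed yet. `REPLY_REQUIRED` contains only the verbs whose description above names a reply.

```python
from enum import Enum, auto
from typing import List, Tuple


class MessageType(Enum):
    """Command verbs/message types from the informal list (not a final spec)."""

    REQ_STATUS = auto()
    STATUS = auto()
    ERROR = auto()
    LOG = auto()
    SIGN_IN = auto()
    SIGN_OUT = auto()
    LIST_KNOWN_COMPS = auto()
    KNOWN_COMPS = auto()
    ACKNOWLEDGE = auto()
    GET = auto()
    SET = auto()
    CALL = auto()
    START_POLLING = auto()
    STOP_POLLING = auto()
    DATA = auto()


# Verbs for which the list explicitly names a reply.
REPLY_REQUIRED = {
    MessageType.REQ_STATUS,
    MessageType.SIGN_IN,
    MessageType.SIGN_OUT,
    MessageType.LIST_KNOWN_COMPS,
    MessageType.GET,
}


def parse(line: str) -> Tuple[MessageType, List[str]]:
    # Assumed toy syntax: verb first, then whitespace-separated arguments.
    verb, *args = line.split()
    return MessageType[verb], args  # raises KeyError for unknown verbs


message_type, arguments = parse("START_POLLING 500 temperature pressure")
print(message_type.name, arguments, message_type in REPLY_REQUIRED)
```

Putting the interval first, as proposed for `START_POLLING`, is what lets the remaining arguments be collected as a variable-length list of parameter names.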