bilderbuchi commented 1 year ago

My notes/draft on the message format, so far. For alignment/discussion, this is a quick first shot.

Message format

We have different message types:

Control messages
Data messages
Logging messages
TBC more?
[ ] F1: Do we want a plain-text (SCPI-like) protocol or a binary encoded one? E.g. do we have plaintext command verbs like GET or opcodes like 0x02 that are defined in a table somewhere?

Message structure

All messages have the same base structure:

Header
Payload
Checksum

Header

Timestamp
Message ID (possibly UUID v7 including timestamp) #16
1 Sender
0 or more recipients (pub-sub messages don't have a recipient afaict)
0 or 1 reply-reference
~~Probably payload length (for knowing where the checksum starts)~~ (not needed: zmq delivers the whole message in one piece)
Regarding the formatting of the header, see #33
[ ] H1: How does a coordinator get the message if the recipient is another Node? Do we also need to give the Coordinator's ID?
[ ] H2: Can we manage a common header for data, control and logging messages?

Control Payload

<CMD> [<args>]+

<CMD> is from a command dictionary

SET
GET
CALL
RESET
ERROR
LIST_PARAMS
LIST_ACTIONS
...
[ ] C1: Should an argument always be a key-value pair, or do we stay with a plain sequence of tokens? That is, is "prop1 15.2 prop2 true" 2 or 4 args? What about arguments without a value?
[ ] C2: Is every token in the message a zeromq "frame"? Can we decide what goes in a frame?
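To make question C1 concrete, here is a minimal sketch of a plain-text control payload parser. The Command enum, parse_control and pair_up are hypothetical names, and treating tokens as alternating keys and values is only one possible answer to C1, not a decision.

```python
from enum import Enum


class Command(Enum):
    """Hypothetical command dictionary mirroring the verbs listed above."""
    SET = "SET"
    GET = "GET"
    CALL = "CALL"
    RESET = "RESET"
    ERROR = "ERROR"
    LIST_PARAMS = "LIST_PARAMS"
    LIST_ACTIONS = "LIST_ACTIONS"


def parse_control(text: str) -> tuple[Command, list[str]]:
    """Split a plain-text control payload into its verb and raw argument tokens."""
    verb, *args = text.split()
    return Command(verb), args  # Command(...) raises ValueError for an unknown verb


def pair_up(args: list[str]) -> dict[str, str]:
    """Interpret the tokens as key-value pairs (one possible reading of C1)."""
    if len(args) % 2:
        # An argument without a value breaks the pairing: this is the open question.
        raise ValueError(f"odd number of tokens: {args}")
    return dict(zip(args[::2], args[1::2]))


cmd, args = parse_control("SET prop1 15.2 prop2 true")
print(cmd, len(args), pair_up(args))
```

The same string yields 4 tokens or 2 pairs depending on the convention, and every value is still a string, so a pure token protocol would also need rules for typing values like 15.2 or true.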

Data Payload

TODO, didn't have time to define yet.

Logging Payload

TODO, didn't have time to define yet.

Checksum

TBD, some universally acceptable CRC check, I guess?
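If we go for a CRC, a minimal sketch could use CRC-32 from Python's zlib, sent as a 4-byte last frame. Putting the checksum in the last frame and the helper names crc_frame/verify are assumptions for illustration only.

```python
import zlib


def crc_frame(payload_frames: list[bytes]) -> bytes:
    """Compute a CRC-32 over all payload frames and return it as a 4-byte frame."""
    crc = 0
    for frame in payload_frames:
        crc = zlib.crc32(frame, crc)  # running CRC, continued across frames
    return crc.to_bytes(4, "big")


def verify(frames: list[bytes]) -> bool:
    """Check a message whose last frame is the checksum of the preceding frames."""
    *payload, checksum = frames
    return crc_frame(payload) == checksum


message = [b"SET", b"prop1 15.2"]
message.append(crc_frame(message))
print(verify(message))
```

Note that a running CRC equals the CRC of the concatenated frames, so it does not detect a shifted frame boundary (b"ab", b"c" vs. b"a", b"bc"); a per-frame checksum would.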

bilderbuchi commented 1 year ago

This is a high-level issue, I expect we will farm out details to separate issues later.

BenediktBurger commented 1 year ago

A few comments regarding zmq's working:

You can have as many frames as you like (however, more frames create more overhead), and you can put whatever you like into a frame (as a bytes object): the frames parameter is just a list of bytes objects (in Python terms).
In PUB-SUB communication, the first frame can be used to filter messages (a so-called "topic").
zmq ensures that you get the whole message (it internally sends the length of each frame and checks it).

Consequences for us:

The first frame of the data protocol has to be some kind of topic (I propose the sender name).
A checksum could be added to the header, as an individual frame (the last one) or the last x bytes of every frame for that particular frame.
We can (if we wish) put different "sentences" (meaningful units of a message, for example SET prop1 15.1) in different frames.
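The consequences above can be sketched without a running broker: a message is a list of bytes frames with the topic first, and a SUB socket only lets it through if the first frame starts with one of its subscribed prefixes. build_message and subscriber_accepts are hypothetical helpers that mimic this behaviour; with pyzmq the list would go to socket.send_multipart() and the prefixes would be set via socket.setsockopt(zmq.SUBSCRIBE, prefix).

```python
def build_message(topic: bytes, sentences: list[bytes]) -> list[bytes]:
    """Assemble a multipart message: topic frame first, one frame per sentence."""
    return [topic, *sentences]


def subscriber_accepts(frames: list[bytes], subscriptions: list[bytes]) -> bool:
    """Mimic SUB-side filtering: only the first frame is compared, as a prefix."""
    return any(frames[0].startswith(prefix) for prefix in subscriptions)


msg = build_message(b"sender_A", [b"SET prop1 15.1", b"GET prop1"])
print(subscriber_accepts(msg, [b"sender_A"]))  # True
print(subscriber_accepts(msg, [b"sender_B"]))  # False
print(subscriber_accepts(msg, [b"sender"]))    # True: prefix match, not exact match
```

Because filtering is by prefix, using sender names as topics means a subscription to "sender" also receives "sender_A" and "sender_AB"; names may need a terminator or a naming scheme that avoids this.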

Some more ideas regarding content:

We can use JSON serialization with lists and dictionaries to convey key-value pairs. In my implementation (growing with this project and going to be adjusted), I am currently testing a list of lists: each entry of the outer list is a "sentence". A sentence is a list consisting of a command type and possibly arguments. An example: b'[["GET", ["property1", "another property"]], ["SET", {"property1": 7}], ["GET", ["property1"]]]'. In this example, I request the values of two properties, set one of these properties to a specific value, and request that property again.
yaq uses Apache Avro RPC to serialize data. That could be another possibility.
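The list-of-sentences idea can be tried with the standard json module. This is a sketch under assumptions: handle is a hypothetical receiver, a plain dict stands in for the device, and returning one reply per sentence (null for SET) is only one possible convention.

```python
import json

# The example sentences from above, serialized as JSON bytes for one zmq frame.
sentences = [
    ["GET", ["property1", "another property"]],
    ["SET", {"property1": 7}],
    ["GET", ["property1"]],
]
payload = json.dumps(sentences).encode()


def handle(payload: bytes, properties: dict) -> bytes:
    """Execute the sentences in order against a dict standing in for a device."""
    replies = []
    for command, argument in json.loads(payload):
        if command == "GET":
            replies.append({name: properties[name] for name in argument})
        elif command == "SET":
            properties.update(argument)
            replies.append(None)  # no value to report for a SET
        else:
            raise ValueError(f"unknown command {command!r}")
    return json.dumps(replies).encode()


device = {"property1": 3, "another property": "x"}
print(handle(payload, device))
```

Since the sentences are executed in order, the second GET already sees the value written by the SET.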

Ideas regarding the command type:

It is difficult to put a byte value into JSON content as a byte. Sending it human-readable ("0x02") defeats the purpose of a smaller footprint. In that case, I'd prefer few-letter abbreviations (enums are helpful) like "S", "G"...
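A str-based enum is one way to get short codes on the wire while keeping readable names in code. CommandCode and its two members are hypothetical; the str mixin is what lets json.dumps serialize members directly as their short value.

```python
import json
from enum import Enum


class CommandCode(str, Enum):
    """Hypothetical few-letter codes; the str mixin lets json serialize members."""
    GET = "G"
    SET = "S"


message = [[CommandCode.GET, ["property1"]], [CommandCode.SET, {"property1": 7}]]
encoded = json.dumps(message)
print(encoded)  # [["G", ["property1"]], ["S", {"property1": 7}]]

# On the receiving side, the short code is mapped back to the enum member.
decoded = [[CommandCode(code), argument] for code, argument in json.loads(encoded)]
```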

Regarding logging:

we can use the same message format as the data protocol, just on a different port (my favourite solution)
we could distinguish at the topic level (should work well)
we could distinguish in the content, but that way the logging facility gets messages with data (and has to discard them) and the observer gets log entries (and has to discard them). Therefore this version creates useless traffic and I dislike it.

bilderbuchi commented 1 year ago

yaq uses Apache Avro RPC to serialize data. That could be another possibility.

I read up on it a bit on the subway today, and it really looks like a good option: it would also take some of the burden of handshaking, capability listing, reply tracking, message verification (with a schema) and RPC off us. zmq would then mostly be the transport layer (so one frame per Avro message, plus maybe the topic frame). JSON or binary option. Implementations for several languages. Specification here.

Yaq notes on why/how they use avro: https://yeps.yaq.fyi/107/

bilderbuchi commented 1 year ago

Here is another description of a message format (of a protocol a colleague recommended, but probably not that good a fit for our use case), which could inform us regarding the message structure: https://en.wikipedia.org/wiki/Constrained_Application_Protocol#Message_formats

bklebel commented 1 year ago

To answer one question here:

H1: How does a coordinator get the message if the recipient is another Node? Do we also need to give the Coordinator's ID?

In zmq, at least two endpoints of communication need to be defined in a fixed way, i.e. you define a certain communication channel, similar to TCP/IP (on which it builds), consisting of an IP address and a port, i.e. a socket. In that sense, in our case all the non-Coordinator nodes talk to the Coordinator in a more or less hardcoded way: their messages always go to/through the Coordinator by default, so we do not need to give a Coordinator's ID in this message. The matter changes (slightly) once we need/want to route messages through a number of Coordinators; then the Coordinators might need to start adding more "route-information" frames/data to properly get a reply back through the chain, but this is currently not our concern, I think. Currently, every Actor needs to be initialized with the socket info of the Coordinator, which can be on localhost or on a remote IP.

BenediktBurger commented 1 year ago

Here the "names" come into play:

Every Component talks to its Coordinator (so it does not need to know any other address). Let's say Component A sends a message to B.
The Coordinators keep a list of names and the corresponding addresses to which they send the message. If several names can have the same address, the Coordinator does not need to know that the recipient is another Coordinator. So Coordinator 1 looks up the address for B (which happens to be Coordinator 2) and delivers the message there, while storing the address of A in its list.
The recipient's coordinator finally delivers the message to the recipient (via its stored address). Coordinator 2 has a direct link to B and delivers the message. It also stores the address of Coordinator 1 in its recipients list under "A".
For the return path, B sends a message to A and each Coordinator on the way, having learnt the name-address combination of A, passes the message, until it arrives at A.

With that protocol, the header does not change, and each hop just knows how to reach the next hop.
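The hop-by-hop forwarding described above can be simulated in plain Python. This is a toy sketch, not LECO code: Component, Coordinator, their directory dicts and send/receive methods are hypothetical, direct method calls stand in for zmq sockets, and it assumes Coordinator 1 already knows that B is reachable via Coordinator 2.

```python
class Component:
    """Toy endpoint that only knows its own Coordinator."""

    def __init__(self, name: str, coordinator: "Coordinator"):
        self.name = name
        self.coordinator = coordinator
        self.inbox: list[tuple[str, str]] = []
        coordinator.directory[name] = self  # sign-in: the Coordinator learns the direct link

    def send(self, recipient: str, payload: str) -> None:
        self.coordinator.receive(self.name, recipient, payload, came_from=self)

    def receive(self, sender, recipient, payload, came_from) -> None:
        self.inbox.append((sender, payload))


class Coordinator:
    """Toy Coordinator: maps names to the next hop and learns return paths."""

    def __init__(self, name: str):
        self.name = name
        self.directory: dict[str, object] = {}  # name -> Component or Coordinator

    def receive(self, sender, recipient, payload, came_from) -> None:
        # Remember where messages from `sender` came from (the return path).
        self.directory.setdefault(sender, came_from)
        # Sender and recipient are forwarded unchanged; only the hop changes.
        self.directory[recipient].receive(sender, recipient, payload, came_from=self)


co1, co2 = Coordinator("Coordinator 1"), Coordinator("Coordinator 2")
a, b = Component("A", co1), Component("B", co2)
co1.directory["B"] = co2  # assumed known in advance, e.g. by exchanging directories

a.send("B", "ping")
b.send("A", "pong")  # co2 learnt "A" -> co1 on the way in, so the reply finds its way back
print(b.inbox, a.inbox)  # [('A', 'ping')] [('B', 'pong')]
```

Using setdefault keeps the first route learnt for a name; a real implementation would have to decide how routes are updated or expire.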

bklebel commented 1 year ago

Well, maybe we want to make a separate issue out of that, since it pops up again and again xD I have opened #22 to that end, and I guess it would be better to focus here on the message format.

Regarding the format, I guess we should first decide how precisely we want to control what happens ourselves. If we use PUB-SUB (with XPUB and XSUB sockets on the proxy) for a certain channel, the makeup of the frames which we control ourselves will be different from the case where we do it purely with ROUTER and DEALER sockets. PUB-SUB sockets cannot be connected to ROUTER/DEALER sockets (without errors or undocumented/unpredictable results). So we have two options:

Do everything with ROUTER-DEALER, even the data publishing (with quite a lot of overhead for us to implement)
Do the publishing of data and logging messages with PUB-SUB, and the command/control/GET/SET messages with ROUTERs and DEALERs.

I strongly suggest the second choice. There, we have a fundamental difference between the headers necessary (or appropriate) for the PUB-SUB and the ROUTER-DEALER channels. For example, as @bmoneke pointed out, for the data channel (PUB-SUB), the first frame is a "topic", and the simplest way to do that in a big system is to send the name/ID of the respective node with it. Currently I am not sure whether zmq will allow us to put more arbitrary frames in this type of socket channel.

BenediktBurger commented 1 year ago

Currently I am not sure whether zmq will allow us to put more arbitrary frames in this type of socket channel.

zmq only considers the first frame for "topic filtering". The rest of the message is just a simple message consisting of 0 or more frames.

bklebel commented 1 year ago

zmq only considers the first frame for "topic filtering". The rest of the message is just a simple message consisting in 0 or more frames

Ok, then we can actually make a common Header for both data and control protocols:

name of the sending component (used in the proxied PUB-SUB data protocol as filter)
message ID (vouching for uuid7 with timestamp)
0 or more recipients
0 or 1 reply-reference (i.e. conversation ID)
checksum of the payload (if we actually want it; zmq drops corrupt messages by itself, as far as I read today in the zmq guide, I think)

This will keep a bit of overhead in the data channel (recipients and conversation ID are unnecessary there), but we can keep the header the same across the different channels. The same is true for the logging channel (if we now separate it with its own PUB-SUB proxy): recipients and reply-reference are unnecessary there. In general, having a message ID could be beneficial for debugging purposes, I think. If we can include the timestamp in a UUID directly (and it can be extracted sensibly), this would make things easy too. Alternatively, we can just insert a timestamp frame after the message ID for good measure.
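A UUID version 7 does embed a Unix timestamp in milliseconds in its top 48 bits, so the timestamp can be extracted again. The following is a hand-rolled sketch of the RFC 9562 layout using only the standard library (so it does not depend on the Python version's uuid module supporting v7); uuid7 and timestamp_ms are hypothetical helper names, and this sketch does not guarantee ordering of IDs created within the same millisecond.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Build a UUID version 7 (RFC 9562 bit layout) from the current Unix time."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    value = (unix_ms & ((1 << 48) - 1)) << 80     # 48-bit millisecond timestamp
    value |= 0x7 << 76                            # 4-bit version field: 7
    value |= ((rand >> 62) & 0xFFF) << 64         # 12 random bits (rand_a)
    value |= 0b10 << 62                           # 2-bit RFC variant field
    value |= rand & ((1 << 62) - 1)               # 62 random bits (rand_b)
    return uuid.UUID(int=value)


def timestamp_ms(message_id: uuid.UUID) -> int:
    """Recover the embedded Unix timestamp (milliseconds) from a UUIDv7."""
    return message_id.int >> 80


message_id = uuid7()
print(message_id, message_id.version, timestamp_ms(message_id))
```

Because the timestamp occupies the most significant bits, IDs from different milliseconds also sort chronologically, which may make a separate timestamp frame unnecessary.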

bilderbuchi commented 1 year ago

What I'm not clear on is how the Avro messages fit into the whole thing (e.g. this): are the Avro messages completely contained in the zmq payload? Where/how do we distinguish what goes into the zmq header and what into the Avro message? Do we duplicate information, or is the distinction clear anyway (e.g. zmq header: only message/routing metadata)?

BenediktBurger commented 1 year ago

My idea is that we have a routing header (I prefer the first frame), which contains the addresses, message ID etc., and then the payload in all the other frames. As the default payload we define some protocol, for example the Apache Avro protocol.

So yes, we have frame 0 for zmq routing stuff and frames 1 to n for the Apache Avro payload.
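The "frame 0 routing header, frames 1 to n payload" split can be sketched as follows. Encoding the header as JSON is only an assumption (header formatting is discussed in #33), the Header fields follow the common-header list above, and pack/unpack are hypothetical helpers; the payload frames are treated as opaque bytes, e.g. Avro-encoded.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Header:
    """Hypothetical routing header with the fields from the common-header list."""
    sender: str
    message_id: str
    recipients: list[str] = field(default_factory=list)
    reply_reference: Optional[str] = None  # conversation ID, if any


def pack(header: Header, payload_frames: list[bytes]) -> list[bytes]:
    """Frame 0: routing header (JSON here); frames 1..n: opaque payload bytes."""
    return [json.dumps(asdict(header)).encode(), *payload_frames]


def unpack(frames: list[bytes]) -> tuple[Header, list[bytes]]:
    """Split a multipart message back into its header and payload frames."""
    header_frame, *payload_frames = frames
    return Header(**json.loads(header_frame)), payload_frames


frames = pack(Header(sender="A", message_id="42", recipients=["B"]), [b"<avro bytes>"])
header, payload = unpack(frames)
```

On a PUB-SUB channel the topic filter compares prefixes of frame 0, so a JSON header starting with '{"sender": ' would make filtering awkward; there, the sender name would probably have to come first in frame 0 or get its own topic frame.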

pymeasure / leco-protocol

Message format #20
