Control protocol content header

BenediktBurger commented 1 year ago

In #33 we decided upon one frame for header information. In this issue, we can discuss the content of that header frame.

bilderbuchi commented 1 year ago

AFAIK, the current proposal is to include

message type
conversation ID
message ID

I think we should be able to use fixed-length byte sequences for all of those, which would allow us to decode this header by length/position. I suggest including the items in the above order, as that is presumably in decreasing order of importance.
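To illustrate decoding by length/position, here is a minimal sketch. The field widths (1 byte for the message type, 16 bytes for each ID) and the function names are placeholders chosen for the example; nothing here has been decided in this issue.

```python
import struct

# Hypothetical layout: message type (1 byte), conversation ID (16 bytes),
# message ID (16 bytes). "!" means network byte order and no padding.
HEADER_FORMAT = "!B16s16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 33 bytes


def encode_header(message_type: int, conversation_id: bytes, message_id: bytes) -> bytes:
    """Pack the three fields into one fixed-length header frame."""
    if len(conversation_id) != 16 or len(message_id) != 16:
        raise ValueError("both IDs must be exactly 16 bytes long")
    return struct.pack(HEADER_FORMAT, message_type, conversation_id, message_id)


def decode_header(frame: bytes) -> tuple[int, bytes, bytes]:
    """Split a header frame purely by position; no delimiters are needed."""
    if len(frame) != HEADER_SIZE:
        raise ValueError(f"expected {HEADER_SIZE} bytes, got {len(frame)}")
    return struct.unpack(HEADER_FORMAT, frame)
```

Because every field has a fixed width, a receiver can reject malformed headers with a single length check before looking at the content.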

Question 1: If we include a conversation ID, do we also need the message ID, or would it be sufficient to track request/response/response/... patterns/sequence by conversation ID alone?

Question 2: What should we use as conversation/message ID? uuid was proposed, but might be wastefully long? We can consult our goals to get OOM estimates of the (unique) message counts we need to keep track of.

Question 3: Do we want to include a timestamp? If yes, maybe folded into the message ID (e.g. UUID7 or UUID5), in which case we would include the message ID?

BenediktBurger commented 1 year ago

Question 1: If we include a conversation ID, do we also need the message ID, or would it be sufficient to track request/response/response/... patterns/sequence by conversation ID alone?

If we have a conversation ID, we do not need a message ID for tracking responses etc.

However, we lose the option to identify every message (and the timestamps in the message IDs).

Conversation ID

2 bytes should be sufficient, as it is only relevant for the requesting party, which should not open more than 65000 conversations at once.

bilderbuchi commented 1 year ago

2 bytes should be sufficient, as it is only relevant for the requesting party, which should not open more than 65000 conversations at once.

Don't we want conversation IDs to be unique for ~a whole session? That would make it ~trivial to filter e.g. all messages belonging to one conversation out of a log file/database.

edit: If I computed correctly using https://kevingal.com/apps/collision.html, 16 bits of ID, at 100 concurrent conversations (1 per device, order of magnitude estimate, e.g. 33 new conversations per second at 3 second average conversation duration), gives us a 7.3% probability of a collision! Way too high imo.
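The estimate can be checked with a short birthday-problem calculation. This sketch assumes IDs are drawn uniformly at random and independently; the function name is made up for the example.

```python
import math


def collision_probability(id_bits: int, concurrent_ids: int) -> float:
    """Probability that at least two of `concurrent_ids` random IDs coincide."""
    space = 2 ** id_bits
    if concurrent_ids > space:
        return 1.0  # pigeonhole: more IDs than possible values
    # P(no collision) = prod_{i=0}^{n-1} (1 - i / space), summed in log space
    log_no_collision = sum(math.log1p(-i / space) for i in range(concurrent_ids))
    return -math.expm1(log_no_collision)


print(f"{collision_probability(16, 100):.1%}")  # 16-bit IDs, 100 conversations
```

For 16 bits and 100 concurrent conversations this gives roughly 7.3%, matching the number from the online calculator.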

BenediktBurger commented 1 year ago

There are two ideas:

The conversation_id of a response is message_id of the original message
The conversation_id of a response is the conversation_id of the original message (so it is chosen by the sender)

COAP follows the second idea:

Token

Every request carries a token (but it may be zero length) whose value was generated by the client. The server must echo every token value without any modification back to the client in the corresponding response. It is intended for use as a client-local identifier to match requests and responses, especially for concurrent requests.

Matching requests and responses is not done with the message ID because a response may be sent in a different message than the acknowledgement (which uses the message ID for matching). For example, this could be done to prevent retransmissions if obtaining the result takes some time. Such a detached response is called "separate response". In contrast, transmitting the response directly in the acknowledgement is called "piggybacked response" which is expected to be preferred for efficiency reasons.

(https://en.wikipedia.org/wiki/Constrained_Application_Protocol#Token)

BenediktBurger commented 1 year ago

edit: If I computed correctly using https://kevingal.com/apps/collision.html, 16 bits of ID, at 100 concurrent conversations (1 per device, order of magnitude estimate, e.g. 33 new conversations per second at 3 second average conversation duration), gives us a 7.3% probability of a collision! Way too high imo.

I thought that a Component (let's say a Director) has an internal conversation counter, so it can have about 65000 (256*256 = 65536) conversations until it starts from zero again. So a collision only happens if one conversation stays open longer than it takes to start 65000 other ones.
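A minimal sketch of such a per-Component counter, assuming 2-byte IDs and assuming the Component remembers which conversations are still open so that a wrapped counter skips them. The class and method names are invented for illustration.

```python
class ConversationCounter:
    """Hands out sequential conversation IDs that are unique within one Component."""

    def __init__(self, id_bytes: int = 2) -> None:
        self._id_bytes = id_bytes
        self._modulus = 256 ** id_bytes  # 65536 for 2 bytes
        self._next = 0
        self._open: set[int] = set()

    def new_id(self) -> bytes:
        """Return the next ID that is not used by a still-open conversation."""
        for _ in range(self._modulus):
            candidate = self._next
            self._next = (self._next + 1) % self._modulus  # wrap around to zero
            if candidate not in self._open:
                self._open.add(candidate)
                return candidate.to_bytes(self._id_bytes, "big")
        raise RuntimeError("all conversation IDs are in use")

    def close(self, conversation_id: bytes) -> None:
        """Mark a conversation as finished so its ID may be reused."""
        self._open.discard(int.from_bytes(conversation_id, "big"))
```

This only guarantees uniqueness inside one Component; two Components can still pick the same value at the same time.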

BenediktBurger commented 1 year ago

Here is the COAP protocol definition: https://www.rfc-editor.org/rfc/rfc7252

I read their concept:

Messages have a message_id and a token
A server must respond to a request with the same token
There are confirmable and non-confirmable messages.
Confirmable messages require either an Acknowledgement or a Reset. In either case, this response has the same message_id as the request has. The request is resent until either response arrives.
An answer to a non-confirmable message has a new message_id.
If it takes some time to respond to a confirmable message, the server sends an ACK (same message_id, no token) and later the response with the content (new message_id, original token)
Message_ids are used for duplicate detection and responses to confirmable messages.

BenediktBurger commented 1 year ago

AFAIK, the current proposal is to include

I proposed to include the content formatting type (json, avro, binary), not the message type. However, we can think about the message_type in the header.

BenediktBurger commented 1 year ago

edit: If I computed correctly using https://kevingal.com/apps/collision.html, 16 bits of ID, at 100 concurrent conversations (1 per device, order of magnitude estimate, e.g. 33 new conversations per second at 3 second average conversation duration), gives us a 7.3% probability of a collision! Way too high imo.

Oh, I see one difficulty with the conversation_id: If both endpoints send each other a message with the same conversation_id (as both chose the same ID due to some circumstance), they will interpret the other's message as a response, not a new request. Did you mean that?

We could mitigate that if the message types differ (request type is different from response type).

bilderbuchi commented 1 year ago

Here is the COAP protocol definition: https://www.rfc-editor.org/rfc/rfc7252

I read their concept:

* Messages have a message_id and a token

* A server must respond to a request with the same token

* There are confirmable and non-confirmable messages.

* Confirmable messages require either an Acknowledgement or a Reset. In either case, this response **has the same message_id** as the request has. The request is resent until either response arrives.

* An answer to a non-confirmable message has a new message_id.

* If it takes some time to respond to a confirmable message, the server sends an ACK (same message_id, **no token**) and later the response with the content (new message_id, original token)

* Message_ids are used for duplicate detection and responses to confirmable messages.

These points look useful/applicable in our situation. Token is basically our conversation id (in role).

bilderbuchi commented 1 year ago

I proposed to include the content formatting type (json, avro, binary), not the message type. However, we can think about the message_type in the header.

With "message type" I mean the command verbs (#29). We need to have those somewhere, I thought the content header would be the logical place.

Serialisation information will also be important, in case we use more than one scheme. Just for completeness, avro can use either json or binary within, but I assume you mean self-made schemes here.

bilderbuchi commented 1 year ago

Oh, I see one difficulty with the conversation_id: If both endpoints send each other a message with the same conversation_id (as both chose the same ID due to some circumstance), they will interpret the other's message as a response, not a new request. Did you mean that?

Basically, yes. We can't use sequential codes because two Components might be on the same "offset" concurrently. If we use random ones, we have a certain risk of collision, i.e. two nodes accidentally choosing the same ID via RNG. By adjusting the length/complexity of the ID scheme against the expected message ID generation frequency, we can tune the collision risk to acceptable levels.
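To make that tuning concrete, here is a rough sketch that turns an acceptable collision risk into a minimum ID width. It uses the union bound P(collision) <= n(n-1)/2 / 2^bits, so the result is slightly conservative; the function name and the example numbers are assumptions.

```python
import math


def min_id_bits(concurrent_ids: int, max_collision_probability: float) -> int:
    """Smallest random-ID width (in bits) that keeps the collision risk below the target."""
    pairs = concurrent_ids * (concurrent_ids - 1) / 2  # number of ID pairs that could clash
    if pairs == 0:
        return 0  # a single ID cannot collide with anything
    # Require pairs / 2**bits <= max_collision_probability
    return max(0, math.ceil(math.log2(pairs / max_collision_probability)))


print(min_id_bits(100, 1e-9))  # 100 concurrent IDs, one-in-a-billion risk
```

With 100 concurrent IDs, a one-in-a-billion target needs 43 bits under this bound, so 6 random bytes would already be enough for that scenario.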

I'm not sure message type helps us with this, because message type is not a "random" variable.

BenediktBurger commented 1 year ago

I'm not sure message type helps us with this, because message type is not a "random" variable.

It helps: If I receive a "response type message", I know it is the response to my request with the same conversation_id. If I receive a "request type message", I know it is a new request and I have to return a response with the same conversation_id. For example, Acknowledge and Error are response message types. GET/SET/CALL are request message types.

bilderbuchi commented 1 year ago

Is your argument that having two kinds of messages lowers the probability of a collision by a factor of 2?

BenediktBurger commented 1 year ago

I'd say there are no collisions anymore:

Yes, several messages with the same conversation ID might arrive, but only those of "response type" are a response to my request. As the original sender sets the conversation ID, it can determine what it gets back (just by using "free" IDs).

All the other messages with the same ID have to be requests, and therefore this ID is not relevant for the Component under scrutiny.

More clearly:

we look at some Component CA
it sends every request with a unique (in CA) conversation id
all answers to the requests will have this conversation Id and their message type is of "response"
requests arriving at CA will be recognizable as requests by message type and CA will not try to match such a message to a request it sent itself.
CA answers requests with the conversation ID it received in the request

So, we have a combination of two filters: message type has to be "response" and conversation Id has to match. Only then, we got the response to our request.
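A minimal sketch of that two-filter logic. The message type names follow the examples given earlier in this thread; the class, its methods, and the choice to close a conversation after the first response are simplifying assumptions.

```python
REQUEST_TYPES = {"GET", "SET", "CALL"}
RESPONSE_TYPES = {"ACKNOWLEDGE", "ERROR"}


class Component:
    """Tracks the conversation IDs of requests this Component sent itself."""

    def __init__(self) -> None:
        self.pending: dict[bytes, str] = {}  # conversation ID -> description of own request

    def send_request(self, conversation_id: bytes, description: str) -> None:
        if conversation_id in self.pending:
            raise ValueError("conversation ID is still in use locally")
        self.pending[conversation_id] = description

    def classify(self, message_type: str, conversation_id: bytes) -> str:
        # Filter 1: only response-type messages can answer our own requests.
        if message_type in RESPONSE_TYPES:
            # Filter 2: the conversation ID has to match a pending request.
            if conversation_id in self.pending:
                del self.pending[conversation_id]  # assumption: one response ends it
                return "response to own request"
            return "unexpected response"
        if message_type in REQUEST_TYPES:
            # A request is never matched against our own pending conversations;
            # the reply simply echoes the ID chosen by the other side.
            return "new request"
        raise ValueError(f"unknown message type: {message_type}")


ca = Component()
ca.send_request(b"\x00\x01", "GET temperature")
results = [
    ca.classify("GET", b"\x00\x01"),          # same ID, but a request
    ca.classify("ACKNOWLEDGE", b"\x00\x01"),  # same ID and a response
]
```

The incoming GET with the same ID is treated as a new request and leaves the pending entry untouched; only the ACKNOWLEDGE is matched to CA's own request.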

bilderbuchi commented 1 year ago

So if Co1 sees/routes two "response type" messages from CA->CB and from CE->CF, which could have the same conversation id because they originate from different Components, what happens then?

I'm also thinking of e.g. logging streams, it would be practical/simple if the invariant "one conversation id <==> one conversation" would always be fulfilled (without further logic/analysis).

BenediktBurger commented 1 year ago

I see the conversation ID as a help for the end points of a conversation (especially the requesting one).

The routing Coordinators do not care whether a routed message belongs to one conversation or another.

For logging purposes: you need the combination of recipient/sender and conversation Id to get the "full conversation Id".
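As a sketch of what that could look like when analysing a log: the record fields and the `is_response` flag are assumptions about what a log entry would contain, not something defined in the protocol so far.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    sender: str
    receiver: str
    conversation_id: bytes
    is_response: bool


def conversation_key(record: LogRecord) -> tuple[str, str, bytes]:
    """Build the "full conversation ID": requester, responder, and conversation ID."""
    if record.is_response:
        # A response travels from the responder back to the requester.
        return (record.receiver, record.sender, record.conversation_id)
    return (record.sender, record.receiver, record.conversation_id)


def group_by_conversation(records: list[LogRecord]) -> dict[tuple[str, str, bytes], list[LogRecord]]:
    """Collect log records per conversation, even if conversation IDs repeat."""
    groups: dict[tuple[str, str, bytes], list[LogRecord]] = defaultdict(list)
    for record in records:
        groups[conversation_key(record)].append(record)
    return dict(groups)
```

With this key, a CA->CB conversation and a CE->CF conversation that happen to share the same conversation ID end up in separate groups.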

I find it difficult to achieve collisionless conversation IDs without a central authority.

I see that stream logging is very important to you (I did not think about it).

Oh: logging has another difficulty: you have to combine the logs of all Coordinators to get a full message log.

bilderbuchi commented 1 year ago

I see that stream logging is very important to you (I did not think about it).

I'm thinking of the poor folks that have to troubleshoot future messaging/routing problems. :-D

I find it difficult to achieve collisionless conversation IDs without a central authority.

Yes, just like you can't achieve collisionless git hashes. What you can do is lower the probability of a collision until you're comfortable (number TBD, but much less than 7% :-p)

BenediktBurger commented 1 year ago

In my test system I use the conversation_id more and more, but not the message_id.

We could include the timestamp in the conversation_id, and let the message_id consist of the conversation_id plus a temporal offset from the beginning of the conversation. So:

The conversation_id is a timestamp plus a few random bytes (see the discussion about unique ids): unique.
The message_id is not additional to the conversation_id, but contains it and a temporal offset (a few bytes)
Example structure for complete ID part of the header (contains both IDs): "Timestamp", "Random bytes", "Message offset".

Advantages:

The temporal offset does not require a large byte count, such that the combination of both IDs has a low byte count (instead of two independently unique IDs).
Timestamp of every message can be calculated.
Easy ordering of messages in a conversation
You can identify the beginning of a conversation (temporal offset 0) easily from the ID itself (not from the message type)

Disadvantage:

The message timestamp has to be calculated.

bilderbuchi commented 1 year ago

At first glance: sounds reasonable. I haven't thought deeply about this yet.

BenediktBurger commented 1 year ago

With the resurfaced links in #16 my proposal for the content header:

UUIDv7 as conversation id, maybe with the first 12 bits of random code used for sub millisecond resolution
4 bytes (number of bytes TBD) offset from the conversation id timestamp in milliseconds (e.g. first message has 0, response has some value different from 0). The UUIDv7 plus this offset is the message id (timestamp can be calculated).
Alternatively, instead of the offset, we could have a one-byte counter: each response (of the same conversation_id) increases it by one. Advantage: simpler ordering; disadvantage: the timestamp of the response is not known.
Last: 1 Byte for the Serialization scheme

BenediktBurger commented 9 months ago

When talking about PyMoDAQ in combination with LECO, we considered it useful to be able to send binary data, so it is good that we have the byte indicating the serialization scheme.

BenediktBurger commented 9 months ago

Regarding the serialization scheme byte: We could reserve a certain range (let's say 127-255) for user-defined applications.
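A small sketch of how a receiver could interpret that byte. Only the 127-255 range comes from the suggestion above; the concrete number assignments for json, avro and binary are invented for the example.

```python
USER_DEFINED_SCHEMES = range(127, 256)  # range suggested above, bounds inclusive

# Hypothetical assignments for protocol-defined schemes.
PROTOCOL_SCHEMES = {0: "json", 1: "avro", 2: "binary"}


def describe_scheme(scheme_byte: int) -> str:
    """Map the serialization scheme byte to a human-readable description."""
    if not 0 <= scheme_byte <= 255:
        raise ValueError("the serialization scheme must fit into one byte")
    if scheme_byte in USER_DEFINED_SCHEMES:
        return "user-defined"
    return PROTOCOL_SCHEMES.get(scheme_byte, "reserved")
```

Values below 127 without an assignment are reported as reserved, which leaves room for schemes added later by the protocol itself.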

pymeasure / leco-protocol

Control protocol content header #41