Open bilderbuchi opened 1 year ago
Unique IDs are good, but I would also like a way to match a response to the original request. We can put that information in the header (transport level) or in the content of a message.
A freely settable "subject" (maybe in addition to the ID/timestamp) has the benefit that you can filter the answers more easily, as you do not have to remember which message ID you used to request that information. Example: You request something regularly and give it the subject "Request5". Whenever you receive an answer with that subject, you know how to handle it. Without the subject, you would have to keep the ID of your original message, look up what to do when the answer arrives, and then delete that entry from the list.
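To make the comparison concrete, here is a minimal sketch of both matching strategies. Everything in it is hypothetical (the `handlers` and `pending` dictionaries, the function names, and the use of `uuid.uuid4` as a stand-in ID); it only illustrates the bookkeeping difference, not a proposed API.

```python
import uuid

# Hypothetical handlers, keyed by a freely chosen subject string.
handlers = {"Request5": lambda payload: f"temperature update: {payload}"}

# Bookkeeping needed when only message IDs are available:
# maps the ID of each outgoing request to the subject it belongs to.
pending = {}


def send_request(subject):
    """Pretend to send a request and remember its ID for later matching."""
    message_id = uuid.uuid4().hex  # stand-in ID, the real scheme is undecided
    pending[message_id] = subject
    return message_id


def handle_by_subject(subject, payload):
    """With a subject, the reply itself says how to handle it."""
    return handlers[subject](payload)


def handle_by_id(reply_to, payload):
    """Without a subject, look up the stored entry and delete it."""
    subject = pending.pop(reply_to)
    return handlers[subject](payload)
```

With a subject, no per-request state is kept; with IDs only, every request leaves an entry in `pending` until its reply arrives.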
I'd try to keep the ID as short as possible to reduce traffic (maybe it does not matter anyway).
Another point to consider: Each computer might have a slightly different clock, therefore the timestamps of the messages won't match exactly. I guess it won't be a problem, I just wanted to mention it.
> I'd try to keep the ID as short as possible to reduce traffic (maybe it does not matter anyway).
I think we should measure/try that before deciding either way. I agree, if the format is variable, the random part can be tailored to what we expect. However, if a format is widely known/standardised/available via multiple implementations, that might trump saving a couple of bytes per message.
> I would also like a way to match a response to the original request. We can put that information in the header (transport level) or in the content of a message.
I'd put that into the header (as it's "routing info", not the payload/content per se). I was thinking of a reply-reference field that could indicate the message this is a reply to. However, imo this is orthogonal to message identifiers and we should track that in a separate issue.
Same with the message format, I can open an issue with my few notes so far in the evening.
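A rough sketch of what such a header could look like, purely as an illustration: the class name `Header`, the field names (`sender`, `receiver`, `message_id`, `reply_reference`) and the use of `uuid.uuid4` for IDs are all assumptions, since neither the header layout nor the ID scheme has been decided here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Header:
    """Hypothetical routing header; field names are placeholders."""
    sender: str
    receiver: str
    # Every message gets its own unique ID (uuid4 is only a stand-in).
    message_id: uuid.UUID = field(default_factory=uuid.uuid4)
    # ID of the message this one replies to, or None for a fresh request.
    reply_reference: Optional[uuid.UUID] = None


def make_reply(request: Header) -> Header:
    """Build a reply header that points back at the request."""
    return Header(
        sender=request.receiver,
        receiver=request.sender,
        reply_reference=request.message_id,
    )
```

The point of keeping `message_id` and `reply_reference` separate is that a reply still has its own unique ID, so the two concerns stay independent.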
> However, if a format is widely known/standardised/available via multiple implementations, that might trump saving a couple of bytes per message.
I agree. We should take the standards into consideration for "possible". It might be a deciding factor for one or another standard.
> I was thinking of a reply-reference field that
I like that name.
> However, imo this is orthogonal to message identifiers and we should track that in a separate issue.
If we decide that message identifiers are unique per message, the two are orthogonal. If we used one message ID for a whole conversation (request and response), the reply reference would belong here.
I think it is good to have (at least the possibility, not necessarily the obligation) a unique identifier for each message. Therefore the reply enters another field and issue.
> Same with the message format, I can open an issue with my few notes so far in the evening.
Just for naming the issues: We have basically four parts:
> I think it is good to have (at least the possibility, not necessarily the obligation) a unique identifier for each message. Therefore the reply enters another field and issue.
I think that unique ids for each message should be obligatory, as from that you can construct the sequence of messages after the fact. This won't be possible if we have one id per conversation/thread (e.g. if clocks are not perfectly synchronized you can't rely on the timestamps).
So, you want a different message format for data and control messages, correct?
I want different formats for the different protocols, because they are like e-mail and TV.
The data protocol requires neither a recipient nor an answer modality. Also the content will be different. If we only allow data (in the sense of values, for example sensor values) and no commands etc., we can keep the data protocol very simple.
Everything else goes over the (more complicated) command protocol. You can request data via the command protocol as well, but that is only one use case.
OK, I just opened #20. Maybe the header can stay the same? Let's continue over there.
An overview/analysis of the "new" UUID formats: https://blog.devgenius.io/analyzing-new-unique-identifier-formats-uuidv6-uuidv7-and-uuidv8-d6cc5cd7391a IETF draft at https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
Thanks for the links. I'm for using UUIDv7
Another argument for UUID7: https://buildkite.com/blog/goodbye-integers-hello-uuids The context (database keys) is a bit different from ours, but we actually want the property that such discussions regard as a potential weakness (leaking timestamps from DB keys). The only drawback mentioned:
> UUIDs are 128 bits long, twice as large compared to the 64 bit length of other alternative solutions. There is some additional storage overhead, but this is marginal when taking into account the storage of the rest of a database row, and the benefits of migration offset the overhead for our use case.
But still, considering this standard seems bound to be ratified, I think it's worth it to go with a known standardized (IETF!) scheme that is bound to be(come) familiar to users, instead of some custom scheme that might be more efficient in some respect.
So, I agree: Let's choose UUIDv7.
To keep track of our messages, we should have unique identifiers per message, something like UUIDs. UUIDs can contain timestamps, too, which might come in handy, as we could encode that in the same information (saving space in the protocol).
UUID
The IETF has identified a couple of useful criteria for UUIDs to have:
Most of these sound useful for us, too. For example, we could sort a database/collection of messages (maybe from different nodes) by the UUID, and they would automatically be arranged by time. Also, we could parse the timestamp out of the UUID easily (afaict).
I have reviewed the currently available UUID versions, and they don't fit that need so well. Versions 6, 7 and 8 from the IETF draft linked above sound useful, but alas, it is still in draft state, so we probably won't see wide adoption soon.
UUID7
Apparently, implementations of this are available, e.g. https://pypi.org/project/uuid7/ -- might be worth it to investigate if we should go with the thing that should become a standard. Maybe v6 or v8, too?
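To illustrate the sorting and timestamp-parsing idea, here is a minimal hand-rolled sketch of the UUIDv7 bit layout as I understand it from the draft (48-bit Unix millisecond timestamp, 4 version bits, 12 random bits, 2 variant bits, 62 random bits). The function names `uuid7_sketch` and `timestamp_ms` are made up for this example; this is not the API of the linked package, and a real implementation should be preferred in practice.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the v7 layout: timestamp first, randomness after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Overwrite the version nibble (bits 76..79) with 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the variant (bits 62..63) with binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


def timestamp_ms(u: uuid.UUID) -> int:
    """Read the millisecond timestamp back out of the top 48 bits."""
    return u.int >> 80
```

Because the timestamp occupies the most significant bits, `sorted()` on a list of such UUIDs (or on their string forms) orders them by creation time at millisecond resolution; within the same millisecond the order is random in this sketch.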
ULID
Another concept is the ULID (Universally Unique Lexicographically Sortable Identifier). It has a 48-bit timestamp (i.e. millisecond resolution), which should be enough for our purposes, then 80 bits of randomness (that's about 1.2e24 possible values for every millisecond). The latter might even be reduced for our purposes.
Crucially, implementations are available in many languages! The encoding seems also much more readable (alphabet-based instead of hex) -- UUID: a9957082-0b47-11ed-8a91-3cf011fe32f1, ULID: 01ARZ3NDEKTSV4RRFFQ69G5FAV
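For comparison, a small sketch of how a ULID string is put together: the 128-bit value (timestamp, then randomness) is written as 26 characters of Crockford base32. The function names are invented for this example and it skips details of the ULID spec such as monotonic generation within one millisecond, so treat it as an illustration rather than a replacement for an existing library.

```python
import os
import time

# Crockford base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_sketch(unix_ms=None):
    """Encode a 48-bit timestamp plus 80 random bits as 26 characters."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 * 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid):
    """Decode the first 10 characters, which hold the timestamp."""
    value = 0
    for ch in ulid[:10]:
        value = value * 32 + CROCKFORD.index(ch)
    return value
```

Since the alphabet is in ascending ASCII order and the timestamp comes first, plain string sorting of ULIDs also sorts them by time.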
Customized format
We could use another format where we discard some entropy from the random part to encode human-meaningful data in, say a 3-byte message type or somesuch.
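One way such a customized format could look, purely as a thought experiment: keep a 48-bit millisecond timestamp, spend 3 bytes on a message type, and leave 56 bits of randomness, for 16 bytes in total. The layout, the function names and the example type codes are all assumptions made up for this sketch.

```python
import os
import time


def custom_id(message_type: bytes, unix_ms=None) -> bytes:
    """16-byte ID: 6 bytes timestamp, 3 bytes message type, 7 random bytes."""
    if len(message_type) != 3:
        raise ValueError("message_type must be exactly 3 bytes")
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    return unix_ms.to_bytes(6, "big") + message_type + os.urandom(7)


def parse_custom_id(raw: bytes):
    """Return (timestamp in ms, message type) from a 16-byte ID."""
    return int.from_bytes(raw[:6], "big"), raw[6:9]
```

The IDs still sort by time because the timestamp leads, but the randomness per millisecond drops from 80 to 56 bits, which is the trade-off to weigh against staying with a standard format.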