Closed sergeyprokhorenko closed 2 years ago
All current draft items have a `time|sequence|random` layout with additional version and variant sections, and an encoding based on base32, base32hex, or Crockford's base32 is already under consideration in uuid6/new-uuid-encoding-techniques-ietf-draft#3. What else should be added from "ULID with sequence", in your opinion?
I like the "ULID with sequence" structure with a sequence between the timestamp and the random component. I think it would be an excellent variant of Monotonic ULID.
But I don't know if a 48-bit timestamp is good for UUIDs due to the mandatory 6 bits for version and variant.
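To make the concern concrete, here is a rough bit-budget sketch. It assumes a 128-bit UUID, a 4-bit version plus a 2-bit variant (the 6 mandatory bits mentioned above), and a millisecond Unix timestamp as in ULID. The constant names are illustrative only.

```python
UUID_BITS = 128
VERSION_BITS = 4      # assumption: RFC 4122-style version field
VARIANT_BITS = 2      # assumption: RFC 4122-style two-bit variant
TIMESTAMP_BITS = 48   # millisecond Unix timestamp, as in ULID

# Bits left for the sequence and random parts once the fixed fields are placed.
free_bits = UUID_BITS - VERSION_BITS - VARIANT_BITS - TIMESTAMP_BITS

# How long a 48-bit millisecond counter lasts, in (average Gregorian) years.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.2425
years_of_range = 2**TIMESTAMP_BITS / MS_PER_YEAR

print(f"free bits for sequence + random: {free_bits}")
print(f"48-bit ms timestamp covers about {years_of_range:.0f} years")
```

Under these assumptions a 48-bit timestamp still leaves 74 bits for the sequence and random parts, compared with the 80 random bits of a plain ULID.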
Here are the features that should be added:
- Calculation of random parts in advance, and buffering for high-load applications when needed
- Calculation of UUIDs in advance within clock tick, and buffering for high-load applications when needed
I think these two are up to the implementations and shouldn't be part of the definition.
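For illustration, buffering random parts in advance could look roughly like the sketch below. `RandomPool` and its parameters are hypothetical names, not part of any draft; the only assumption is that `os.urandom` is an acceptable CSPRNG source.

```python
import os


class RandomPool:
    """Serves random bytes from a pre-filled buffer, refilling it in bulk.

    Hypothetical helper: one large os.urandom() call is amortized over
    many small requests. Not thread-safe as written.
    """

    def __init__(self, chunk_size: int = 4096) -> None:
        self._chunk_size = chunk_size
        self._buffer = b""
        self._offset = 0

    def take(self, n: int) -> bytes:
        if n > self._chunk_size:
            raise ValueError("request larger than the pool chunk size")
        if self._offset + n > len(self._buffer):
            # Not enough bytes left: discard the remainder and refill.
            self._buffer = os.urandom(self._chunk_size)
            self._offset = 0
        out = self._buffer[self._offset:self._offset + n]
        self._offset += n
        return out


pool = RandomPool(chunk_size=32)
first = pool.take(10)
second = pool.take(10)
third = pool.take(20)   # does not fit in the 12 remaining bytes, so the pool refills
```

Whether this is worth doing depends on the platform, which supports leaving it to implementations.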
- UUID creation directly in DBMS for better performance or on client side, but not in application server
Can you provide evidence for the "better performance"? In my opinion, this shouldn't be decided up front. Applications running on multiple nodes can outperform a DBMS in id generation.
I personally would never accept any kind of value generated by the client as an identifier. You cannot trust that data.
- Timestamp shift for sensitive information
- Prohibition on substitution of UUID in records from external sources, except sensitive information
Could you describe these in more detail or give a link where I can read about them? English is not my native language and I couldn't clearly understand these two points.
- Explicit indication of UTC without timezones in the text of RFC
Unix time is the number of seconds that have elapsed since the Unix epoch (1970-01-01 00:00:00 UTC), minus leap seconds. Because of this, I assume defining UTC explicitly is unnecessary.
UUID creation directly in DBMS for better performance or on client side, but not in application server
It takes too much time to send UUID from application server to DBMS
Prohibition on substitution of UUID in records from external sources, except sensitive information
IDs from different sources are often replaced with newly generated IDs in an upper layer of the system. This is bad practice. UUIDs must be generated at the bottom layer of the system and passed to the upper layers unchanged.
Timestamp shift for sensitive information
This point was excluded
I assume defining UTC explicitly is unnecessary.
It's necessary because many people get this point wrong.
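A small sketch of where the confusion tends to arise, using only Python's standard `datetime` module. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone

# The same instant written in two different time zones.
utc_noon = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
plus3_afternoon = datetime(2021, 1, 1, 15, 0,
                           tzinfo=timezone(timedelta(hours=3)))

# Timezone-aware datetimes for one instant give one Unix timestamp.
same_instant = utc_noon.timestamp() == plus3_afternoon.timestamp()

# Common mistake: a naive datetime is treated as *local* time by
# .timestamp(), so this offset depends on the machine's time zone.
naive_noon = datetime(2021, 1, 1, 12, 0)
offset_seconds = naive_noon.timestamp() - utc_noon.timestamp()

print(same_instant, int(utc_noon.timestamp()), offset_seconds)
```

The Unix timestamp itself has no time zone; the errors come from converting wall-clock values into it without stating the zone.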
It takes too much time to send UUID from application server to DBMS
If the UUID is created in the DBMS, you have to send it to the application server, since the application server needs to know the ID of the created object. Wherever the UUID is created, you cannot avoid sending it; only the direction changes. On the other hand, if I create the UUID inside the application, I can continue handling the created entity without waiting for the DBMS response. Depending on the DBMS to create objects also affects application architecture. It should clearly be up to the developer where and when to create IDs.
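A minimal sketch of that argument, assuming an in-memory SQLite database and made-up `orders` / `order_items` tables. The application creates the parent ID itself, so it can build the child rows before anything is sent to the database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_items ("
    "id TEXT PRIMARY KEY, order_id TEXT REFERENCES orders(id), sku TEXT)"
)

# IDs are generated in the application; uuid4 stands in for any UUID version.
order_id = str(uuid.uuid4())
items = [(str(uuid.uuid4()), order_id, sku) for sku in ("A-1", "B-2")]

# Parent and children go out in one transaction, with no read-back of a
# database-generated key in between.
with conn:
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "alice"))
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", items)

linked = conn.execute(
    "SELECT COUNT(*) FROM order_items WHERE order_id = ?", (order_id,)
).fetchone()[0]
print(linked)
```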
The application server does not know the current value of sequence to increment. Therefore it's better to generate UUID at DBMS.
The application server does not know the current value of sequence to increment. Therefore it's better to generate UUID at DBMS.
That is a really strange statement. The application server definitely knows the current value of the sequence; it would be strange if it didn't. Check paragraphs 4.4.2 and 4.5.2 of the current draft. Nothing there is specific to a DBMS. Any application that generates UUIDs should maintain the clock sequence. The only case where you could face problems with the clock sequence is when you have many independent UUID generators. That is essentially the case of client-side generation, not server-side.
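As a sketch of an application keeping the sequence itself, the generator below follows the `time|sequence|random` layout with the counter held in process memory. The field widths (48/12/68 bits), the omission of version and variant bits, and the "borrow the next millisecond" overflow policy are assumptions made for illustration, not taken from the draft.

```python
import os
import threading
import time
from typing import Callable

SEQ_BITS = 12    # assumption: illustrative width
RAND_BITS = 68   # assumption: 128 - 48 (timestamp) - 12 (sequence)


class SequenceGenerator:
    """Hypothetical in-process generator; the sequence lives in memory."""

    def __init__(
        self,
        clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000,
    ) -> None:
        self._clock_ms = clock_ms
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now > self._last_ms:
                # New clock tick: reset the sequence.
                self._last_ms, self._seq = now, 0
            elif self._seq < (1 << SEQ_BITS) - 1:
                # Same tick (or clock went backwards): bump the sequence.
                self._seq += 1
            else:
                # Sequence exhausted: borrow the next millisecond.
                self._last_ms += 1
                self._seq = 0
            rand = int.from_bytes(os.urandom(9), "big") >> (72 - RAND_BITS)
            return (
                (self._last_ms << (SEQ_BITS + RAND_BITS))
                | (self._seq << RAND_BITS)
                | rand
            )


# A frozen clock forces every ID into the same tick, exercising the sequence.
gen = SequenceGenerator(clock_ms=lambda: 1000)
ids = [gen.next_id() for _ in range(5000)]
```

With several independent generators, each process would hold its own sequence, which is the situation where the random part has to carry the uniqueness guarantee.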
Here is a good example of the long (160-bit) UUID: Long ULID for high-load critical systems and IoT
As an interesting followup to this thread - the current concept of UUIDv7 from https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST-OUTLINE.md is starting to share a fair number of characteristics with ULID.
If the timestamp were changed to be millisecond-precise and to use the first 6 bytes, the combined variant+version field would still be at byte 9, and everything else would be random. That should be, from what I can tell, 100% compatible with ULID. I hadn't really expected this, but I think it is a really interesting aspect to consider. I'm going to update the outline so this gets reviewed more thoroughly as part of the next draft, and maybe reach out to whoever manages the ULID spec.
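A rough sketch of that idea: 16 bytes with a 48-bit millisecond timestamp in front, one combined variant+version octet, random bytes elsewhere, rendered as 26 Crockford base32 characters the way ULID text is. Two things are assumptions here: "byte 9" is read as 1-based (index 8), and the octet value `0xE7` in the usage lines is a placeholder chosen by analogy with the `0xE8` used for version 8 later in this thread. The function names are hypothetical.

```python
import os
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_id(unix_ms: int, var_ver_octet: int,
            rand: Optional[bytes] = None) -> bytes:
    """Build 16 bytes: 6 timestamp, 2 random, 1 variant+version, 7 random."""
    if rand is None:
        rand = os.urandom(9)
    if len(rand) != 9:
        raise ValueError("expected exactly 9 random bytes")
    timestamp = unix_ms.to_bytes(6, "big")
    return timestamp + rand[:2] + bytes([var_ver_octet]) + rand[2:]


def crockford_encode(raw: bytes) -> str:
    """Encode 16 bytes as 26 Crockford base32 characters (ULID text form)."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    value = int.from_bytes(raw, "big")
    chars = []
    for _ in range(26):          # 26 * 5 = 130 bits; the top 2 are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


earlier = crockford_encode(make_id(1, 0xE7))
later = crockford_encode(make_id(2, 0xE7))
zero = crockford_encode(make_id(0, 0xE7, bytes(9)))
```

Because the timestamp occupies the leading bytes, the text form sorts by creation time at millisecond granularity, which is the property shared with ULID.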
There is no need for a randomly or pseudo-randomly generated UUIDv8 in place of UUIDv4, because UUIDv4 is enough. Therefore the "Minimal Practical Implementations (Generation)" of UUIDv8 should be something like a 160-bit "UUIDv7 on steroids" with metadata at the end:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                UNIX_time_at_100_ns_resolution                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UNIX_time_at_100_ns_resolution | count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0xE8 | count-low | random |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            random                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|rdm|shard/partit/random|source_ID/hash/rand| entity_type/table |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Links and explanations:
- UNIX_time_at_100_ns_resolution is the same as timestamp in UUID version 1
- Count, but not clock sequence (on overflow, UUIDs with the maximum count number should be generated until the timestamp changes)
- 0xE8 means version 8
- Random value must be generated in advance by quantum-mechanical TRNG or CSPRNG, unique for each UUID
- Shard / horizontal partitioning
- Hash of source name may be used as a Shared Knowledge
- Entity type. This optional field with local values for the specific DB is intended to establish polymorphic relationships between DB tables of complex applications. It also may be used as an anchor name prefix in Anchor modeling
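Two of these points can be sketched in code. The counter rule is taken from the explanation above; starting the count at 0 on a new timestamp is an assumption. The field widths of the final 32-bit word (2, 10, 10, 10 bits) are inferred from the column positions in the last row of the diagram, not stated anywhere in the text, and both function names are hypothetical.

```python
def next_count(prev_ts: int, prev_count: int, now_ts: int,
               count_bits: int) -> int:
    """Counter for the next ID: it saturates at its maximum instead of
    wrapping, and stays there until the timestamp changes."""
    max_count = (1 << count_bits) - 1
    if now_ts != prev_ts:
        return 0                      # assumption: restart at 0 on a new tick
    return min(prev_count + 1, max_count)


def pack_metadata(rdm: int, shard: int, source: int, entity: int) -> int:
    """Pack the trailing 32-bit word as rdm(2) | shard(10) | source(10) |
    entity(10). Widths are read off the diagram and are an assumption."""
    for name, value, bits in (("rdm", rdm, 2), ("shard", shard, 10),
                              ("source", source, 10), ("entity", entity, 10)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (rdm << 30) | (shard << 20) | (source << 10) | entity


word = pack_metadata(rdm=3, shard=1, source=2, entity=3)
print(f"{word:#010x}")
```

With a saturated counter, uniqueness within the same tick rests entirely on the random field, which is presumably why the random value is required to be unique for each UUID.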
It would be much better to take "ULID with sequence" as the basis for RFC: https://github.com/Sofya2003/ULID-with-sequence