uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

Take "ULID with sequence" as the basis for the RFC #27

Closed sergeyprokhorenko closed 2 years ago

sergeyprokhorenko commented 3 years ago

It would be much better to take "ULID with sequence" as the basis for the RFC: https://github.com/Sofya2003/ULID-with-sequence

nerg4l commented 3 years ago

All current draft items have a `time|sequence|random` layout with additional version and variant sections, and an encoding in base32, base32hex, or Crockford's base32 is already under consideration in uuid6/new-uuid-encoding-techniques-ietf-draft#3. What else should be added from "ULID with sequence", in your opinion?
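For readers unfamiliar with the encoding mentioned here, the following is a minimal sketch of Crockford's base32 applied to a 128-bit value. The function names are hypothetical, and the decoder deliberately skips the alias handling (I, L, O) that the full Crockford scheme allows.

```python
# Crockford's base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_crockford_128(value: int) -> str:
    """Encode a 128-bit unsigned integer as 26 Crockford base32 characters."""
    if not 0 <= value < 1 << 128:
        raise ValueError("value must fit in 128 bits")
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits, so the leading character is 0-7
        chars.append(CROCKFORD_ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def decode_crockford_128(text: str) -> int:
    """Decode a string produced by encode_crockford_128 (no alias handling)."""
    value = 0
    for ch in text.upper():
        value = (value << 5) | CROCKFORD_ALPHABET.index(ch)
    return value
```

Because the alphabet is in ascending ASCII order and the output has a fixed length, the encoded strings sort in the same order as the underlying integers.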

fabiolimace commented 3 years ago

I like the "ULID with sequence" structure with a sequence between the timestamp and the random component. I think it would be an excellent variant of Monotonic ULID.

But I don't know if a 48-bit timestamp is good for UUIDs due to the mandatory 6 bits for version and variant.
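The concern can be put in numbers. This is a small sketch of the bit budget, assuming the RFC 4122 split of the 6 mandatory bits into a 4-bit version and a 2-bit variant; the constant names are illustrative only.

```python
TOTAL_BITS = 128
TIMESTAMP_BITS = 48   # millisecond timestamp, as in ULID
VERSION_BITS = 4      # RFC 4122 version field
VARIANT_BITS = 2      # RFC 4122 variant field

# ULID spends everything after the timestamp on randomness.
ulid_random_bits = TOTAL_BITS - TIMESTAMP_BITS

# A UUID with the same timestamp must also reserve version and variant bits.
uuid_free_bits = TOTAL_BITS - TIMESTAMP_BITS - VERSION_BITS - VARIANT_BITS

print(ulid_random_bits)  # 80
print(uuid_free_bits)    # 74
```

Any sequence counter would come out of the 74 remaining bits, leaving fewer for the random component.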

sergeyprokhorenko commented 3 years ago

Here are the features that should be added:

  1. Optional, locally unique entity_type ending of the UUID (10 bits), corresponding to some database tables
  2. Default 15-bit clock sequence for millisecond timestamp precision (48-bit timestamp), which allows for 32768 UUIDs per millisecond at peak periods. Automatic calculation of the best timestamp precision and length of the clock sequence is desired
  3. Mandatory quantum-mechanical TRNG or CSPRNG
  4. Crockford's base32 string representation recommended in URLs and for all new projects (the 8-4-4-4-12 format for backward compatibility only)
  5. Calculation of random parts in advance, and buffering for high-load applications when needed
  6. Calculation of UUIDs in advance within clock tick, and buffering for high-load applications when needed
  7. UUID creation independently for each database table
  8. UUID creation directly in DBMS for better performance or on client side, but not in application server
  9. Excluded: Timestamp shift for sensitive information
  10. Prohibition on substitution of UUID in records from external sources, except sensitive information
  11. Explicit indication of UTC without timezones in the text of RFC

nerg4l commented 3 years ago

> 5. Calculation of random parts in advance, and buffering for high-load applications when needed
> 6. Calculation of UUIDs in advance within clock tick, and buffering for high-load applications when needed

I think these two are up to the implementations and shouldn't be part of the definition.

> 8. UUID creation directly in DBMS for better performance or on client side, but not in application server

Can you provide evidence for the "better performance"? In my opinion, this shouldn't be decided up front. Applications running on multiple nodes can outperform a DBMS in id generation.

I personally would never accept any kind of value generated by the client as an identifier. You cannot trust that data.

> 9. Timestamp shift for sensitive information
> 10. Prohibition on substitution of UUID in records from external sources, except sensitive information

Could you describe these in more detail or give a link where I can read about them? English is not my native language and I couldn't clearly understand these two points.

> 11. Explicit indication of UTC without timezones in the text of RFC

Unix time is the number of seconds that have elapsed since the Unix epoch (1970-01-01 00:00:00 UTC), minus leap seconds. Because of this, I assume defining UTC explicitly is unnecessary.
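A short sketch of why Unix time carries no timezone: two wall-clock readings of the same instant in different zones produce the same count. The helper name `unix_millis` is hypothetical, and rejecting naive datetimes is a design choice made here, since their zone is ambiguous.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_millis(moment: datetime) -> int:
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the timezone is ambiguous")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# The same instant, written in UTC and in a UTC+3 zone.
utc_noon = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
plus3_afternoon = datetime(2021, 1, 1, 15, 0, tzinfo=timezone(timedelta(hours=3)))

print(unix_millis(utc_noon) == unix_millis(plus3_afternoon))  # True
```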

sergeyprokhorenko commented 3 years ago

> UUID creation directly in DBMS for better performance or on client side, but not in application server

It takes too much time to send the UUID from the application server to the DBMS.

sergeyprokhorenko commented 3 years ago

> Prohibition on substitution of UUID in records from external sources, except sensitive information

IDs from different sources are often replaced with newly generated IDs in the upper layer of a system. This is bad practice. UUIDs must be generated in the bottom layer of the system and passed to the upper layers without any change.

sergeyprokhorenko commented 3 years ago

> Timestamp shift for sensitive information

This point was excluded.

sergeyprokhorenko commented 3 years ago

> I assume defining UTC explicitly is unnecessary.

It's necessary because many people get this point wrong.

Cryvage commented 3 years ago

> It takes too much time to send UUID from application server to DBMS

If the UUID is created in the DBMS, you will have to send it to the application server, since the application server needs to know the ID of the created object. It seems that regardless of where the UUID is created, you cannot avoid sending it. Only the direction changes. On the other hand, if I create the UUID inside the application, I can continue handling the created entity without waiting for the DBMS response. Depending on the DBMS to create objects also affects the application architecture. It should clearly be up to the developer where and when to create IDs.

sergeyprokhorenko commented 3 years ago

The application server does not know the current value of the sequence to increment. Therefore it's better to generate the UUID in the DBMS.

Cryvage commented 3 years ago

> The application server does not know the current value of sequence to increment. Therefore it's better to generate UUID at DBMS.

This is a really strange statement. The application server definitely knows the current value of the sequence; it would be strange if it didn't. Check paragraphs 4.4.2 and 4.5.2 of the current draft. Nothing there is said about a DBMS specifically. Any application that generates UUIDs should keep the clock sequence. The only case where you could face problems with the clock sequence is when you have many independent UUID generators. That is basically the case of client-side generation, not server-side generation.
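As an illustration of a generator that keeps its own sequence in application memory, here is a minimal sketch. It is not taken from the draft: the class name, the 48/15/65 bit split, the omission of version and variant bits, and the choice to raise on sequence overflow are all assumptions made for brevity.

```python
import secrets
import threading
import time


class SequencedIdGenerator:
    """Hypothetical time|sequence|random generator held in process memory."""

    SEQ_BITS = 15      # 15-bit sequence, as proposed earlier in the thread
    RANDOM_BITS = 65   # assumption: 128 - 48 - 15, no version/variant bits

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        self._clock = clock            # returns Unix time in milliseconds
        self._lock = threading.Lock()  # the state is shared between threads
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now_ms = self._clock()
            if now_ms > self._last_ms:
                # New millisecond: restart the sequence.
                self._last_ms = now_ms
                self._seq = 0
            else:
                # Same millisecond (or the clock went backwards): increment.
                self._seq += 1
                if self._seq >= 1 << self.SEQ_BITS:
                    raise OverflowError("sequence exhausted for this millisecond")
            ts, seq = self._last_ms, self._seq
        rand = secrets.randbits(self.RANDOM_BITS)
        shift = self.SEQ_BITS + self.RANDOM_BITS
        return (ts << shift) | (seq << self.RANDOM_BITS) | rand
```

The sequence lives next to the last-seen timestamp inside one process, so values from a single generator are strictly increasing. Several independent generators would each hold their own sequence and rely on the random bits to avoid collisions.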

sergeyprokhorenko commented 3 years ago

Here is a good example of the long (160-bit) UUID: Long ULID for high-load critical systems and IoT

bradleypeabody commented 3 years ago

As an interesting followup to this thread - the current concept of UUIDv7 from https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST-OUTLINE.md is starting to share a fair number of characteristics with ULID.

Suppose the timestamp were changed to be millisecond-precise and to use the first 6 bytes. The combined variant+version field would still be at byte 9, and everything else would be random. That should be, from what I can tell, 100% compatible with ULID. I hadn't really expected this, but I think this is a really interesting aspect to consider. I'm going to update the outline so this gets reviewed more thoroughly as part of the next draft, and maybe reach out to whoever manages the ULID spec.
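The timestamp-first part of that idea can be sketched as follows. This is only an illustration with hypothetical function names: it builds the ULID-style 6-byte millisecond timestamp followed by 10 random bytes, and it deliberately leaves out the variant+version byte, whose exact encoding is defined by the outline rather than by this example.

```python
import os
import time


def ulid_style_bytes(unix_ms: int, random_bytes: bytes) -> bytes:
    """48-bit big-endian millisecond timestamp followed by 80 random bits."""
    if not 0 <= unix_ms < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(random_bytes) != 10:
        raise ValueError("expected exactly 10 random bytes")
    return unix_ms.to_bytes(6, "big") + random_bytes


def timestamp_of(value: bytes) -> int:
    """Recover the millisecond timestamp from the first 6 bytes."""
    return int.from_bytes(value[:6], "big")


example = ulid_style_bytes(time.time_ns() // 1_000_000, os.urandom(10))
```

Because the timestamp is big-endian and comes first, comparing the raw bytes orders values by creation time to millisecond precision.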

sergeyprokhorenko commented 3 years ago

There is no need for a randomly or pseudo-randomly generated UUIDv8 in place of UUIDv4, because UUIDv4 is enough. Therefore the "Minimal Practical Implementations (Generation)" of UUIDv8 should be something like a 160-bit "UUIDv7 on steroids" with metadata at the end:

   0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 UNIX_time_at_100_ns_resolution                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 UNIX_time_at_100_ns_resolution        | count |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     0xE8      |      count-low      |         random          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             random                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |rdm|shard/partit/random|source_ID/hash/rand| entity_type/table |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Links and explanations:

UNIX_time_at_100_ns_resolution is the same as timestamp in UUID version 1

Count, but not clock sequence (on overflow UUIDs with the maximum count number should be generated until the timestamp changes)

0xE8 means version 8

Random value must be generated in advance by quantum-mechanical TRNG or CSPRNG, unique for each UUID

Shard

Horizontal partitioning

Hash of source name may be used as a Shared Knowledge

Entity type. This optional field with local values for the specific DB is intended to establish polymorphic relationships between DB tables of complex applications. It also may be used as an anchor name prefix in Anchor modeling
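The layout above can be sketched as a bit-packing routine. The field widths below are an assumption: they were counted from the diagram at two characters per bit (60-bit time, 4 + 11 = 15-bit count, 8-bit 0xE8 marker, 13 + 32 + 2 = 47 random bits, and three 10-bit trailing fields). The function and parameter names are hypothetical.

```python
def pack_long_uuid(unix_time_100ns: int, count: int, random: int,
                   shard: int, source_id: int, entity_type: int) -> bytes:
    """Pack the fields into 20 bytes (160 bits), most significant field first."""
    fields = [
        (unix_time_100ns, 60),  # rows 1-2: timestamp
        (count >> 11, 4),       # row 2: high 4 bits of the 15-bit count
        (0xE8, 8),              # row 3: version marker from the diagram
        (count & 0x7FF, 11),    # row 3: low 11 bits of the count
        (random, 47),           # rows 3-5: random bits
        (shard, 10),            # row 5: shard/partition (or random)
        (source_id, 10),        # row 5: source ID/hash (or random)
        (entity_type, 10),      # row 5: entity type/table
    ]
    value = 0
    for field, width in fields:
        if not 0 <= field < 1 << width:
            raise ValueError(f"field value {field} does not fit in {width} bits")
        value = (value << width) | field
    return value.to_bytes(20, "big")
```

With these widths the 0xE8 marker lands in the ninth byte (index 8), and the entity type occupies the final 10 bits.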