uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

Treat UUID like a black box #28

Closed sergeyprokhorenko closed 2 years ago

sergeyprokhorenko commented 3 years ago

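Treat UUID like a black box:
  • Disallow extracting timestamp or any other data from UUID
  • Remove ver, var and node from the RFC (like in UUIDv4)
  • But allow database shard on left part of UUID of any length

nerg4l commented 3 years ago
  • Disallow extracting timestamp or any other data from UUID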

The goal of these new UUID versions is to provide lexicographically sortable Universally Unique Identifiers. To disallow extracting information from a UUID, it would have to be encrypted or hashed, and then you wouldn't be able to sort UUIDs lexicographically by creation date. Otherwise any representation of a UUID is readable.

  • Remove ver, var and node from the RFC (like in UUIDv4)
  • But allow database shard on left part of UUID of any length

I'm not sure I quite understand what you mean by this. Do you mean 589e3e93-6f3b-4008-aef8-da09f7fa2fb2 should be padded left like shard1#589e3e93-6f3b-4008-aef8-da09f7fa2fb2 or something else?

sergeyprokhorenko commented 3 years ago
  • Disallow extracting timestamp or any other data from UUID

The goal of these new UUID versions is to provide lexicographically sortable Universally Unique Identifiers. To disallow extracting information from a UUID, it would have to be encrypted or hashed, and then you wouldn't be able to sort UUIDs lexicographically by creation date. Otherwise any representation of a UUID is readable.

I did not mean encryption, hashing, or any other technical measure. I only meant that the text of the RFC should ban extracting data from a UUID.

  • Remove ver, var and node from the RFC (like in UUIDv4)
  • The recommended node value is a 48 bit random or pseudo-random number just as it is for UUIDv4. From Section 4.3.3

    UUIDv6 node bits SHOULD be set to a 48 bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the [RFC4122], Section 4.5 method of generating a random multicast IEEE 802 MAC address.

OK. But it is better to replace the unclear word node with randomness.

  • ver and var are quite important for compatibility and being able to distinguish between versions.

UUIDs of any version are actually fully compatible with each other, because they have the same length, are globally unique, and are indexable in a DB.

There is actually no need for distinguishing between versions, because UUIDv7 must be treated like a whole black box and the parsing of UUIDv7 must be banned.

  • But allow database shard on left part of UUID of any length

I'm not sure I quite understand what you mean by this. Do you mean 589e3e93-6f3b-4008-aef8-da09f7fa2fb2 should be padded left like shard1#589e3e93-6f3b-4008-aef8-da09f7fa2fb2 or something else?

No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.
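A minimal sketch of what selecting records by a left prefix could look like, assuming the UUID is compared as its 32-digit hexadecimal string. The helper names `prefix_key` and `shard_index` and the modulo mapping are hypothetical and only illustrate the idea; they are not part of the draft.

```python
import uuid


def prefix_key(u: uuid.UUID, length: int) -> str:
    """Return the leftmost `length` hex digits of the UUID (hyphens ignored)."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 hex digits")
    return u.hex[:length]


def shard_index(u: uuid.UUID, length: int, shard_count: int) -> int:
    """Map a UUID to a shard using only its left prefix (hypothetical scheme)."""
    return int(prefix_key(u, length), 16) % shard_count


example = uuid.UUID("589e3e93-6f3b-4008-aef8-da09f7fa2fb2")
print(prefix_key(example, 5))       # leftmost five hex digits
print(shard_index(example, 5, 4))   # shard chosen from that prefix
```

Any prefix length works for any UUID version; whether the resulting shards are balanced or time-ordered depends on how the leftmost bits are generated.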

fabiolimace commented 3 years ago

@sergeyprokhorenko

Remove ver, var and node from the RFC (like in UUIDv4)

But it is better to replace the unclear word node with randomness.

I totally agree that we could drop the word 'node' and use something else like 'random', but version and variant are mandatory for RFC-4122. These fields are important to separate UUID types into different namespaces (or keyspaces?) so they never collide.

The UUIDs themselves of any versions are actually fully compatible.

Two RFC-4122 UUIDs of different versions are binary compatible. But I don't agree that non-RFC-4122 UUIDs are compatible with RFC-4122 UUIDs, even though they have the same bit length.

No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.

Do you mean Tomas Vondra's Sequential UUIDs? I think it's an excellent candidate for a UUID version (UUIDv9 maybe?). In this type of UUID, it's not possible to extract the creation time due to the small bit length (16 bits). So it works like an opaque or black box UUID.

Github Repo: https://github.com/tvondra/sequential-uuids Here are some SQL implementations: https://github.com/tvondra/sequential-uuids/issues/2

sergeyprokhorenko commented 3 years ago

@sergeyprokhorenko

Remove ver, var and node from the RFC (like in UUIDv4)

But it is better to replace the unclear word node with randomness.

I totally agree that we could drop the word 'node' and use something else like 'random', but version and variant are mandatory for RFC-4122. These fields are important to separate UUID types into different namespaces (or keyspaces?) so they never collide.

Ver and var are not mandatory for RFC-4122. Just look at specification of UUIDv4.

UUIDs will never collide thanks to random parts regardless of ver or var.

No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.

Do you mean Tomas Vondra's Sequential UUIDs? I think it's an excellent candidate for a UUID version (UUIDv9 maybe?). In this type of UUID, it's not possible to extract the creation time due to the small bit length (16 bits). So it works like an opaque or black box UUID.

Github Repo: https://github.com/tvondra/sequential-uuids Here are some SQL implementations: tvondra/sequential-uuids#2

No. I only mean that left parts of a UUID of arbitrary length can be used for grouping records and for filling database shards.

fabiolimace commented 3 years ago

Ver and var are not mandatory for RFC-4122. Just look at specification of UUIDv4.

I don't know if 'mandatory' is the correct word (my English is poor). But that's what I understand from section 4.4. of RFC-4122.

4.4.  Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers

The algorithm is as follows:

   o  Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.

   o  Set the four most significant bits (bits 12 through 15) of the
      time_hi_and_version field to the 4-bit version number from
      Section 4.1.3.
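The two steps quoted above can be sketched in Python as follows. This is an illustration only: the function name `make_uuid4` and its optional `random_bytes` parameter are assumptions made for the example, and Python's own `uuid.uuid4()` already does the equivalent.

```python
import os
import uuid
from typing import Optional


def make_uuid4(random_bytes: Optional[bytes] = None) -> uuid.UUID:
    """Build a version 4 UUID from 16 random bytes by stamping var and ver."""
    raw = bytearray(random_bytes if random_bytes is not None else os.urandom(16))
    if len(raw) != 16:
        raise ValueError("exactly 16 bytes are required")
    # Octet 8 is clock_seq_hi_and_reserved: force its top two bits to 1 0 (variant).
    raw[8] = (raw[8] & 0x3F) | 0x80
    # Octet 6 is the high byte of time_hi_and_version: force its top nibble to 4.
    raw[6] = (raw[6] & 0x0F) | 0x40
    return uuid.UUID(bytes=bytes(raw))


print(make_uuid4(bytes(16)))  # all-zero input shows which bits are fixed
print(make_uuid4())           # random input
```

Only 6 of the 128 bits are fixed; the remaining 122 come from the random source.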

UUIDs will never collide thanks to random parts regardless of ver or var.

Sorry. I mean UUIDs with different versions never collide, i.e., a UUIDv1 doesn't collide with a UUIDv4. They are in different 'spaces'.

No. I only mean that left parts of a UUID of arbitrary length can be used for grouping records and for filling database shards.

Maybe it can be a use case for UUIDv8 (the catch all version).

sergeyprokhorenko commented 3 years ago

You are correct. ver and var are used in UUIDv4. Nevertheless, nothing prevents us from amending RFC-4122 to eliminate the outdated requirement of ver and var for new versions of UUID.

Database sharding on the left part of a UUID is possible for any version of UUID, because any version of UUID is sortable.

edo1 commented 3 years ago

Disallow extracting timestamp or any other data from UUID

Entirely agree. The UUID should be used as a unique identifier and not as a timestamp (MAC address, etc) storage.

IMO clause "4.4.4.2. UUIDv7 Decoding" should be changed to something like this:

Do not rely on UUID internals.

edo1 commented 3 years ago

But allow database shard on left part of UUID of any length

Yes. UUIDv7 should be treated as a btree-friendly and sharding-friendly (partitioning-friendly) variant of UUIDv4.
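To illustrate why a time-ordered identifier is btree-friendly, here is a simplified sketch that puts a 48-bit millisecond timestamp in the leftmost bits and random bytes in the rest. The function `time_prefixed_id` is hypothetical, it does not reproduce the draft's UUIDv7 bit layout, and it deliberately sets no version or variant bits.

```python
import os
import uuid


def time_prefixed_id(unix_ms: int) -> uuid.UUID:
    """Return a 128-bit ID: 48-bit big-endian timestamp followed by 80 random bits."""
    # Hypothetical layout for illustration; not the draft's UUIDv7 format.
    raw = unix_ms.to_bytes(6, "big") + os.urandom(10)
    return uuid.UUID(bytes=raw)


earlier = time_prefixed_id(1_000)
later = time_prefixed_id(2_000)

# IDs created in different milliseconds sort in creation order as plain strings,
# so new rows are appended near the right edge of a btree index.
print(sorted([str(later), str(earlier)]) == [str(earlier), str(later)])
```

Two IDs created within the same millisecond are ordered only by their random bits, so ordering in this sketch is guaranteed across milliseconds, not within one.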

kyzer-davis commented 3 years ago

But it is better to replace the unclear word node with randomness.

I totally agree that we could drop the word 'node' and use something else like 'random', [...]

The term Node used throughout the draft 01 is directly from RFC 4122, Section 4.1.6. The term was carried over when drafting UUIDv6 and then replicated throughout the document to be consistent both within this document and the previous RFC.

At a glance I don't see any place where Node is not defined properly in draft 01. If there is a spot where we need to add more clarity to the term node please let me know.

sergeyprokhorenko commented 3 years ago

At a glance I don't see any place where Node is not defined properly in draft 01

Please note that people take terms as is, without reading in depth. So the terms should be used in their natural meaning. I've never heard that node means random.

broofa commented 3 years ago

Nevertheless, nothing prevents us from amending RFC-4122 to eliminate the outdated requirement of ver and var for new versions of UUID.

Everything prevents this!

Redefine how the version bits are used and you break compatibility with existing RFC4122 UUID versions. Redefine how the variant bits are used and you break compatibility with all other UUIDs. I.e., it's not possible to eliminate or ignore the current semantics of these fields without breaking the guarantee that UUIDs of different versions and variants won't collide.

sergeyprokhorenko commented 3 years ago

No UUIDs will collide thanks to random parts (node) of UUIDs.

broofa commented 3 years ago

No UUIDs will collide thanks to random parts (node) of UUIDs

This doesn’t make sense. Take any UUID of any version, change the version (to a valid version #), and you still have a valid UUID. Any randomly set bits will have a non-zero chance of colliding with those same bits in a different-version uuid. Thus, version is essential to guaranteeing cross-version collisions don’t occur.

… but maybe I misunderstand your point. Can you provide a concrete example?

nerg4l commented 3 years ago

No UUIDs will collide thanks to random parts (node) of UUIDs.

That's not true. Randomness does not guarantee uniqueness; it only decreases the probability of collision.
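The size of that probability can be estimated with the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2N), where n is the number of IDs and N = 2^bits is the size of the random space. The sketch below assumes uniformly random bits; the function name `collision_probability` is made up for the example.

```python
import math


def collision_probability(n_ids: int, random_bits: int) -> float:
    """Approximate the chance that at least two of n_ids random values coincide."""
    space = 2.0 ** random_bits
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


# 122 random bits, as in UUIDv4: tiny but non-zero for a trillion IDs.
print(collision_probability(10**12, 122))
# Around 2**61 IDs the chance of at least one collision is no longer negligible.
print(collision_probability(2**61, 122))
```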

Microsoft used to create UUIDv1 when System.Guid.NewGuid() was called and later moved to UUIDv4. With the version and variant bits it is guaranteed that previous UUIDv1 values won't collide with UUIDv4 values, and old and new IDs can be told apart. If in the future they decide to use UUIDvX, the ver and var bits will again guarantee the lack of collisions between versions. Each version is generated differently, so without ver and var there is a higher chance of collision, because a UUIDv1 (with node id) might align.

In short, if you generate a v1 449c7bd6-00ca-11ec-9a03-0242ac130003 and create MyUUID which does not contain ver and var, then you risk colliding with the UUIDv1 you previously generated.
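As an illustration, Python's standard `uuid` module reads these fields directly. The sketch below inspects the v1 value above and the v4 value quoted earlier in the thread; the helper `same_keyspace` is a hypothetical name used only here.

```python
import uuid

v1 = uuid.UUID("449c7bd6-00ca-11ec-9a03-0242ac130003")
v4 = uuid.UUID("589e3e93-6f3b-4008-aef8-da09f7fa2fb2")


def same_keyspace(a: uuid.UUID, b: uuid.UUID) -> bool:
    """True when both UUIDs carry the same variant and version bits."""
    return (a.variant, a.version) == (b.variant, b.version)


print(v1.version, v1.variant)   # version nibble is the first digit of the third group
print(v4.version, v4.variant)
# Different version bits mean the two 128-bit values can never be equal.
print(same_keyspace(v1, v4))
```

An identifier format that leaves these bits random gives up that separation, because a generated value can land in any version's keyspace.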

sergeyprokhorenko commented 3 years ago

It doesn't make sense to demand a lower probability of collision between versions than between UUIDs of the same version. By the way, a 160-bit UUID will never collide with a 128-bit UUID, regardless of var or ver.

nerg4l commented 3 years ago

Maybe I'm wrong, but as far as I know, this project wants to extend RFC4122 and not redefine it. If you think you could create a better definition and finalise it as an RFC to be a standard, then you should create a different draft that is not tied to RFC4122.

sergeyprokhorenko commented 3 years ago

I see that this project attempts to improve the ugly RFC-4122 and overcome its outdated restrictions. And it's much easier to add amendments than to create a new RFC.