Closed sergeyprokhorenko closed 2 years ago
- Disallow extracting timestamp or any other data from UUID
The goal of these new UUID versions is to provide lexicographically sortable Universally Unique Identifiers. To prevent information from being extracted from a UUID, it would have to be encrypted or hashed, and then you could no longer sort UUIDs lexicographically by creation date. Otherwise, any representation of a UUID is readable.
- Remove ver, var and node from the RFC (like in UUIDv4)
The recommended node value is a 48-bit random or pseudo-random number, just as it is for UUIDv4. From Section 4.3.3:
UUIDv6 node bits SHOULD be set to a 48 bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the [RFC4122], Section 4.5 method of generating a random multicast IEEE 802 MAC address.
- But allow database shard on left part of UUID of any length
I'm not sure I quite understand what you mean by this. Do you mean 589e3e93-6f3b-4008-aef8-da09f7fa2fb2 should be padded on the left, like shard1#589e3e93-6f3b-4008-aef8-da09f7fa2fb2, or something else?
- Disallow extracting timestamp or any other data from UUID
The goal of these new UUID versions is to provide lexicographically sortable Universally Unique Identifiers. To prevent information from being extracted from a UUID, it would have to be encrypted or hashed, and then you could no longer sort UUIDs lexicographically by creation date. Otherwise, any representation of a UUID is readable.
I did not mean encryption, hashing, or any other technical measure. I only meant that the text of the RFC should prohibit extracting data.
- Remove ver, var and node from the RFC (like in UUIDv4)
- The recommended node value is a 48-bit random or pseudo-random number, just as it is for UUIDv4. From Section 4.3.3:
UUIDv6 node bits SHOULD be set to a 48 bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the [RFC4122], Section 4.5 method of generating a random multicast IEEE 802 MAC address.
OK. But it is better to replace the unclear word node with randomness.
- ver and var are quite important for compatibility and being able to distinguish between versions.
The UUIDs themselves, of any version, are actually fully compatible, because they have the same length, are globally unique, and can be indexed in a DB.
There is actually no need to distinguish between versions, because a UUIDv7 must be treated as a black box as a whole, and parsing of UUIDv7 must be banned.
- But allow database shard on left part of UUID of any length
I'm not sure I quite understand what you mean by this. Do you mean 589e3e93-6f3b-4008-aef8-da09f7fa2fb2 should be padded on the left, like shard1#589e3e93-6f3b-4008-aef8-da09f7fa2fb2, or something else?
No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.
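The prefix-based selection described here can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `matches_prefix` and `shard_for`, the shard count, and the modulo mapping are hypothetical and do not come from the draft or from any database product.

```python
import uuid


def matches_prefix(u: uuid.UUID, prefix: str) -> bool:
    """Return True if the hex text of the UUID starts with the given prefix."""
    return u.hex.startswith(prefix.lower())


def shard_for(u: uuid.UUID, prefix_len: int = 4, shard_count: int = 4) -> int:
    """Map the leftmost `prefix_len` hex digits to a shard number (hypothetical scheme)."""
    prefix = u.hex[:prefix_len]
    return int(prefix, 16) % shard_count


# The UUID below is the one quoted earlier in this thread.
u = uuid.UUID("589e3e93-6f3b-4008-aef8-da09f7fa2fb2")
print(matches_prefix(u, "589e3"))   # prefix of length 5
print(matches_prefix(u, "589e"))    # prefix of length 4
print(shard_for(u, prefix_len=4, shard_count=4))
```

Note that the selection only looks at the leading characters and never interprets them as a timestamp or any other field. In the canonical text form the prefix digits range over 0-9 and a-f.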
@sergeyprokhorenko
Remove ver, var and node from the RFC (like in UUIDv4)
But it is better to replace the unclear word node with randomness.
I totally agree that we could drop the word 'node' and use something else like 'random', but version and variant are mandatory for RFC-4122. These fields are important to separate UUID types into different namespaces (or keyspaces?) so they never collide.
The UUIDs themselves, of any version, are actually fully compatible.
Two RFC-4122 UUIDs of different versions are binary compatible. But I don't agree that non-RFC-4122 UUIDs are compatible with RFC-4122 UUIDs, even though they have the same bit length.
No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.
Do you mean Tomas Vondra's Sequential UUIDs? I think it's an excellent candidate for a UUID version (UUIDv9 maybe?). In this type of UUID, it's not possible to extract the creation time due to the small bit length (16 bits). So it works like an opaque or black-box UUID.
Github Repo: https://github.com/tvondra/sequential-uuids Here are some SQL implementations: https://github.com/tvondra/sequential-uuids/issues/2
@sergeyprokhorenko
Remove ver, var and node from the RFC (like in UUIDv4)
But it is better to replace the unclear word node with randomness.
I totally agree that we could drop the word 'node' and use something else like 'random', but version and variant are mandatory for RFC-4122. These fields are important to separate UUID types into different namespaces (or keyspaces?) so they never collide.
Ver and var are not mandatory for RFC-4122. Just look at specification of UUIDv4.
UUIDs will never collide thanks to random parts regardless of ver or var.
No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.
Do you mean Tomas Vondra's Sequential UUIDs? I think it's an excellent candidate for a UUID version (UUIDv9 maybe?). In this type of UUID, it's not possible to extract the creation time due to the small bit length (16 bits). So it works like an opaque or black-box UUID.
Github Repo: https://github.com/tvondra/sequential-uuids Here are some SQL implementations: tvondra/sequential-uuids#2
No. I only mean that the left part of a UUID, of arbitrary length, can be used for grouping records and for populating database shards.
Ver and var are not mandatory for RFC-4122. Just look at specification of UUIDv4.
I don't know if 'mandatory' is the correct word (my English is poor). But that's what I understand from section 4.4. of RFC-4122.
4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers
The algorithm is as follows:
o Set the two most significant bits (bits 6 and 7) of the
clock_seq_hi_and_reserved to zero and one, respectively.
o Set the four most significant bits (bits 12 through 15) of the
time_hi_and_version field to the 4-bit version number from
Section 4.1.3.
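The two steps quoted above can be sketched as follows. This is a minimal illustration, not a replacement for `uuid.uuid4()`; the function name `make_uuid4_by_hand` is hypothetical, and the byte offsets assume the RFC 4122 field layout (octets 6-7 hold time_hi_and_version, octet 8 holds clock_seq_hi_and_reserved).

```python
import os
import uuid


def make_uuid4_by_hand() -> uuid.UUID:
    """Build a version 4 UUID from 16 random bytes, setting ver and var explicitly."""
    raw = bytearray(os.urandom(16))

    # Version: the four most significant bits of time_hi_and_version
    # (high nibble of octet 6) are set to 0100, i.e. version 4.
    raw[6] = (raw[6] & 0x0F) | 0x40

    # Variant: the two most significant bits of clock_seq_hi_and_reserved
    # (octet 8) are set to 1 and 0, i.e. the RFC 4122 variant.
    raw[8] = (raw[8] & 0x3F) | 0x80

    return uuid.UUID(bytes=bytes(raw))


u = make_uuid4_by_hand()
print(u, u.version, u.variant)
```

All remaining 122 bits stay random, which is why ver and var are present even in the "fully random" UUIDv4.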
UUIDs will never collide thanks to random parts regardless of ver or var.
Sorry. I mean that UUIDs of different versions never collide, i.e., a UUIDv1 doesn't collide with a UUIDv4. They are in different 'spaces'.
No. I only mean that the left part of a UUID, of arbitrary length, can be used for grouping records and for populating database shards.
Maybe it can be a use case for UUIDv8 (the catch all version).
You are correct. ver and var are used in UUIDv4. Nevertheless, nothing prevents us from amending RFC-4122 to eliminate the outdated requirement of ver and var for new versions of UUID.
Sharding a database on the left part of the UUID is possible for any version of UUID, because any version of UUID is sortable.
Disallow extracting timestamp or any other data from UUID
Entirely agree. The UUID should be used as a unique identifier and not as storage for a timestamp (MAC address, etc.).
IMO clause "4.4.4.2. UUIDv7 Decoding" should be changed to something like this:
Do not rely on UUID internals.
But allow database shard on left part of UUID of any length
Yes. UUIDv7 should be treated as a btree-friendly and sharding-friendly (partitioning-friendly) variant of UUIDv4.
But it is better to replace the unclear word node with randomness.
I totally agree that we could drop the word 'node' and use something else like 'random', [...]
The term Node used throughout draft 01 comes directly from RFC 4122, Section 4.1.6. The term was carried over when drafting UUIDv6 and then replicated throughout the document to be consistent both within this document and with the previous RFC.
At a glance I don't see any place where Node is not defined properly in draft 01. If there is a spot where we need to add more clarity to the term node, please let me know.
At a glance I don't see any place where Node is not defined properly in draft 01
Please note that people take terms as is, without reading in depth. So the terms should be used in their natural meaning. I've never heard that node means random.
Nevertheless, nothing prevents us from amending RFC-4122 to eliminate the outdated requirement of ver and var for new versions of UUID.
Everything prevents this!
Redefine how the version bits are used and you break compatibility with existing RFC4122 UUID versions. Redefine how the variant bits are used and you break compatibility with all other UUIDs. I.e., it's not possible to eliminate or ignore the current semantics of these fields without breaking the guarantee that UUIDs of different versions and variants won't collide.
No UUIDs will collide thanks to random parts (node) of UUIDs.
No UUIDs will collide thanks to random parts (node) of UUIDs
This doesn’t make sense. Take any UUID of any version, change the version (to a valid version #), and you still have a valid UUID. Any randomly set bits will have a non-zero chance of colliding with those same bits in a different-version UUID. Thus, version is essential to guaranteeing cross-version collisions don’t occur.
… but maybe I misunderstand your point. Can you provide a concrete example?
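The version argument above can be made concrete with a small sketch. It takes the UUID quoted earlier in this thread, rewrites only the four version bits, and shows that the result is still a well-formed UUID that differs from the original solely in the version field. The helper `with_version` is hypothetical; the bit position (bits 76-79 of the 128-bit integer) assumes the RFC 4122 layout.

```python
import uuid


def with_version(u: uuid.UUID, version: int) -> uuid.UUID:
    """Return a copy of `u` whose 4-bit version field is replaced by `version`."""
    cleared = u.int & ~(0xF << 76)          # zero the version nibble
    return uuid.UUID(int=cleared | (version << 76))


original = uuid.UUID("589e3e93-6f3b-4008-aef8-da09f7fa2fb2")  # a version 4 UUID
relabelled = with_version(original, 1)

print(original, original.version)
print(relabelled, relabelled.version)

# Every other bit is identical, so only the version field keeps the two apart.
print(original == relabelled)
print(hex(original.int ^ relabelled.int))
```

If the version field were dropped or ignored, these two values would be indistinguishable, which is the cross-version collision the comment describes.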
No UUIDs will collide thanks to random parts (node) of UUIDs.
That's not true. Randomness does not guarantee uniqueness; it only decreases the probability of collision.
Microsoft used to create UUIDv1 when System.Guid.NewGuid() was called and then moved to UUIDv4. With the version and variant bits it is guaranteed that previous UUIDv1 values won't collide with UUIDv4 values, and old and new IDs can be told apart. If in the future they decide to use UUIDvX, the ver and var bits will again guarantee the lack of collision between versions. Each version is generated differently, so without ver and var there is more chance of collision, because a UUIDv1 (with node ID) might align.
In short, if you generate a v1 449c7bd6-00ca-11ec-9a03-0242ac130003 and create MyUUID, which does not contain ver and var, then you risk colliding with the UUIDv1 you previously generated.
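To put rough numbers on "random does not guarantee uniqueness", here is a sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / (2·2^b)), for n identifiers with b uniformly random bits. The function name `collision_probability` and the sample size are assumptions chosen for illustration; the model ignores real generator flaws.

```python
import math


def collision_probability(n: int, random_bits: int) -> float:
    """Birthday-bound estimate of at least one collision among n random IDs."""
    space = 2 ** random_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-(n * (n - 1)) / (2 * space))


n = 10 ** 9  # hypothetical number of generated identifiers

# UUIDv4 keeps 122 random bits (128 minus 4 version bits and 2 variant bits).
print(collision_probability(n, 122))

# A hypothetical identifier with no ver/var fields would have 128 random bits.
print(collision_probability(n, 128))
```

The probability is tiny in both cases but never zero. Within one version the risk is probabilistic, while across versions the ver bits make a collision impossible rather than merely unlikely.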
It doesn’t make sense to demand a lower probability of collision between versions than between UUIDs of the same version. By the way, a 160-bit UUID will never collide with a 128-bit UUID, regardless of var or ver.
Maybe I'm wrong, but as far as I know, this project wants to extend RFC4122 and not redefine it. If you think you could create a better definition and finalise it as an RFC to be a standard, then you should create a different draft that is not tied to RFC4122.
I see that this project attempts to improve the ugly RFC-4122 and overcome its outdated restrictions. And it's much easier to add amendments than to create a new RFC.
Treat UUID like a black box: