uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

Long UUID #32

Closed sergeyprokhorenko closed 2 years ago

sergeyprokhorenko commented 3 years ago

And I don't expect a very long lifespan for 128-bit UUIDs. It is very likely that in 50 or 100 years, 256-bit identifiers (or something radically new) will be used.

160-bit identifiers would be great right now. By the way, 160-bit identifiers in Crockford's base32 would be the same length as 128-bit identifiers in the usual UUID string format. Therefore they are compatible.
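A minimal sketch of the length arithmetic behind this claim, assuming a plain big-endian Crockford base32 encoding at 5 bits per character. The helper name `crockford_encode` is made up for illustration. Note that the match is with the 32 hex digits of a UUID; the canonical hyphenated form is 36 characters.

```python
import secrets

# Crockford's base32 alphabet: digits plus letters, excluding I, L, O and U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def crockford_encode(value: int, bits: int) -> str:
    """Encode an unsigned integer of `bits` bits, zero-padded to full length."""
    length = -(-bits // 5)  # ceil(bits / 5) characters
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD_ALPHABET[value & 0x1F])  # lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))

text_160 = crockford_encode(secrets.randbits(160), 160)
hex_128 = f"{secrets.randbits(128):032x}"

print(len(text_160), len(hex_128))  # 32 32
```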

Originally posted by @sergeyprokhorenko in https://github.com/uuid6/uuid6-ietf-draft/issues/23#issuecomment-899789863

broofa commented 3 years ago

RFC 4122 UUIDs are 128 bits. UUIDs of different lengths should be addressed in a new and different spec rather than as an enhancement to RFC 4122.

sergeyprokhorenko commented 3 years ago

Yes, it's better to bury RFC 4122. It is ugly.

bradleypeabody commented 3 years ago

See https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md where it describes introducing variable length UUIDs. I believe this will address this concern. 128-bits would be the default, but implementations could produce longer UUIDs if needed.

sergeyprokhorenko commented 3 years ago

128-bits would be the default, but implementations could produce longer UUIDs if needed.

I would recommend 160-bits, binary storage and Crockford's base32 input/output as another default for critical applications.

bradleypeabody commented 3 years ago

The argument against that is there are many implementations which currently use 128 bits. I do not believe the case for 160 bits is strong enough to tell people that the new UUID spec is not compatible with existing 128-bit implementations. Why 160 bits? Why not 256? And for which use cases? I can guarantee you that if we chose 160 bits someone else would come along and say some other length is better for some other reason. I literally have had arguments with people about how we should make UUIDs smaller so they are more useful in URLs or other cases, etc. Optional variable length allows 160 bits for the cases you're considering, and other alternate lengths for whatever other situations, while maintaining backward compatibility with 128-bit as the default.

sergeyprokhorenko commented 3 years ago

https://github.com/uuid6/uuid6-ietf-draft/issues/34#issuecomment-903167032

bradleypeabody commented 3 years ago

@sergeyprokhorenko There are many implementations which define UUID as 16 bytes. Unless this length is completely broken and there is some severe problem with it, I'm not on board with having a different default. A RECOMMENDATION, that could be more realistic - the draft could explain the reasons why longer UUIDs are better, I'm open to that.

But the simple fact is that we want implementations that already have UUID defined in a bunch of places as [16]byte to be able to continue to function with little or no change when adopting the new standard. Does this make sense?

I think it would be more productive to focus on clearly laying out the factors of why the longer values are better - I assume the key reason for this is to reduce collision probability - which probably should end up in table form. That information I think would be useful and helpful and could end up in the draft/rfc.

But just declaring that the new standard is 160 instead of 128, I just don't think it's workable because of how different it is and how much it would break. Besides, you'll say 160, and someone else will say 192 bits is the way to go, etc.

There is no absolute "this length is good enough for everyone". I'll write more in response to edo on this a bit later today as regards collision probability and "global uniqueness" (which is, unfortunately, an impossibility without a pre-arranged scheme between machines/implementations.)
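As a rough sketch of what such a collision table could be computed from, the snippet below uses the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)). It assumes every one of the b bits is uniformly random, which is not true of real UUIDs (version and variant bits are fixed, and timestamp bits are not random), so the numbers are only indicative. The function name and the chosen bit lengths are illustrative.

```python
import math

def collision_probability(n_ids: int, random_bits: int) -> float:
    """Birthday approximation for at least one collision among n_ids values."""
    exponent = n_ids * (n_ids - 1) / (2 ** (random_bits + 1))
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)

# Rows: identifier length; columns: 10^9, 10^12 and 10^15 generated IDs.
for bits in (122, 128, 160, 256):
    row = "  ".join(f"{collision_probability(10**k, bits):.3e}" for k in (9, 12, 15))
    print(f"{bits:>4} bits  {row}")
```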

sergeyprokhorenko commented 3 years ago

It would be helpful for UUIDv7 and UUIDv8 to provide a good example of a long (longer than 128-bit) UUID layout whose length adds up from reasonably long parts, without ugly compromises and drawbacks. For example, for high-load applications and IoT we need 100 ns timestamp precision, a 15-bit clock sequence (if real clock accuracy is 1 ms), a 10-bit local entity type, a 5-bit checksum, and a sufficiently long random part.
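To make the proposal concrete, here is one possible way such fields could be packed into 160 bits. This is only a sketch: the 60-bit timestamp width, the 70-bit random part, the field order and all names are assumptions chosen so the widths sum to 160, and the checksum is left as a placeholder rather than computed. None of this comes from the draft.

```python
import secrets
import time

# Hypothetical field widths, most significant field first (sum: 160 bits).
FIELDS = (
    ("timestamp_100ns", 60),  # 100 ns ticks since the Unix epoch (assumed)
    ("clock_seq", 15),
    ("entity_type", 10),
    ("checksum", 5),
    ("random", 70),
)

def pack(values: dict) -> int:
    """Concatenate the fields into one integer, checking each width."""
    result = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        result = (result << width) | value
    return result

def unpack(packed: int) -> dict:
    """Split a packed integer back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = packed & ((1 << width) - 1)
        packed >>= width
    return values

fields = {
    "timestamp_100ns": time.time_ns() // 100,
    "clock_seq": 0,
    "entity_type": 42,
    "checksum": 0,  # placeholder; no checksum algorithm is assumed here
    "random": secrets.randbits(70),
}
long_id = pack(fields)
print(long_id.to_bytes(20, "big").hex())
```

Putting the timestamp in the most significant bits keeps the packed values sortable by creation time, which is the same reasoning behind time-ordered UUID layouts.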

bradleypeabody commented 3 years ago

@sergeyprokhorenko Thanks and yes I totally agree that having a good example of how this would go and the case(s) where it's needed - that's absolutely something that could go in the draft.

What is the "10-bit local entity type"?

On the checksum, that's an interesting point. I would argue that for most applications you want the checksum to be on whatever message is carrying the UUID along with whatever data, rather than having it take up space in the UUID itself. E.g. UDP and TCP already provide a checksum mechanism, and so if your application is using one of those protocols then you might as well rely on it. BUT, it's absolutely a valid point that there could be other applications where an internal checksum is warranted (e.g. if you are sending raw IP packets, or over a transmission medium like a UART or maybe RS-485, where no built-in checksum mechanism is available). And so if an application wants to devote some bits to that, I see no problem with that and it should absolutely be allowed by the spec.

For text UUIDs, Crockford base32 does have a checksum mechanism, which I think should be optional - but it's another place where a checksum could easily be added using an existing mechanism rather than inventing something new (I'll add a bit more on this to latest notes).

Also as a note, for the UUID length you mention in #33 - it's a similar answer: usually the surrounding environment will have a way to indicate the length of a value. E.g. in msgpack you can include a set of bytes and the length is part of the encoding. We don't want to turn UUID into a wire protocol, because wire protocols already exist, and the case of including an arbitrary set of bytes with its length is extremely common and already a solved problem.
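To illustrate the point about the surrounding encoding carrying the length, here is a hand-rolled sketch of MessagePack's `bin 8` format, which is the byte `0xc4`, a one-byte length, then the payload. A real application would use a MessagePack library; the helper names here are made up for illustration.

```python
import secrets

def msgpack_bin8(payload: bytes) -> bytes:
    """Wrap raw bytes in MessagePack's bin 8 format (payloads up to 255 bytes)."""
    if len(payload) > 0xFF:
        raise ValueError("bin 8 holds at most 255 bytes")
    return bytes([0xC4, len(payload)]) + payload

def read_bin8(buf: bytes) -> bytes:
    """Recover the payload; its length comes from the encoding, not the ID."""
    if buf[0] != 0xC4:
        raise ValueError("not a bin 8 value")
    length = buf[1]
    return buf[2:2 + length]

id_128 = secrets.token_bytes(16)
id_160 = secrets.token_bytes(20)

# The same framing carries either length without any change to the ID itself.
for raw in (id_128, id_160):
    framed = msgpack_bin8(raw)
    print(framed[:2].hex(), len(read_bin8(framed)))
```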

sergeyprokhorenko commented 3 years ago

Yes, I prefer to use the existing mechanism of check symbol, described in Crockford base32. This check symbol is exactly 5 bits long. It's useful for manual operations with UUID.

Unfortunately some check symbols (*, $, =) in Crockford's base32 are not allowed in URL. And they cannot be replaced with other symbols. So it's better not to use the check symbol at all.
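For reference, a small sketch of the check symbol as Crockford's base32 description defines it: the encoded number modulo 37, drawn from the 32 regular symbols plus five extra ones (`*`, `~`, `$`, `=`, `U`). So there are 37 possible check symbols, slightly more than 5 bits' worth. The URL check below assumes "allowed" means the unreserved set of RFC 3986; the function name is illustrative.

```python
import string

# 32 regular symbols followed by the five check-only symbols (values 32..36).
CHECK_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U"

def check_symbol(value: int) -> str:
    """Return the Crockford check symbol for a non-negative integer."""
    return CHECK_ALPHABET[value % 37]

# RFC 3986 unreserved characters never need percent-encoding in a URL.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
needs_escaping = [c for c in CHECK_ALPHABET if c not in UNRESERVED]

print(needs_escaping)  # ['*', '$', '=']
```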

sergeyprokhorenko commented 3 years ago

What is the "10-bit local entity type"?

This optional field is intended to establish polymorphic relationships between DB tables of complex financial applications. For any UUID it is easy to find its dictionary, because entity type points to the dictionary. There may be UUIDs of any entity types in one array field. Entity types are local for the specific DB. If this field is not used for its intended purpose, then it is filled with random numbers. It also helps to find all tables where the UUID of specific entity type could be contained.

sergeyprokhorenko commented 3 years ago

Here is a good example of the long (160-bit) UUID: Long ULID for high-load critical systems and IoT