New Draft 03 Ideas

bradleypeabody commented 3 years ago

@kyzer-davis and @edo1 @sergeyprokhorenko @fabiolimace @nerg4l @broofa @nurked (hopefully I didn't miss anyone).

I typed up a summary of what I suggest should go into a new draft, along with the rationale.

https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md

I tried to keep it as succinct as possible and had to make some decisions; I'm sure some people will disagree on various things.

It would be great if we could review the issues being discussed against that document, see how close it comes to addressing everyone's concerns, and start narrowing down what specifically should be in the next draft. The detailed discussion of each issue can stay on the individual issue threads to keep them focused, but if there are specific changes to the doc above that would help move us toward an exact list of items to address, definitely let me know.

bradleypeabody commented 3 years ago

@kyzer-davis hopefully I'm not stepping on your toes with that v8 explanation. My point there was just to punch up the idea that it would be essentially free-form except for the version field, and while a lot of the existing data in the draft will work great as implementation suggestions, I think at its core it can just be "do what you want, as long as you have v8 in the version field and the variant bits set correctly". Let me know if we need to discuss.
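A minimal Python sketch of that "free-form except for the version and variant bits" idea, assuming the RFC 4122 positions stay where they are (version in the high nibble of octet 6, variant in the top bits of octet 8). `make_v8` is a hypothetical helper for illustration, not something defined by the draft.

```python
import os
import uuid


def make_v8(payload: bytes) -> uuid.UUID:
    """Hypothetical helper: turn any 16 bytes into a version 8 UUID."""
    if len(payload) != 16:
        raise ValueError("payload must be exactly 16 bytes")
    b = bytearray(payload)
    b[6] = (b[6] & 0x0F) | 0x80  # high nibble of octet 6 -> version 8
    b[8] = (b[8] & 0x3F) | 0x80  # top two bits of octet 8 -> variant 0b10
    return uuid.UUID(bytes=bytes(b))


# Everything except those 6 bits is left exactly as the application supplied it.
print(make_v8(os.urandom(16)))
```

Only 6 of the 128 bits are fixed; the remaining 122 carry whatever the application chooses, which is also why v8 by itself says nothing about uniqueness.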

nerg4l commented 3 years ago

UUIDs should be as opaque as possible [...]

I agree with this statement.

For v7 and v8 merge variant and version fields into one single 8-bit field. [...]

I like and I hate this idea at the same time. Moving the version segment would simplify a lot but it would break many implementations. Even if they are "BROKEN implementations".

For v7 and v8 we can add the capability of variable length. [...]

I understand the reason, but I think this part of RFC4122 ("A UUID is 128 bits long") should be kept as is. It would introduce all kinds of problems.

Introduce Crockford base32 as a standard text encoding [...]

This would allow storing UUIDs as strings with a smaller footprint, which is quite a good idea. Is there any progress on the collaboration between this draft and draft-taylor-uuid-ncname?
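For a sense of the size difference: a 128-bit value needs 26 Crockford base32 symbols (5 bits each, 130 bits total) versus 36 characters for the hyphenated hex form. A minimal Python sketch, assuming the UUID is encoded as one big-endian 128-bit integer with the two spare high bits left at zero; the function names are illustrative, not taken from this draft or from draft-taylor-uuid-ncname.

```python
import uuid

# Crockford's base32 alphabet: 0-9 and A-Z without I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def to_crockford32(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 symbols (the top 2 of the 130 bits are zero)."""
    n = u.int
    symbols = []
    for _ in range(26):
        symbols.append(CROCKFORD[n & 0x1F])  # take 5 bits at a time, low end first
        n >>= 5
    return "".join(reversed(symbols))


def from_crockford32(text: str) -> uuid.UUID:
    """Decode, ignoring hyphens and case and mapping I/L to 1 and O to 0 as Crockford allows."""
    cleaned = text.replace("-", "").upper()
    cleaned = cleaned.replace("I", "1").replace("L", "1").replace("O", "0")
    n = 0
    for ch in cleaned:
        n = (n << 5) | CROCKFORD.index(ch)  # raises ValueError on invalid symbols
    return uuid.UUID(int=n)  # raises ValueError if the value exceeds 128 bits


u = uuid.uuid4()
s = to_crockford32(u)
print(len(str(u)), len(s))  # 36 26
assert from_crockford32(s) == u
```

Binary storage is still smaller (16 bytes), but where a text form is unavoidable, such as URLs, the 10 saved characters add up.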

UUIDv6

This is basically COMB and it would be great to extend RFC4122 with it.

UUIDv7

This one is the most important element of this draft, judging by the number of comments it generates. This layout looks good to me at the moment. Unfortunately, it requires var 0b111 and I'm not sold on that idea yet.

UUIDv8

I think it is a good idea, but it does not guarantee any kind of uniqueness, which could create a lot of problems in the future. If it also has variable length, the situation will be even worse.

Finally, maybe we could include more people who are working on RFC4122/UUID implementations. Many of them have years of experience with that document and could provide valuable feedback. I already saw @broofa, who is a maintainer of https://github.com/uuidjs/uuid.

bradleypeabody commented 3 years ago

Thanks for that feedback. A few notes in reply:

break many implementations

Just to be clear on this though, we're talking about implementations which currently read the version field without looking at the variant and then do something specific based on that, right? I agree this is a concern, but whether it's a big enough deal to justify not merging the var and ver fields is hard to say. Implementations would have to be updated if they want to generate any of these new UUIDs, and the change to existing code is trivial... So yeah, I agree with the concern, but I'm not sure how we can ascertain how big of a real-world problem this is. Maybe some research needs to be done on existing implementations (particularly the ones in database software right now) to see what would happen today if these new versions were just stored in there without updating the software. I think that case is what really matters the most.
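To illustrate the failure mode being described, here is a Python sketch of a reader that follows the variant table in RFC 4122 (section 4.1.1) before trusting the version nibble, next to a shortcut reader that does not. The example value is invented, and it assumes, as the merge proposal implies, that a variant 0b111 UUID could carry non-version data in the old version position.

```python
import uuid


def describe(u: uuid.UUID) -> str:
    """Read the variant first (RFC 4122 section 4.1.1), then the version if it applies."""
    octet8 = u.bytes[8]
    if octet8 & 0x80 == 0x00:
        return "variant 0xx (NCS, reserved)"
    if octet8 & 0xC0 == 0x80:
        # Only for variant 10x does the high nibble of octet 6 hold the version.
        return f"variant 10x (RFC 4122), version {u.bytes[6] >> 4}"
    if octet8 & 0xE0 == 0xC0:
        return "variant 110 (Microsoft, reserved)"
    return "variant 111 (reserved for future use)"


def naive_version(u: uuid.UUID) -> int:
    """The shortcut: read octet 6 without checking the variant at all."""
    return u.bytes[6] >> 4


# Made-up value with variant 0b111 and arbitrary data where the version used to be.
odd = uuid.UUID("017f21cf-d130-3cc3-e8f4-6f2c1a7e0b3d")
print(describe(odd))       # variant 111 (reserved for future use)
print(naive_version(odd))  # 3, which a shortcut reader would treat as a v3 (MD5) UUID
```

Implementations shaped like `describe` would simply report an unknown variant; the ones shaped like `naive_version` are the ones the proposed research would need to find.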

(variable length) It would introduce all kinds of problems.

If implementations can choose to implement variable length or not, what kind of problems do you foresee? I'm thinking it would just boil down to this: implementations that are stuck with 128 bits don't support variable length (or at least not longer values; shorter ones could possibly be zero-padded when stored), and others that want to support variable length can.

Is there any progress about the collaboration between this draft and draft-taylor-uuid-ncname?

I haven't looked at this recently but very good point, needs more review.

(UUIDv8) I think it is a good idea but it does not guarantee any kind of uniqueness

My concern on this is that I think uniqueness is a bit of a phantom property. If you sit down and analyze it, it is not actually possible to guarantee uniqueness without some prior shared knowledge. If I pick a number based on time and/or randomness and you do the same, there is no way to guarantee we won't pick the same number UNLESS we have some shared knowledge ahead of time. This was what the MAC address in RFC4122 was intended to solve, and it wasn't a terrible solution, it just wasn't perfect.

I think the spec should explain and describe the tradeoffs, specify what is absolutely necessary for a compliant implementation (which would be much more lax than RFC4122 is), and then provide implementation suggestions. If we do this, I don't see any reason UUIDv8 wouldn't fit nicely into that pattern.

broofa commented 3 years ago

People SHOULD use a CSPRNG to generate random values, but it doesn't necessarily mean the implementation is broken if it does not.

I disagree. In practice, any implementation that doesn't use a high-quality ("cryptographic") RNG will be considered to have a security vulnerability. We've seen this in the uuid module already: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-8851. Furthermore, poor RNG implementations break the commonly held assumptions people make about the probability of collisions, which I would argue does mean the implementation is broken.
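A small Python sketch of why weak randomness undermines the collision assumptions, not just secrecy. Python's `random` module (Mersenne Twister) is documented as unsuitable for security purposes, while `secrets` draws from the OS CSPRNG. The seeding scenario below (two hosts seeding from the same timestamp) is hypothetical, chosen only to make the failure deterministic.

```python
import random
import secrets
import uuid


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    """v4-shaped UUID from Python's Mersenne Twister PRNG: predictable, do not use."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def strong_uuid4() -> uuid.UUID:
    """v4 UUID whose random bits come from the OS CSPRNG via the secrets module."""
    return uuid.UUID(bytes=secrets.token_bytes(16), version=4)


# Two generators seeded with the same value (say, the same startup timestamp)
# emit the same "random" IDs, so the 2^122 collision math no longer applies.
seed = 1_650_000_000
a = weak_uuid4(random.Random(seed))
b = weak_uuid4(random.Random(seed))
print(a == b)  # True
print(strong_uuid4())
```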

UUIDs should be as opaque as possible, but we necessarily have exceptions.

While I don't necessarily disagree with this, I don't know that there's any meaningful verbiage related to this that should be in the proposal. At most, maybe have a brief, non-normative comment to this effect?

For v7 and v8 merge variant and version fields into one single 8-bit field. (new variant value = 0b111)

I'm intrigued by this idea but, ultimately, I don't think it's appropriate to introduce this change at this time. Everything discussed so far can be accomplished with the existing variant/version layout. Also, it's just weird that versions 1-5 would appear in one location, while versions 6-15 may appear in another.

UUIDv6

I still think this should be removed from the spec. This format is already obsoleted by UUIDv7, so standardizing it just propagates a format that shouldn't be getting adopted elsewhere. Anyone currently generating Peabody UUIDs can set version=8 as an interim solution until such time as they're able to migrate to UUIDv7.

UUIDv7

I've recently come to the opinion that nanosecond resolution timestamps are a bad idea. In practice, system clocks rarely have even millisecond accuracy. Thus, all of the bits used to provide sub-second timestamp precision are little more than a poorly-named sequence counter.

I know this is likely to be controversial, but I think keeping the timestamp simple - a unix-epoch, 1-second precision value (probably 36 bits) - is all that should ever be required. This frees up 28 bits for use in the sequence and/or node portions of the id.

This should probably be discussed in a new, separate issue, but I'm putting it up here to get thoughts. (Sorry, I'm on vacation/travel so haven't had the time to completely think this through)
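For reference, the bit budget behind this suggestion can be checked with a few lines of Python. The 365.25-day year is an approximation that is fine for a range estimate, and the comparison to 32 bits is only there for scale.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year; fine for a range estimate


def years_covered(bits: int) -> float:
    """Years spanned by an unsigned count of whole seconds with `bits` bits."""
    return 2**bits / SECONDS_PER_YEAR


def subsecond_bits(ticks_per_second: int) -> int:
    """Bits needed to count sub-second ticks at the given resolution."""
    return math.ceil(math.log2(ticks_per_second))


print(f"32-bit seconds: ~{years_covered(32):.0f} years")  # ~136 years
print(f"36-bit seconds: ~{years_covered(36):.0f} years")  # ~2178 years
for name, ticks in [("ms", 10**3), ("us", 10**6), ("ns", 10**9)]:
    print(f"{name}: +{subsecond_bits(ticks)} bits")  # +10, +20, +30
```

So 36-bit seconds from the Unix epoch last for roughly two millennia, millisecond precision on top of that takes about 46 bits in total, and nanosecond precision about 66, which is the cost being weighed against clocks that rarely deliver that accuracy.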

UUIDv8

Other than not changing the size/location of the version field, this looks great.

broofa commented 3 years ago

From the current IETF-submitted draft:

Updates: 4122 (if approved)

...

This document is a proposal to update [RFC4122] ...

I do not consider the following items "updates". Rather, they should all be addressed in a completely new spec, after much deliberation and discussion:

New string representations
New UUID bit lengths
New variants
New variant and version field layout

None of these are solving well-defined problems. For example, I'm unaware of anyone who has generated enough UUIDs for there to be a significant risk of collision in the current 2¹²² value space. I.e. "nobody actually needs 160 bit UUIDs". The same can be said for these other items. Good ideas in principle, but not solving enough real-world problems to make them worth the additional complexity.
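The collision-risk argument here (and the variable-length argument later in the thread) can be quantified with the standard birthday-bound approximation. A Python sketch, assuming every ID's random bits are uniformly and independently distributed; the function names are illustrative.

```python
import math


def collision_probability(n: float, random_bits: int) -> float:
    """Approximate P(at least one duplicate) among n IDs with `random_bits` uniform bits."""
    # Birthday bound: p ~= 1 - exp(-n^2 / (2 * 2^bits)); expm1 keeps tiny p accurate.
    return -math.expm1(-(n * n) / (2.0 * 2.0**random_bits))


def ids_for_probability(p: float, random_bits: int) -> float:
    """Invert the approximation: how many IDs until the collision probability reaches p."""
    return math.sqrt(2.0 * 2.0**random_bits * -math.log1p(-p))


print(f"{ids_for_probability(0.5, 122):.3g}")     # 2.71e+18 IDs for a 50% chance
print(f"{collision_probability(1e12, 122):.2g}")  # 9.4e-14 after a trillion IDs
```

Each extra 2 random bits doubles the 50% threshold, so adding 32 bits multiplies it by 2^16; whether any application needs that margin is exactly what is being debated.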

I think this project would be well served by the creation (and liberal use of) an "Out of Scope" issue label. 😉

bradleypeabody commented 3 years ago

Thanks. Some notes.

I do not consider the following items "updates". Rather, they should all be addressed in a completely new spec, after much deliberation and discussion

I'm not too concerned about what is an update to RFC4122 vs something totally new. I'd like to focus on what is useful vs not. The subject of new UUIDs was first introduced to the IETF almost two years ago and there is a lot of discussion on the mailing lists, etc. Let's focus on arguing each point on its merits and then change the language of the draft to match. If it's something new, it's something new. The key point is what will be required for adoption - how different a new spec is vs the existing one is less important than how hard it will be for people to adapt and implement.

None of these are solving well-defined problems.

These each do have factors that are driving them. (They may not be of the same importance/priority as the original concept of sortability, but they are not just miscellany - they have come up many times in past discussions. I will see if I can dig up some links to these if it's useful.)

shorter string representation is useful because people use UUIDs in many different contexts and e.g. in URLs the hex format is just unnecessarily long. The best storage solution for UUIDs is of course binary, but it's not possible in all cases and saving space in text form is a useful, practical, benefit.
variable length: This has also come up a lot. 128 bits is more or less arbitrary. The primary reason it is important is because existing implementations expect it - which I agree is a vital factor to consider. But I do not believe that immediately makes it unviable to consider what would happen if other lengths were optionally allowed. The problem it solves: "uniqueness" boils down to one of two things, shared knowledge or just reducing the probability of collision. For systems which cannot implement a shared knowledge solution, the degree of collision resistance required is application specific. There is no length where everyone will just agree "this is fine" - it just won't happen. So allowing people to choose would solve this problem.
The goal of the variant and version field merge is simplicity. The existing RFC4122 is quite complex with lots of nuance, much of which is, IMO, not necessary. I agree we have to consider the impact of making such a change carefully, but it does solve a problem: it's simpler. Unnecessary complexity is a very real and tangible problem when implementing a specification.

any Implementations that doesn't use a high-quality ("cryptographic") RNG will be considered to have a security vulnerability

Why any implementation? It really depends on the use case. Virtually every database out there right now has a feature to generate unique IDs that are not cryptographically hard to guess. People use these features every day, they work fine. Now I agree that SOME implementations require CSPRNG, but I don't see how forcing every implementation to do this as a requirement for it to be correct helps things.

(UUIDv6) I still think this should be removed from the spec.

Maybe - I do agree it would simplify things. The question really boils down to the value of the "easy to adapt from UUIDv1" property - that's its main benefit. Is it worth adding more to the spec? Hard to say.
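The "easy to adapt from UUIDv1" property amounts to a bit shuffle. A Python sketch, assuming the v6 field order used in the draft (the 60-bit timestamp stored most significant bits first, with the version nibble after the top 48 bits) and leaving clock_seq and node untouched; `v1_to_v6` is a hypothetical helper name.

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Reorder a version 1 UUID's timestamp so that byte order follows time order."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time  # 60-bit count of 100 ns intervals since 1582-10-15
    hi = (ts >> 12) & 0xFFFF_FFFF_FFFF  # most significant 48 bits go first
    lo = ts & 0x0FFF                    # least significant 12 bits follow the version
    tail = u.int & ((1 << 64) - 1)      # variant, clock_seq and node stay as they were
    return uuid.UUID(int=(hi << 80) | (6 << 76) | (lo << 64) | tail)


print(v1_to_v6(uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")))
# 1ec9414c-232a-6b00-b3c8-9f6bdeced846
```

Because the most significant timestamp bits now come first, sorting v6 values bytewise (or as strings) orders them by timestamp, which v1's time_low-first layout prevents.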

I've recently come to the opinion that nanosecond resolution timestamps are a bad idea...

I share the concern, but A) I'm not sure clocks are that inaccurate and B) I definitely think a solution for filling in the least significant bytes with random data should be described in the spec - because certainly not all implementations will be super precise. BUT, if we're going to have a timestamp that is possible to parse, I don't see a reason not to make it nice and precise and then just say that implementations can fuzz the least significant part of it. I.e. "here is a timestamp format, if you read it, here's how, but there's no promise that implementations will put the exact right time here - that's up to them".
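One way to read "implementations can fuzz the least significant part": encode the sub-second fraction at full width, then overwrite the bits the local clock cannot vouch for with random ones. A Python sketch, assuming (for illustration only) that the sub-second value is stored as a binary fraction of a second; `fuzzed_subsec`, `sub_bits` and `trusted_bits` are hypothetical names, not fields from the draft.

```python
import secrets
import time


def fuzzed_subsec(ns: int, sub_bits: int, trusted_bits: int) -> int:
    """Sub-second part of a nanosecond Unix timestamp as a `sub_bits`-bit binary fraction,
    keeping only the top `trusted_bits` from the clock and randomizing the rest."""
    if not 0 <= trusted_bits <= sub_bits:
        raise ValueError("need 0 <= trusted_bits <= sub_bits")
    frac = (ns % 1_000_000_000) * (1 << sub_bits) // 1_000_000_000
    noise_bits = sub_bits - trusted_bits
    if noise_bits == 0:
        return frac
    keep_mask = ((1 << sub_bits) - 1) ^ ((1 << noise_bits) - 1)  # top trusted_bits set
    return (frac & keep_mask) | secrets.randbits(noise_bits)


# e.g. a 30-bit field, but only ~1 ms (2^-10 s) of it taken from the clock
print(fuzzed_subsec(time.time_ns(), sub_bits=30, trusted_bits=10))
```

A reader that decodes the field as `value / 2**sub_bits` seconds still gets a time within about 2^-trusted_bits seconds of the real one, so the format stays parseable while the precision promise stays up to the generator.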

sergeyprokhorenko commented 3 years ago

I would add optional fields with metadata for long (more than 128 bit) UUIDs:

canceled: UUID length
canceled: UUID structure (bit array)
canceled: checksum (check symbol in Crockford's base32)
local entity type (10 bit). Local entity type should be the last field

sergeyprokhorenko commented 3 years ago

Unfortunately some check symbols (*, $, =) in Crockford's base32 are not allowed in URL. And they cannot be replaced with other symbols. So it's better not to use the check symbol at all.
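For context on that check symbol: Crockford's scheme takes the encoded integer modulo 37 and extends the 32-symbol alphabet with `*`, `~`, `$`, `=` and `U` for the values 32 to 36. A minimal Python sketch for a 128-bit UUID value (the function name is illustrative):

```python
import uuid

# 32 encoding symbols followed by the five check-only symbols for values 32..36.
CHECK_SYMBOLS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U"


def check_symbol(u: uuid.UUID) -> str:
    """Crockford base32 check symbol for the UUID's 128-bit integer value."""
    return CHECK_SYMBOLS[u.int % 37]


print(check_symbol(uuid.uuid4()))
```

Assuming roughly uniform values, about 3 in 37 IDs (around 8%) would end in `*`, `$` or `=`, so this would not be a rare edge case.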

kyzer-davis commented 2 years ago

To ease conversations about individual topics a Discussion label has been created. Closing this thread since the majority of these items have moved to the individual threads.

If something was missed please bring it back up in the appropriate thread!

uuid6 / uuid6-ietf-draft

New Draft 03 Ideas #33