uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

Draft 2: Question about V6 clock sequence change, and whether it reduces the value of the current V6 spec #38

Closed: theckman closed this issue 2 years ago.

theckman commented 3 years ago

Hello,

Thank you for your work on trying to shepherd this draft through the standards track. I help maintain one of the Go UUID libraries, and I'm noodling on releasing a version that implements the draft so folks can begin playing with it and offering feedback.

I was starting to target revision 1, and noticed that some changes pending in the revision 2 WIP draft would require me to substantially refactor how I generate the UUID, specifically the change to how the clock sequence is handled. No sweat, since I knew I'd be working with things that are subject to change. That said, I wanted to see if there was more context behind that change, given that the V6 UUID is supposed to be very similar to V1.

Ultimately it made me realize that my V6 generator would share even less code with my V1 generator, since the clock sequence would need to be generated differently. So I started to wonder if there is even value in having the V6 UUID in its current form at all, if we're going to make it even more different than V1, and whether we should just make the V7 the V6 and guide everyone towards that richer UUID value.

I'd love to hear the context behind the change, and your thoughts on whether it undermines the V6 UUID's goal of staying similar to the V1 format.

fabiolimace commented 3 years ago

I was also worried about the clock sequence in V6. Thanks for asking, @theckman.

One of the clock sequence tasks in V1 is to help avoid collisions in UUID generators that for some reason share the same node identifier. For example, when more than one generator runs on the same machine, using the same MAC address as the node identifier. If the clock sequence is always initialized to 0000, collisions are very likely to occur. The clock sequence in V1 is initialized with a random value to mitigate this problem.
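To make the collision risk concrete, here is a minimal Python sketch, assuming two generators on one host that share a MAC address and read the same 100 ns tick. The helper `make_v1` and the sample `mac`/`tick` values are hypothetical; the field layout follows RFC 4122, and `uuid.UUID(fields=..., version=1)` is the standard-library constructor.

```python
import secrets
import uuid

CLOCK_SEQ_BITS = 14  # width of the RFC 4122 clock sequence


def initial_clock_seq() -> int:
    """Random 14-bit starting value, as RFC 4122 recommends."""
    return secrets.randbits(CLOCK_SEQ_BITS)


def make_v1(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Hypothetical helper: build a V1 UUID from a 60-bit timestamp, a 14-bit
    clock sequence and a 48-bit node. uuid.uuid1() reads the system clock,
    so it cannot reproduce two generators hitting the same tick."""
    return uuid.UUID(
        fields=(
            timestamp & 0xFFFFFFFF,      # time_low
            (timestamp >> 32) & 0xFFFF,  # time_mid
            (timestamp >> 48) & 0x0FFF,  # time_hi (version bits added below)
            (clock_seq >> 8) & 0x3F,     # clock_seq_hi (variant bits added below)
            clock_seq & 0xFF,            # clock_seq_low
            node,
        ),
        version=1,  # also sets the RFC 4122 variant bits
    )


# Two generators: same MAC, same tick.
mac, tick = 0x001122334455, 0x1EC9414C232AB00

# A fixed zero clock sequence makes them produce the same UUID.
print(make_v1(tick, 0, mac) == make_v1(tick, 0, mac))

# Random initialization separates them unless the two 14-bit draws match.
print(make_v1(tick, initial_clock_seq(), mac) == make_v1(tick, initial_clock_seq(), mac))
```

With 14 random bits, two generators that share a node and a tick still collide about once in 16,384 starts, so the random clock sequence mitigates rather than eliminates the problem.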

If the only thing that changes is the order of the bytes in the timestamp, all the code related to the clock sequence and the node identifier can be reused. So I think V6 can be specified as a simple byte rearrangement, perhaps with a preference (not a requirement) for a random node identifier (with the multicast bit set to one) over the MAC address.

EDIT: Better not change the node identifier. Let's define V6 as a simple byte swap and focus our efforts on V7.
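A "simple byte swap" V6 can be sketched as a pure transformation of a V1 value. This Python sketch assumes the revision 01 layout: the 60-bit timestamp is written most significant bits first (32-bit time_high, 16-bit time_mid, 4-bit version, 12-bit time_low), while the low 64 bits (variant, clock sequence, node) are copied unchanged. The function names `v1_to_v6` and `v6_to_v1` are illustrative, not taken from the draft.

```python
import uuid

LOW64 = (1 << 64) - 1  # variant + clock sequence + node


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Reorder a V1 UUID's 60-bit timestamp into the V6 layout.
    Clock sequence and node bits are reused as-is."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time                            # 60-bit timestamp from the V1 fields
    value = (
        ((ts >> 28) << 96)                 # time_high: top 32 bits
        | (((ts >> 12) & 0xFFFF) << 80)    # time_mid: next 16 bits
        | (6 << 76)                        # version 6
        | ((ts & 0x0FFF) << 64)            # time_low: bottom 12 bits
        | (u.int & LOW64)                  # untouched clock_seq and node
    )
    return uuid.UUID(int=value)


def v6_to_v1(u: uuid.UUID) -> uuid.UUID:
    """Inverse of v1_to_v6."""
    n = u.int
    ts = ((n >> 96) << 28) | (((n >> 80) & 0xFFFF) << 12) | ((n >> 64) & 0x0FFF)
    value = (
        ((ts & 0xFFFFFFFF) << 96)          # time_low
        | (((ts >> 32) & 0xFFFF) << 80)    # time_mid
        | (1 << 76)                        # version 1
        | (((ts >> 48) & 0x0FFF) << 64)    # time_hi
        | (n & LOW64)
    )
    return uuid.UUID(int=value)


v1 = uuid.uuid1()
v6 = v1_to_v6(v1)
print(v1, v6)
```

Because the most significant timestamp bits now come first, V6 values sort by creation time as plain bytes or integers, which V1 values do not once `time_low` wraps.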

theckman commented 3 years ago

EDIT: Better not change the node identifier. Let's define V6 as a simple byte swap and focus our efforts on V7.

@fabiolimace As some random dude on the Internet who basically has no skin in the game, I'd be in favor of this (keeping the revision 01 spec). 👍 😄 I do like having the extra randomness of the clock sequence, versus it mostly being set to zero, so I'd be in favor of reverting the clock sequence change currently in the revision 02 draft.

fabiolimace commented 3 years ago

I also vote to revert the text related to clock sequence to this:

4.3.2.  UUIDv6 Clock Sequence Usage

   UUIDv6 makes no change to the Clock Sequence usage defined by
   [RFC4122], Section 4.1.5.
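
"No change" means a V6 generator can keep the RFC 4122 rules for when the clock sequence changes: pick a random value when there is no saved state or the node ID changed, and increment it when the clock moved backwards. A minimal in-memory Python sketch of those rules follows; the class name `ClockSeqState` is hypothetical, and a real generator would persist this state across restarts.

```python
import secrets
from typing import Optional

CLOCK_SEQ_MASK = (1 << 14) - 1  # 14-bit clock sequence


class ClockSeqState:
    """Sketch of RFC 4122 clock-sequence handling (Sections 4.1.5 and 4.2.1).
    Kept in memory only; RFC 4122 expects this state to be stable storage."""

    def __init__(self) -> None:
        self.last_timestamp: Optional[int] = None
        self.last_node: Optional[int] = None
        self.clock_seq: Optional[int] = None

    def next(self, timestamp: int, node: int) -> int:
        if self.clock_seq is None or node != self.last_node:
            # No saved state, or the node ID changed: choose a random value.
            self.clock_seq = secrets.randbits(14)
        elif timestamp < self.last_timestamp:
            # Clock went backwards: increment so earlier UUIDs are not repeated.
            self.clock_seq = (self.clock_seq + 1) & CLOCK_SEQ_MASK
        self.last_timestamp = timestamp
        self.last_node = node
        return self.clock_seq
```

Since nothing here depends on the timestamp's byte order, the same state object could serve both a V1 and a V6 generator.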

bradleypeabody commented 3 years ago

Thanks and yes I totally agree.

One of the clock sequence tasks in V1 is to help avoid collisions in UUID generators that for some reason share the same node identifier

My bad, I missed this in my review of RFC4122. My thinking was that resetting it to zero would allow the most values before the counter rolled over, or before the algorithm had no choice but to block until the next timestamp tick.
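The reasoning above can be illustrated with a hypothetical per-tick counter that restarts at zero on each new timestamp tick and blocks once it runs out of values. This Python sketch is not the revision 02 text; the class name `TickCounter`, the injectable `clock`, and the use of Unix-epoch `time.time_ns()` are assumptions made for the illustration.

```python
import time
from typing import Callable, Optional, Tuple


class TickCounter:
    """Hypothetical counter: restarts at zero on each new 100 ns tick and,
    once exhausted, blocks until the clock moves past the last tick."""

    def __init__(self, bits: int = 14, clock: Callable[[], int] = time.time_ns):
        self.limit = 1 << bits        # 14 bits matches the clock sequence width
        self.clock = clock            # injectable so the behavior can be tested
        self.last_tick: Optional[int] = None
        self.counter = 0

    def _tick(self) -> int:
        return self.clock() // 100    # nanoseconds -> 100 ns intervals

    def next(self) -> Tuple[int, int]:
        tick = self._tick()
        if self.last_tick is not None and tick <= self.last_tick:
            if self.counter + 1 < self.limit:
                # Same (or earlier) tick: reuse the last tick, bump the counter.
                self.counter += 1
                return self.last_tick, self.counter
            # Counter exhausted: spin until the clock advances.
            while tick <= self.last_tick:
                tick = self._tick()
        # New tick: start over from zero.
        self.last_tick, self.counter = tick, 0
        return tick, self.counter
```

Starting from zero does maximize headroom within a tick, but two such generators that share a node and read the same clock emit identical (tick, counter) pairs, which is the collision the random RFC 4122 clock sequence was meant to avoid.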

I've added a note in the UUIDv6 section here to ensure that the next draft drops the changes to the clock sequence, so the only adjustments would be:

  1. Reordering the timestamp
  2. Suggest using random bytes instead of MAC address (optional)

Let me know if that solves it.

bradleypeabody commented 3 years ago

Regarding

Better not change the node identifier. Let's define V6 as a simple byte swap ...

I don't see a downside to allowing it (use of random data instead of MAC address). Certainly it would not be required and existing implementations don't have to modify that field. After all, the goal is easy adaptation from v1.

theckman commented 3 years ago

@bradleypeabody everyone would have a unified mental model of how their UUID is generated if we are rigid in our definition(s) and say only random data can be used. There should never be a question.

Anecdotally I was able to reuse a lot of my V1 generation code because I could just swap how those bits were being written. So it doesn't seem to make the spec less appealing in my eyes if it was reverted to the rev 01 definition. Please let me know if I am missing any context that may be relevant.

Edit: And if it doesn't work for someone for some reason, V8 exists. 😄

bradleypeabody commented 3 years ago

@theckman I see what you're saying, but unfortunately there is just a huge amount of variation in what different applications need for their purposes and in different environments. I've tried to outline some of these factors in the later sections here.

RFC4122 already covers this: https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.6

For UUID version 1, the node field consists of an IEEE 802 MAC address, usually the host address... For systems with no IEEE address, a randomly or pseudo-randomly generated value may be used...

We'd basically just be relaxing that and saying a UUID generator is not required to use the MAC address just because it exists.

So while this would be nice:

everyone would have a unified mental model of how their UUID is generated if we are rigid in our definition(s)

I just don't think it's realistic, unfortunately. Otherwise, here is just one example of what could happen: someone could run into duplicate MAC addresses on different network segments because of how cloud virtual network interfaces work, and then, strictly speaking, the UUID implementation would violate the spec if it used random data instead. (RFC4122 only allows random data for "systems with no IEEE address", and this system does have one; the spec says nothing about what to do if the MAC address is suspect or simply not unique.)

theckman commented 3 years ago

@bradleypeabody I think we may be moving beyond the scope I originally intended for this issue, but that's okay because I think these are important topics. 😄

I'm personally in favor of relaxing the node requirement for V6, compared to V1, by not only permitting but encouraging the use of random data instead of an IEEE 802 MAC address. I'd personally prefer to see us move away from supporting a MAC address as node data in V6, but could be convinced of the value in saying implementers "MAY" use a MAC address.

bradleypeabody commented 3 years ago

Yup, I agree. The exact wording will need to be nailed down but yes something to the effect of SHOULD use random, MAY retain the old MAC address behavior from RFC4122.
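The "SHOULD use random, MAY retain the MAC address" policy could look roughly like this Python sketch. The names `random_node` and `choose_node` are hypothetical; the multicast-bit rule comes from RFC 4122 Section 4.5, and `uuid.getnode()` is the standard-library call that returns the host's hardware address when it can find one.

```python
import secrets
import uuid

# Least significant bit of the node's first octet (RFC 4122, Section 4.5).
# It is never set in IEEE 802 addresses obtained from network cards.
MULTICAST_BIT = 1 << 40


def random_node() -> int:
    """48 random bits with the multicast bit set."""
    return secrets.randbits(48) | MULTICAST_BIT


def choose_node(use_mac: bool = False) -> int:
    """Default to random node data (SHOULD); opt into the RFC 4122
    MAC-address behavior only when asked (MAY)."""
    if use_mac:
        # uuid.getnode() returns the hardware address if it finds one and
        # otherwise falls back to its own random value.
        return uuid.getnode()
    return random_node()


node = choose_node()
print(hex(node), uuid.uuid1(node=node))
```

Setting the multicast bit keeps randomly chosen nodes out of the space of real network-card addresses, which also covers the duplicate-MAC cloud scenario: an implementation can simply stop trusting the MAC without leaving the spec.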