uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/
187 stars 11 forks source link

v7: Change "clock sequence" to "counter" #51

Closed broofa closed 2 years ago

broofa commented 2 years ago

We should avoid using the term "clock sequence" in the version 7 section, as its use there conflicts with how it's defined in versions 1 and 6.

[Edit: Just realized version 6 redefines the "clock sequence" behavior. See #52. I don't think that changes the basic argument here for just calling it "counter"]

Specifically, in versions 1 & 6, a clock sequence is randomly initialized and stable unless the node changes or the system clock regresses. Contrast that to version 7 where the clock sequence (counter) is initialized to zero at the start of each time interval, and increments each time a new ID is generated in that interval.

Just call it what it is: "counter".

fabiolimace commented 2 years ago

I agree.

I think we can completely abolish the term "clock sequence" in v7. In version 1 and 6 it can exist as a legacy field.

However, I prefer a counter that starts with a random number rather than zero. It can be a little confusing as an "act of counting" must start from nothing (zero), but it can help to put more entropy into the identifier.

broofa commented 2 years ago

I prefer a counter that starts with a random number

I think we need to be careful about this. In v1, v6, and v7, counters are treated as low-order components of the timestamp. Version 1 is, in fact, pretty explicit about this. Randomzing the counter risks compromising the monotonicity of timestamp+counter fields.

That said, a counter that was initialized with the high-order bit set to zero and all other bits set randomly would still provide monotonic behavior while also increasing overall entropy. Counter rollover would need to be treated as an error, of course. But this would allow for a minimum of 2n-1 ids per counter interval. ( n = counter bit width)

(Apologies if this idea has already been floated. [Edit: It has])

fabiolimace commented 2 years ago

I like the idea of setting the high-order bit to zero. It is the best balance between monotonicity and randomness.

I read an issue in the ULID spec that suggested something similar. Unfortunately, the maintenance of the ULID spec appears to have ceased.

LiosK commented 2 years ago

I suggested to set the MSB to zero and fill the rest with random in a previous post, but after working on a couple of prototypes, I have changed my mind. I learned from my prototypes that the counter overflows often occurred with the randomly initialized counter configuration, but such overflows were not a big deal at all. All a user of monotonic IDs can do when the counter overflows is just to wait until the wall clock goes forward, and occasionally waiting for a millisecond or less does not usually do a lot of harm to the whole system performance. So, the guarantee of minimum IDs per timestamp is not worth the precious one bit entropy in my opinion.

broofa commented 2 years ago

All a user of monotonic IDs can do when the counter overflows is just to wait until the wall clock goes forward,

@liosk: To be clear, we should in no way consider this a solution to clock rollover. Deferring work until the next clock interval is how you create Cascading Failure modes.

Hence why I suggest rollovers should be surfaced to the user as an error. A rollover is almost certainly a sign of a system whose resources are taxed in a way that cannot be resolved within the scope of this RFC.

LiosK commented 2 years ago

Hi @broofa, I think whether cascading failure or not is an implementation detail. Since someone ultimately has to deal with the rollover, a UUID library should be allowed to handle it by itself, quietly or with logging the event, as well as to throw it to the user of the library. The following paragraph in the original RFC 4122 should still make sense with the new RFC:

If a system overruns the generator by requesting too many UUIDs within a single system time interval, the UUID service MUST either return an error, or stall the UUID generator until the system clock catches up.

The guarantee of minimum IDs per timestamp is also an implementation detail in my opinion. Entropy and the guarantee of minimum IDs are two conflicting goals between which a right balance has to be sought, but RFC or library authors cannot make a good decision because they are not able to determine how many IDs are necessary per timestamp. So I think it is best to say in the RFC something like "initialize the counter with a k-bit random integer where 0 <= k <= number of counter bits" and let implementers decide which k is the most appropriate for their applications, together with the number of counter bits and the mode of cascading failure.