ietf-wg-uuidrev / rfc4122bis

revision to RFC4122
Other
57 stars 11 forks source link

Remove Re-randomize Until Monotonic (Method 3) section #87

Closed LiosK closed 1 year ago

LiosK commented 1 year ago

Reference: 6.2. Monotonicity and Counters

I believe we don't need to (or simply, should not) give prominence to the Re-randomize Until Monotonic approach as Method 3, because it does not have a distinctive benefit over the other methods. To me, Method 3 doesn't seem to make sense.

Firstly, the Method 3 approach is perfectly legal without this section along with Method 1 as per Fixed-Length Dedicated Counter Seeding:

When utilizing a randomly seeded counter alongside Method 1, the random value MAY be regenerated with each counter increment without impacting sortability.

The Fixed-Length Dedicated Counter Length section suggests limiting the counter length to 42 bits but does not forbid utilizing the entire random field as a counter. Method 3 is allowed under Method 1 should an implementer wish to employ it.

Simplicity is not the strength of Method 3 at all. This method requires a generator to memorize the immediately preceding value and accordingly suffers from all the challenges relating to that, such as persistent storage and synchronization mechanisms. Plus, if a generator is able to memorize the previous value, just incrementing the counter field is obviously much simpler.

Method 3 could sound simple because it naturally embodies the counter overflow handling, but that is not the case. If naively implemented, a Method 3 generator enters a busy loop when it has fully consumed the counter space until the next timestamp tick. It is actually not a big deal at all either with Method 1 & 2 if it's okay to just busy-loop upon counter overflows. Challenges of overflow handling are mostly around the graceful handling of timestamp rollbacks and malfunctioning system clocks. For example, it would be a nightmare if your generator claimed 100% CPU power for busy loops when you adjusted your server's system clock back to the correct position, and thus library writers make a lot of effort with many other techniques than a busy loop to avoid such an incident. A naive Method 3 generator will never be a practical implementation.

The performance drawback is tremendous. On average, a Method 3 generator can generate only 74 UUIDs (given the 74-bit counter space) per millisecond, which is incredibly slow, even though the generator consumes a significant amount of CPU time just to throw away unnecessary random numbers.

In conclusion, I would propose to remove the Method 3 section because:

bradleypeabody commented 1 year ago

@LiosK I see where you are coming from on this. Am giving it more thought. My goal here was to have a simple solution for monotonicity that does not require dedicated counter bits, since these reduce entropy for all values (instead of just the values that occur during the same timestamp), and requires whatever additional bookkeeping. But I also agree with the issues you've brought up. I'll give this some thought and see if there's another approach to this which addresses those concerns. I do think there is value in having a solution which only requires dealing with timestamp collisions as needed rather than dedicating counter bits to this in every generated value.

LiosK commented 1 year ago

Hi @bradleypeabody, I see your points. Will try to offer an alternative idea. For your entropy concern, perhaps a Method 2 counter incremented by a large random number works well. For example, if you increment the 74-bit monotonic random bits by a 64-bit random number for each generation within the same millisecond, the resulting UUIDs will be spread over the 74-bit space in a reasonably distributed and unguessable manner. This approach consumes much less CPU power than Method 3 and accommodates 1024 UUIDs per millisecond on average. This approach doesn't require bookkeeping either; the generator only needs to know the previous UUID and does not require any other state.

Technically, counters do not need a state other than the immediately preceding UUID. Please take a look at the signature of my prototype implementation in C. This function is based on a 42-bit Method 1 counter and equipped with fully operational counter overflow and clock rollback handling logics but does not require any state other than the previous UUID passed as an argument. If a generator can memorize the previous UUID, effectively it can do whatever it wants.

LiosK commented 1 year ago

Or alternatively, perhaps we can tweak the first paragraph of the following part to mention something similar to the Method 3 approach:

For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 6 and 7 can provide sufficient monotonicity guarantees by simply ensuring that timestamp increments before creating a new UUID. Distributed nodes are discussed in Section 6.4.

Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation, or are otherwise concerned about monotonicity with high frequency UUID generation.

Method 3 is ultimately like generating a UUIDv7 without counters and just seeing if it's greater than the previous one or not, no matter whether timestamp has incremented since the last run or not. In this sense, it is as simple as and close to the approach described in the first paragraph ("simply ensuring that timestamp increments before creating a new UUID"). On the other hand, Method 3 is not appropriate for the second paragraph use cases (batch UUID creation and high frequency UUID generation) as I pointed out in the original post. So, I believe it's better to merge the Method 3 ideas into the first paragraph than to emphasize it as an independent method prepared for the second paragraph use cases.

mcr commented 1 year ago

Do we have requirements to generate more than 74 UUID/ms [is that a typo for 64?] on an ongoing basis, or would it be acceptable to grab timer values into the future to deal with some transient burst of demand?

LiosK commented 1 year ago

Do we have requirements to generate more than 74 UUID/ms

I don't think so, but the methods 1-4 are introduced as methods used in "batch UUID creation" and "high frequency UUID generation" as clearly stated in the leading paragraph. In this sense, Method 3 is never recommended for these purposes because it generates only 74 UUIDs per millisecond despite the 100% consumption of CPU power. With the other methods, JavaScript implementations easily reach 1k+ and Rust 15k+ per millisecond without deep optimization.

74 UUID/ms [is that a typo for 64?]

It is not a typo. UUIDv7's rand_a + rand_b fields give 74-bit space available for counters. With Method 3, the first UUID in a millisecond picks a random number from the 74-bit space, leaving on average 73-bit space available for the second trial. And, the second UUID on average picks one from the 73-bit space, leaving 72-bit space on average for the third UUID, and ....

bradleypeabody commented 1 year ago

@LiosK After looking at this newly I totally agree on this one and I think it is fine to just remove method 3. If implementations do not need to generate more than one value per clock tick then they don't need to use any of the methods, and if they do then I agree the efficiency issues with method 3 are too great to use this. If I have a better idea I'll propose it but yeah, method 3 should just come out.