uuid6 / prototypes

Draft Prototypes and Tests for UUIDv6 and beyond
46 stars 16 forks source link

[v6] Python implementation increments timestamp instead of clock sequence #4

Closed nerg4l closed 3 years ago

nerg4l commented 3 years ago

While having a look at the python prototype and reading the "UUIDv6 Basic Creation Algorithm" section of the draft I noticed a difference between them.

From Section 4.3.4.

  1. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value.

https://github.com/uuid6/prototypes/blob/98bc0c1a317b59b289c247cb449ff901d1ff80c4/python/new_uuid.py#L35-L39

I think the implementation in the linked code segment increments the timestamp when it should increment the clock sequence instead.

kyzer-davis commented 3 years ago

Hey @nerg4l, I had noticed this too when forking the UUIDv1 code from the official Python repo. https://github.com/python/cpython/blob/3.9/Lib/uuid.py#L688

I have been meaning to fix it in v6 prototype but have not found the time. After I can submit PR to have the Python folks update their code too since the UUIDv1 creation algorithm is basically the same for that step in RFC 4122: https://datatracker.ietf.org/doc/html/rfc4122#section-4.2.1

kyzer-davis commented 3 years ago

@nerg4l I pushed a branch uuidv7-python which should fix the v1/v6 issue where timestamp is incremented instead of the clock sequence. This is simply re-used code from my UUIDv8 timestamp check.

Sequence Counter Logic:

Testing v6 with Dev Debugs enabled seems to increment properly but let me know what you think.

nerg4l commented 3 years ago

I think it is better to manage the sequence this way. The previous logic altered the time segment which meant whenever someone would read it from the generated uuid they would get an incorrect result.

Just remove clock_seq from the paramtere list. Currently that parameter is not respected. It gets a new value assigned in the method.

RFC 4122 - Section 4.1.5 says the following about clock sequence:

The clock sequence MUST be originally (i.e., once in the lifetime of a system) initialized to a random number to minimize the correlation across systems. This provides maximum protection against node identifiers that may move or switch from system to system rapidly. The initial value MUST NOT be correlated to the node identifier.

I don't know how important to respect this requirement in case of v6. The 48 random bit should be enough to avoid collisions.

kyzer-davis commented 3 years ago

@nerg4l Yeah, I just made a commit to drop clock_seq variable from v1/v6. The goal with this v6 prototype was to detail that you only needed to change a few lines of code to change an existing v1 implementation over to v6. Little did I know Python's v1 implementation was non-compliant with RFC4122...

As for the v6: Our current 01 draft re-uses the logic from the section you specified in RFC4122. However, in v1 the Sequence also doubles as entropy since the Node usually may contain a static 48-bit MAC. But in v6 our Node is random as such there is no reason to randomize the Seq to start.

Furthermore, if I recall in testing random Seq start in v8 prototype I found that random sequence counter greatly increased the change the Seq would rollover causing out of order UUIDs. Or if the rollover isn't handled properly overflows where the Seq becomes larger than the space it was meant to encompass resulting in UUIDs larger than 128 bits.

Thus is the reason why I stated v7 and v8 like so:

The clock sequence MUST start at zero and increment monotonically for each new UUID created on by the application on the same timestamp. When the timestamp increments the clock sequence MUST be reset to zero. The clock sequence MUST NOT rollover or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequate sized clock sequence is selected for a given application based on expected timestamp precision and expected UUID generation rates.

I will discuss with Brad and update draft 02 where required to state that the sequence should start at 0 and increase by 1. Exactly like we do in v7/v8.

nerg4l commented 3 years ago

Furthermore, if I recall in testing random Seq start in v8 prototype I found that random sequence counter greatly increased the change the Seq would rollover causing out of order UUIDs. Or if the rollover isn't handled properly overflows where the Seq becomes larger than the space it was meant to encompass resulting in UUIDs larger than 128 bits.

I had the same observation while writing an implementation for v6.

I will discuss with Brad and update draft 02 where required to state that the sequence should start at 0 and increase by 1. Exactly like we do in v7/v8.

I think that would be a great addition. It would also make the sequence reusable between v6, v7, and v8.

When you discuss this don't forget to take into consideration the rule for systems/languages which cannot provide 100-nanosecond timestamps (e.g. JavaScript in the browser). RFC 4122 - Section 4.2.1.2. suggests the following:

A high resolution timestamp can be simulated by keeping a count of the number of UUIDs that have been generated with the same value of the system time, and using it to construct the low order bits of the timestamp. The count will range between zero and the number of 100-nanosecond intervals per system time interval.

At the moment, there are 3 ways which comes to my mind.

The most optimal would be Option 3 but it might add a lot of unnecessary complexity.