uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

Discussion: Max UUID (ffffffff-ffff-ffff-ffff-ffffffffffff) #62

Open LiosK opened 2 years ago

LiosK commented 2 years ago

DEL (named after ASCII) obviously isn't the best name, but how about reserving the UUID that has all 128 bits set to one, in addition to the Nil UUID? The primary use case of the Del UUID is as a sentinel when processing time-ordered UUID versions, though I don't believe the new RFC needs to specify the semantics of the reserved Nil and Del. For unordered UUIDs, the single Nil value may suffice, but now that the UUID spec is gaining ordered versions, this is a good time to reserve the other end.

At first, I thought it was a good idea to invalidate both ends (MIN and MAX) of the UUIDv7 value range for use as sentinels, but on second thought, after reading the Nil UUID discussion in #58, I believe just reserving the Del UUID is much simpler.

This change is not expected to break existing applications because the 0b111 variant has been reserved for future definition by the original RFC.
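As a rough illustration of the variant argument above, the sketch below reads the variant bits from the first hex digit of the fourth group, following the variant table in RFC 4122. The helper name `variantOf` and its return labels are hypothetical and chosen only for this example.

```javascript
// Hypothetical helper (not from any library): classifies the variant field
// of a UUID string using the bit patterns listed in RFC 4122.
const variantOf = (uuid) => {
  const nibble = parseInt(uuid[19], 16); // first hex digit of the fourth group
  if ((nibble & 0b1000) === 0) return "0b0 (NCS)";
  if ((nibble & 0b0100) === 0) return "0b10 (RFC 4122)";
  if ((nibble & 0b0010) === 0) return "0b110 (Microsoft)";
  return "0b111 (reserved)";
};

console.log(variantOf("00000000-0000-0000-0000-000000000000")); // Nil UUID
console.log(variantOf("0621f59f-53cf-7325-a664-c03b68f106fd")); // ordinary UUID
console.log(variantOf("ffffffff-ffff-ffff-ffff-ffffffffffff")); // all bits set
```

Under this reading, the all-ones value falls in the reserved 0b111 variant, so it cannot collide with a value produced by an RFC 4122 generator.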

broofa commented 2 years ago

My main concern is that this starts to populate the 0b111 variant space, which I'd rather avoid.

> The primary use case of the Del UUID is as a sentinel when processing time-ordered UUID versions

Would the maximum sortable value for each version of time-ordered ID work for this? For example:

ffffffff-ffff-1fff-bfff-ffffffffffff // version 1
ffffffff-ffff-7fff-bfff-ffffffffffff // version 7 (legacy version-variant)
ffffffff-ffff-ffff-e7ff-ffffffffffff // version 7 "E" variant

I mean... I don't have strong opinions on this, other than wanting to limit the number of unnecessary constructs in the spec. Not really sure this meets my threshold for added-value, though.

LiosK commented 2 years ago

> My main concern is that this starts to populate the 0b111 variant space, which I'd rather avoid.

Agreed, but the actual downside should be acceptable because future definitions can still use the 0b1110 variant.

> Would the maximum sortable value for each version of time-ordered ID work for this?

That was my first thought. For UUIDv1, the answer is no, because existing applications might already use such a special value as a placeholder, so libraries cannot invalidate it now. The answer is also no for UUIDv6, in order to guarantee symmetric conversion between v1 and v6.

For UUIDv7, the answer is technically yes, but creating a reserved value inside the UUIDv7 space will cause validation problems. Generators will have to guarantee that the reserved value is never generated, and some decoders might refuse to treat the reserved value as a valid UUID value. These concerns add extra complexity to implementations and to the spec itself. As a result, surprisingly, it turns out to be much simpler to reserve such a special value in a totally separate namespace.

Specifying the Del value as a valid, special UUID is also beneficial in that it ensures such a value can be safely handled by RFC-compliant implementations such as uuidjs's regex.js.
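To make the validation concern concrete, here is a minimal sketch of a validator with and without the all-ones value. The pattern and the function names `isValidStrict` and `isValidWithMax` are assumptions written for this example; they are not taken from uuidjs or any other library, and the accepted version range 1 to 8 is likewise an assumption.

```javascript
// Hypothetical patterns for illustration only; not copied from any library.
// Assumes versions 1-8 with the 0b10 variant are the "ordinary" UUIDs.
const VERSIONED =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
const NIL_UUID = "00000000-0000-0000-0000-000000000000";
const MAX_UUID = "ffffffff-ffff-ffff-ffff-ffffffffffff";

// Accepts ordinary UUIDs and the Nil UUID only.
const isValidStrict = (s) => VERSIONED.test(s) || s === NIL_UUID;

// Additionally accepts the all-ones value as a special case.
const isValidWithMax = (s) => isValidStrict(s) || s.toLowerCase() === MAX_UUID;

console.log(isValidStrict(MAX_UUID));  // rejected by the strict check
console.log(isValidWithMax(MAX_UUID)); // accepted once Max is special-cased
```

A validator in the style of `isValidStrict` would reject the sentinel unless the spec names it explicitly, which is the interoperability gap described above.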

kyzer-davis commented 2 years ago

@LiosK could you elaborate a bit more on the usage of such a UUID? What do we gain by having this? What do we lose by not having this? Pseudocode (or real code) showing its usage, etc.

Also, Max UUID seems like a nice descriptor.

LiosK commented 2 years ago

A classic linear search using a sentinel value:

/** Returns the largest k that satisfies sortedHaystack[k] < needle, or -1 if none. */
const lookup1 = (needle, sortedHaystack) => {
  // Append MAX as a sentinel to omit the k < sortedHaystack.length condition
  sortedHaystack.push("ffffffff-ffff-ffff-ffff-ffffffffffff");
  let k = 0;
  while (sortedHaystack[k] < needle) k++;
  sortedHaystack.pop(); // Remove the sentinel so the caller's array is left unchanged
  return k - 1;
};

/** Returns the largest k that satisfies sortedHaystack[k] < needle, or -1 if none. */
const lookup2 = (needle, sortedHaystack) => {
  let k = 0;
  // Check the bounds first, then compare
  while (k < sortedHaystack.length && sortedHaystack[k] < needle) k++;
  return k - 1;
};

A dummy record in a database table:

CREATE TABLE items (uuid TEXT PRIMARY KEY, name TEXT);
INSERT INTO items VALUES ('ffffffff-ffff-ffff-ffff-ffffffffffff', 'No item registered');

-- INSERT INTO items VALUES ('0621f59f-53cf-7325-a664-c03b68f106fd', 'An item');
-- INSERT INTO items VALUES ('0621f59f-53cf-732d-8e97-a491d31e209f', 'Another item');

-- Returns the name of item that has the smallest UUID, or 'No item registered' if none is registered.
SELECT name FROM items ORDER BY uuid LIMIT 1;

The above code can be written without the MAX value, but being able to write code like this improves the ergonomics of UUIDs. ffffffff-ffff-7fff-bfff-ffffffffffff cannot necessarily be used in this context because it can be a real UUID value; a dummy value has to be outside the range of legal values.
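The point about the dummy value being out of range can be checked with a small sketch. It assumes lowercase hyphenated strings compared with JavaScript's default string ordering; `versionOf` is a hypothetical helper introduced only for this example.

```javascript
const MAX_UUID = "ffffffff-ffff-ffff-ffff-ffffffffffff";
// Largest string with version 7 and the 0b10 variant
const V7_MAX = "ffffffff-ffff-7fff-bfff-ffffffffffff";

// Hypothetical helper: the version is the first hex digit of the third group.
const versionOf = (uuid) => parseInt(uuid[14], 16);

const ids = [
  MAX_UUID,
  "0621f59f-53cf-732d-8e97-a491d31e209f",
  V7_MAX,
  "0621f59f-53cf-7325-a664-c03b68f106fd",
];

// The default sort compares strings by UTF-16 code unit, which matches
// numeric order for lowercase hex strings of equal length.
const sorted = [...ids].sort();

console.log(sorted[sorted.length - 1] === MAX_UUID); // MAX sorts last
console.log(versionOf(V7_MAX));   // 7: indistinguishable from a real UUIDv7
console.log(versionOf(MAX_UUID)); // 15: not a defined version
```

In this sketch the version 7 maximum still parses as an ordinary UUIDv7, while the all-ones value sorts after it and carries a version digit that no generator produces.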

By specifying the MAX value as a valid, special UUID value in the RFC, the standard can ensure every compliant implementation safely handles the MAX value as a valid value; otherwise, some implementations might refuse to validate/store/instantiate the MAX value (e.g. this regex).

kyzer-davis commented 2 years ago

@LiosK Thanks, that makes it pretty clear. I don't see any reason why we can't put a one liner exactly like RFC 4122 and leave it be.

@bradleypeabody any objections to a small Max UUID section right after v8/v8ε?

kyzer-davis commented 2 years ago

@LiosK and @bradleypeabody, I went ahead and put together some text via #68 and included it in #75 for review.

kyzer-davis commented 2 years ago

Merged in #75 but will label this as a discussion thread and keep it open going forward as a single place to discuss the topic. Edit: Linked in the README as well!

ben221199 commented 2 years ago

First, I want to say that I like Omni UUID more than Max UUID. Second, I want to say that introducing the Omni UUID will require introducing a new variant.

From #26.

kyzer-davis commented 2 years ago

@ben221199, I have no reservations about the naming; Nil just didn't seem to convey the point properly. Technically speaking, all 1's would be the maximum value for a UUID, hence my "Max UUID". Similarly, "Omni UUID" and "All 1's" make sense too. I just want the name to make sense at a glance.

As for the variant, there was some discussion here; I will summarize my points on it:

fabiolimace commented 2 years ago

I think "MAX_UUID" is more appropriate than "OMNI_UUID". Even in Romance languages like mine we no longer use "omnis" as a standalone word. For instance, in my language it is just a prefix: "ônibus" (bus), "onipotente" (almighty).

I know omnis is the etymological opposite to nihil. I also have a hard time accepting opposing variables that don't have perfectly opposite names. However, MAX_UUID seems more natural to programmers since we see "MAX" very often. It makes sense at-a-glance.

ben221199 commented 1 year ago

Issue can be closed, afaik.

LiosK commented 1 year ago

I don't really have an opinion about naming, but I have a question: is Omni the right word and word form in Latin for use as an antonym of Nil?

I have absolutely no idea about Latin or what Omni exactly means, and this is my first time seeing this word other than in omnibus and omni-channel. I was just wondering: MAX is a pragmatic choice, so it should be okay even if the usage is not fully correct from a linguistic perspective, but OMNI sounds like a pedantic choice. It would be somewhat awkward if the usage were not considered correct from an academic point of view.

Reference for record: https://github.com/ietf-wg-uuidrev/rfc4122bis/issues/95