uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

Draft 03 PR #58

Closed bradleypeabody closed 2 years ago

bradleypeabody commented 2 years ago

@kyzer-davis Here is what I'm thinking in terms of a newer and simpler format:

There are still various TODOs in there, which I plan on filling in later in the coming week. And a lot of the text under UUID Concerns is still pretty terse and note-like, but I think that can be fixed up as well without too much trouble. I also left the previous text down at the bottom as comments for easy reference in case I moved stuff out that we want to bring back.

But yeah, let me know what you think in terms of the overall approach with this.

broofa commented 2 years ago

Is there a reason for creating these as new documents rather than revising the existing ones? Seems like it'd be easier to review if we could see side-by-side changes in this PR.

bradleypeabody commented 2 years ago

@broofa fixed

broofa commented 2 years ago

A UUID with all zero bytes is never valid

Isn't this the NIL UUID?

ver_var field

While I appreciate the simplicity of this approach, I thought we'd resolved to stick with the RFC4122 locations / definitions for the version and variant fields.

bradleypeabody commented 2 years ago

@broofa

Isn't this the NIL UUID?

Good point, yes that's the same thing. It's interesting that RFC4122 doesn't indicate the semantic meaning of a Nil UUID or when one would use it. Seeing that there is no specified use for it, I think it makes sense to recommend a check for Nil UUID as a valid substitute for "is this a valid UUID", in the interest of discouraging unnecessary UUID introspection. Let me know if you think otherwise.

I thought we'd resolved to stick with the RFC4122 locations / definitions for the version and variant fields.

We discussed this here but I don't feel like there is a convincing-enough argument against combining these into one field. I'll drop another comment on that ticket so we can carry on the discussion further there if needed.

Other than that, what do you think of the layout overall? Does it seem simpler and more approachable?

kyzer-davis commented 2 years ago

I am all for being streamlined in getting the need-to-know up front then discussing some of the finer details at the end.

I knocked out a few things today:

kyzer-davis commented 2 years ago

Work on 2-16-2022 (ignore the 2021 typo in the commit...):

broofa commented 2 years ago

recommend a check for Nil UUID as a valid substitute for "is this a valid UUID"

Let's not mess with the verbiage around the Nil UUID.

  1. It is defined in the RFC. Ergo, it's valid. (e.g. uuidjs permits it in validation, and also provides a NIL constant.)
  2. If it's expected to be invalid, there's no need to address it as a special case. 0b00 is not a valid variant. That this is in the spec suggests there is significance to it beyond being simply a special invalid value.
  3. Even if it's arguably invalid, specifying semantics for it now will almost certainly cause existing uses to fall afoul of whatever definition we put forward.

... for example, it serves as a good placeholder for "null" or "undefined" in places where type constraints otherwise require a valid UUID (e.g. dbs or typed languages).

broofa commented 2 years ago

BTW, what happened to the verbiage around the sub-second timestamp field being encoded in fractional-binary notation?

Wait... we're going with fixed-length unix milliseconds and doing away with all the node/counter/subsecond time stuff for version 7 now? I'm not opposed to this (in fact, I rather like the new format... a lot), but this sweeps a lot of previous discussion aside. I feel like I missed something.

kyzer-davis commented 2 years ago

Summary of work for 2-17-2022

Edit: still a ton left to do. I will resume work on Draft 03 Monday.

bradleypeabody commented 2 years ago

@broofa

it serves as a good placeholder for "null" or "undefined" in places where type constraints otherwise require a valid UUID (e.g. dbs or typed languages)

I definitely agree with this. I'll update the language to remove mentions of validation and essentially say the above. The point of the earlier text is that I think UUID validation is generally unnecessary (because everywhere else on the internet when someone generates an ID, the thing storing it just says "thanks" not "this ID is in an invalid format" - I think it's the complexity of RFC4122 that causes people to feel that validation is necessary when in fact I think it's just unneeded complexity that serves practically no purpose). But yeah, just saying "If you need to express the concept of 'there is no UUID here', e.g. database NULL, you can use the Nil UUID" - that totally works.
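
As a rough illustration of the "there is no UUID here" check, here is a minimal sketch in C. The function name `uuid_is_nil` is hypothetical and not taken from the draft; it only assumes a UUID is held as 16 raw bytes.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if all 16 bytes are zero (the Nil UUID), otherwise 0.
 * Hypothetical helper name, not part of the draft. */
int uuid_is_nil(const uint8_t u[16])
{
    static const uint8_t nil[16] = {0};
    return memcmp(u, nil, sizeof nil) == 0;
}
```

A caller would treat a true result the way it treats a database NULL, without inspecting any other bits of the value.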

Regarding variable length and text formats, let's move that to the separate issues, I'll tag you and Fabio there and we can hash those points out.

bradleypeabody commented 2 years ago

@broofa

Wait... we're going with fixed-length unix milliseconds and doing away with all the node/counter/subsecond time stuff for version 7 now? I'm not opposed to this (in fact, I rather like the new format... a lot), but this sweeps a lot of previous discussion aside. I feel like I missed something.

Yeah this is one of the reasons I wanted to get this new draft being looked at.

Based on our discussion about typical clock resolutions, and my experience just trying to explain the concept to people, I changed my position on this idea of variable time length encoding to be not worth the effort. Such a scheme is still possible in UUIDv8 if it's really needed. But basically most of the people I talked to about it sort of just scratched their heads and went "hm, uh I guess so, I sort of think I get it". Neither my explanation nor the need for such a system seemed to prove useful after shopping the idea around. So I'm okay to just get over it and go with a milliseconds timestamp. It will be a lot easier for folks to implement, and I think it serves the purpose quite well.

LiosK commented 2 years ago

I kinda miss and would appreciate the smart subsec fraction idea, but the choice of timestamp encoding might be a trivial issue because, for a given size of binary timestamp field, the same information can be placed regardless of encoding techniques. R.I.P.

broofa commented 2 years ago

I'm okay to just get it over it and go with a milliseconds timestamp

At the risk of horsing a dead beat...

Is there a reason unix_ts_ms is 48 bits instead of 42 or 44? 44 bits would suffice for dates to year 2527. I only mention this because the current verbiage ("big endian") means the first 6 high-order bits won't be used for nearly a century. The first 4 for 500 years.
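
The rollover dates for each field width can be checked with a few lines of C. This is only a sketch: the helper names are hypothetical, and it assumes a 365.25-day year, so the results are approximate.

```c
#include <stdint.h>

/* Approximate number of years a millisecond counter of `bits` width can
 * cover before wrapping, using a 365.25-day year. Hypothetical helper. */
double ts_ms_span_years(unsigned bits)
{
    const double ms_per_year = 1000.0 * 60 * 60 * 24 * 365.25;
    return (double)(UINT64_C(1) << bits) / ms_per_year;
}

/* Approximate calendar year in which a `bits`-wide Unix millisecond
 * timestamp (counted from 1970) wraps around. */
int ts_ms_rollover_year(unsigned bits)
{
    return 1970 + (int)ts_ms_span_years(bits);
}
```

Under that assumption, 42 bits lasts until about 2109, 44 bits until about 2527, and 48 bits until roughly the year 10889.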

Seems like a waste. E.g. a 44 bit unix_ts_ms field layout...

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      unix_ts_ms       |                  rand_a               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    var_ver    |            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

fabiolimace commented 2 years ago

A 44-bit millisecond field gives more than 500 years of lifespan. That's more than enough in my opinion. At the end of this lifespan, we probably won't be using UUIDs anymore, unless something goes wrong along the way (Mad Max, Skynet, etc.).

The remaining 20 bits in rand_a can be used for:

So I agree that maybe 48 bits is too much.

P.S. a 20-bit counter can give us 1 billion monotonic UUIDs per second instead of just 65k. UUIDv1 gives us 10 million.
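
The counter arithmetic can be sketched as follows, assuming the counter restarts on every millisecond tick (an assumption for illustration; the thread does not fix the reset rule). The helper name is hypothetical.

```c
#include <stdint.h>

/* Maximum number of distinct counter values per second when a `bits`-wide
 * counter is reset on every millisecond tick. Hypothetical helper. */
uint64_t counter_ids_per_second(unsigned bits)
{
    return (UINT64_C(1) << bits) * 1000;
}
```

With 20 bits this comes to 1,048,576,000 values per second, which matches the "1 billion" figure. For comparison, UUIDv1's 100-nanosecond timestamp ticks 10,000,000 times per second.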

kyzer-davis commented 2 years ago

@broofa

Is there a reason unix_ts_ms is 48 bits instead of 42 or 44?

This is just where the logical previous ver bits would start. 48 was a good value when we had to splice in the 4 bit version but I agree, if we are going with the new variant+version position then we can do whatever we want with the first 64 bits.

I am totally fine to drop this down to 44 bits and give some bits back to rand_a though. I will address this in the next commit.
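
For reference, packing the first 64 bits of the 44-bit layout discussed above might look like the sketch below. The function name `uuid_pack_ts44` is hypothetical, and the field widths (44-bit unix_ts_ms followed by 20-bit rand_a, big-endian) are taken from the diagram in this thread, not from a published draft.

```c
#include <stdint.h>

/* Packs a 44-bit millisecond timestamp and 20 bits of rand_a into the first
 * 8 bytes of u, big-endian, following the layout sketched in the thread.
 * Hypothetical helper; bits above each field's width are discarded. */
void uuid_pack_ts44(uint8_t u[16], uint64_t unix_ts_ms, uint32_t rand_a)
{
    uint64_t hi = ((unix_ts_ms & UINT64_C(0xFFFFFFFFFFF)) << 20)
                | (uint64_t)(rand_a & 0xFFFFFu);
    for (int i = 0; i < 8; i++) {
        u[i] = (uint8_t)(hi >> (56 - 8 * i));
    }
}
```

Writing the bytes one at a time keeps the result independent of host byte order, so the timestamp always sorts first when the UUIDs are compared as byte strings.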

Edit: and ULID uses 48 bits for timestamp.

kyzer-davis commented 2 years ago

To Do's are dwindling... I can see light at the end of the editing tunnel!

Work from "Twosday" 2-22-2022

Others

bradleypeabody commented 2 years ago

On some of these recent changes, I think we're losing some important aspects that work together:

fabiolimace commented 2 years ago

The only objection I have against moving the version number closer to the variant is that it reserves 2 more bits.

Perhaps it is more beneficial, in terms of entropy gain, to remove the version number altogether and just create a versionless 'E' variant. But it would exclude what we now call UUIDv8.

Alternatively, we could create an 'E' variant plus 1 bit (or 2 bits) to separate the key spaces for UUIDv7 from UUIDv8. But it would require other names for them.

EDIT: the metaphor was removed from this comment for not being objective.

kyzer-davis commented 2 years ago

There is a single TODO!

@bradleypeabody I need a C code snippet for UUIDv7 in the appendix!

Work from today:

Edit: uuid6/uuid6-ietf-draft#53 should now be covered in "Monotonicity, Counters" section. Edit 2: uuid6/uuid6-ietf-draft#36 should also be covered by "Distributed UUID Generation" and "Uniqueness Guarantees" sections

bradleypeabody commented 2 years ago

@kyzer-davis Great!

UUIDv7 generation in C:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

// ...

// csprng data source
FILE *rndf;
rndf = fopen("/dev/urandom", "r");
if (rndf == 0) {
    printf("fopen /dev/urandom error\n");
    return 1;
}

// ...

// generate one UUIDv7
uint8_t u[16];
struct timespec ts;
int ret;

ret = clock_gettime(CLOCK_REALTIME, &ts);
if (ret != 0) {
    printf("clock_gettime error: %d\n", ret);
    return 1;
}

uint64_t tms;

tms = ((uint64_t)ts.tv_sec) * 1000;
tms += ((uint64_t)ts.tv_nsec) / 1000000;

printf("tms: %llu\n", (unsigned long long)tms); // cast so the format matches on every platform

memset(u, 0, 16);

// fill everything after the timestamp with random bytes
if (fread(&u[6], 10, 1, rndf) != 1) {
    printf("fread /dev/urandom error\n");
    return 1;
}

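// write the low 48 bits of the timestamp, big-endian, into the first 6 bytes
// (byte-wise, since htonll is not available on every platform)
for (int i = 0; i < 6; i++) {
    u[i] = (uint8_t)(tms >> (40 - 8 * i));
}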

u[8] = 0xE7; // set var-ver field
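
Going the other way, the timestamp can be read back out of the first six bytes. This is a minimal sketch, assuming the 48-bit big-endian layout used in the snippet above; the function name `uuid7_unix_ts_ms` is hypothetical.

```c
#include <stdint.h>

/* Reads the 48-bit big-endian Unix millisecond timestamp from bytes 0..5.
 * Hypothetical helper; assumes the 48-bit layout from the snippet above. */
uint64_t uuid7_unix_ts_ms(const uint8_t u[16])
{
    uint64_t tms = 0;
    for (int i = 0; i < 6; i++) {
        tms = (tms << 8) | u[i];
    }
    return tms;
}
```

Dividing the result by 1000 gives Unix seconds, and the remainder gives the millisecond part.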

kyzer-davis commented 2 years ago

Today's changelog:

At this point I do not have any additional items to change and all of the TO DOs are gone. I have been combing through the issue tracker and do not see any other additional items from the threads to hit in this PR. Please bring them to my attention if I have missed something important.

I will return Monday to check this thread and work on any required items called out to me.

Cheers,

Edit: I hope the IETF folks don't ask us to bring back the RFC4122 style "basic algorithm" sections. I am a fan of omitting them entirely as we have in Draft 03 but if somebody feels otherwise please let me know.

broofa commented 2 years ago

At this point I do not have any additional items to change and all of the TO DOs are gone... Please bring them to my attention if I have missed something important.

Where are we at with uuid6/uuid6-ietf-draft#26? I'm concerned that our rationale for changing the variant won't hold up to scrutiny once this spec is published.

BTW, have we reached out to any of the original RFC authors for feedback on all of this? (mmealling and richsalz appear to be on github). I'm curious what they intended for "future uses" of the 0b111 variant.

bradleypeabody commented 2 years ago

BTW, have we reached out to any of the original RFC authors for feedback on all of this?

This is a good point. I think richsalz was involved in some of the early discussion on the IETF mailing list, IIRC, but that discussion was years ago now and the idea has of course changed a lot since.

Getting the 03 draft to the IETF is a good way to get feedback. Most of the early IETF mailing list discussion culminated in (to paraphrase) "just make a draft, so we can see what you're talking about in detail".

kyzer-davis commented 2 years ago

Thanks to everybody for helping review the current Draft 03!

I may be the one editing the XML file but the group's feedback is beyond invaluable!

Today's changelog:

ben221199 commented 2 years ago

https://github.com/uuid6/uuid6-ietf-draft/issues/26#issuecomment-1055733006