Draft 03 PR - Githubissues

bradleypeabody commented 2 years ago

@kyzer-davis Here is what I'm thinking in terms of a newer and simpler format:

The preamble combines some of the explanation in shorter bullet points
Each format is described as concisely as possible, omitting things like the duplicative explanation from RFC4122, or detailed explanations about various tradeoffs that are described later.
Each of the various topics are now covered in a UUID Concerns section and this addresses things like Monotonicity, Opacity, Variable Length, Text Format, etc. - each as their own topic. The idea here is to make the explanation of each UUID version very simple and easy to understand, and move each of the longer arguments about tradeoffs etc into their own thing so they can be covered separately, and read by the reader when they are actually interested in a given topic. Otherwise with the way it was before, it was kind of like you were forced to read through a long treatise about all kinds of factors before being able to just get a clear-cut explanation of what each of the fields mean and what belongs in them.

There are still various TODOs in there, which I plan on filling in later in the coming week. And a lot of the text under UUID Concerns is still pretty terse and note-like, but I think that can be fixed up as well without too much trouble. I also left the previous text down at the bottom as comments for easy reference in case I moved stuff out that we want to bring back.

But yeah, let me know what you think in terms of the overall approach with this.

broofa commented 2 years ago

Is there a reason for creating these as new documents rather than revising the existing ones? Seems like it'd be easier to review if we could see side-by-side changes in this PR.

bradleypeabody commented 2 years ago

@broofa fixed

broofa commented 2 years ago

A UUID with all zero bytes is never valid

Isn't this the NIL UUID?

ver_var field

While I appreciate the simplicity of this approach, I thought we'd resolved to stick with the RFC4122 locations / definitions for the version and variant fields.

bradleypeabody commented 2 years ago

@broofa

Isn't this the NIL UUID?

Good point, yes that's the same thing. It's interesting that RFC4122 doesn't indicate the semantic meaning of a Nil UUID or when one would use it. Seeing that there is no specified use for it, I think it makes sense to recommend a check for Nil UUID as a valid substitute for "is this a valid UUID", in the interest of discouraging unnecessary UUID introspection. Let me know if you think otherwise.

I thought we'd resolved to stick with the RFC4122 locations / definitions for the version and variant fields.

We discussed this here but I don't feel like there is a convincing-enough argument against combining these into one field. I'll drop another comment on that ticket so we can carry on the discussion further there if needed.

Other than that, what do you think of the layout overall? Does it seem simpler and more approachable?

kyzer-davis commented 2 years ago

I am all for being streamlined in getting the need-to-know up front then discussing some of the finer details at the end.

I knocked out a few things today:

Added HTML version
Updated changelog calling out variant change
Did a pass on "Combined Variant and Version Fields" and changed the section name to just "Variant and Version Fields"
Filled out some of the diagram blocks
Removed bullet on all zero UUID as per @broofa's comment about NIL UUID
Misc. Typo fixes

kyzer-davis commented 2 years ago

Work on 2-16-2022 (ignore the 2021 typo in the commit...):

More Changelog updates to summarize at-a-glance changes from Draft-02 vs Draft-03
Added var back to UUIDv6 ASCII layout as this is an important set of bits
Verified <figure><artwork></artwork></figure> tags are valid for the code snipets as per IETF RFC7749
Added code snippet placeholder for v7
Added Test Vector ASCII placehold table to v6 and v7
Modified v7 ASCII layout to have rand_a and rand_b (before and after var_ver) to make it easier to discuss the relevant section of bits.
Modified v8 with the same logic of custom_a and custom_b for the pre and post var_ver bits
Removed <section> tags from "Implementation Tradeoffs" section so that it shows up under "UUID Concerns" and avoids an empty section.
Hit most of the "TODO link section below" references plus added a few backward and forward references with <xref target="sectionLink"/>
Gave another pass to "UUID Concerns" fleshing out the text to be less informal.
Added more TODO's for tracking
Added some further Acknowledgements

broofa commented 2 years ago

recommend a check for Nil UUID as a valid substitute for "is this a valid UUID"

Let's not mess with the verbiage around the Nil UUID.

It is defined in the RFC. Ergo, it's valid. (e.g. uuidjs permits it in validation, and also provides a NIL constant.)
If it's expected to be invalid, there's no need to address it as a special case. 0b00 is not a valid variant. That this is in the spec suggests there is significance to it beyond being simply a special invalid value.
Even if it's arguably invalid, specifying semantics for it now will almost certainly cause existing uses fall afoul of whatever definition we put forward.

... for example, it serves as a good placeholder for "null" or "undefined" in places where type constraints otherwise require a valid UUID (e.g. dbs or typed languages).

broofa commented 2 years ago

~~BTW, what happened to the verbiage around the sub-second timestamp field being encoded in fractional-binary notation?~~

Wait... we're going with fixed-length unix milliseconds and doing away with all the node/counter/subsecond time stuff for version 7 now? I'm not opposed to this (in fact, I rather like the new format... a lot), but this sweeps a lot of previous discussion aside. I feel like I missed something.

kyzer-davis commented 2 years ago

Summary of work for 2-17-2022

Ran the document through spell check so hopefully @broofa doesn't beat me up anymore about spelling issues :)
Noticed we reference Octet 9 for variant bits, this is actually octet 8 as per verbiage where this document and RFC4122 start definitions with octet 0. Fixed this in the the entire document.
Added further clarifications on Variant 111+0 and 111+1
Added Format section and put 3x UUID layouts in that to reduce empty sections
Added IETF terminology section moved/updated Requirements Language and added new Abbreviations section
Moved C code samples to first appendix with two sub-sections as per RFC 7749.
Moved Test vector samples to second appendix with two sub-sections as per RFC 7749.
Knocked out most of the items @broofa recommended, closing the recommended changes.
Added more forward/reverse references where required.
Deleted the old XML RFC comment, Draft 02 can be found in the repo.
Moved some TODOs to comments to promote readability by editors while we iron out the TODO text.
Changed all "#-bit" references to "# bit" as per Draft 02 update about the same topic.
Linked uuid6/uuid6-ietf-draft#49, uuid6/uuid6-ietf-draft#51 , uuid6/uuid6-ietf-draft#55 which should be sufficiently covered currently.

Edit: still a ton left to do. I will resume work on Draft 03 Monday.

bradleypeabody commented 2 years ago

@broofa

it serves as a good placeholder for "null" or "undefined" in places where type constraints otherwise require a valid UUID (e.g. dbs or typed languages)

I definitely agree with this. I'll update the language to remove mentions of validation and essentially say the above. The point of the earlier text is that I think UUID validation is generally unnecessary (because everywhere else on the internet when someone generates an ID, the thing storing it just says "thanks" not "this ID is in an invalid format" - I think it's the complexity of RFC4122 that causes people to feel that validation is necessary when in fact I think it's just unneeded complexity that serves practically no purpose). But yeah, just saying "If you need to express the concept of 'there is no UUID here', e.g. database NULL, you can use the Nil UUID" - that totally works.

Regarding variable length and text formats, let's move that to the separate issues, I'll tag you and Fabio there and we can hash those points out.

bradleypeabody commented 2 years ago

@broofa

Wait... we're going with fixed-length unix milliseconds and doing away with all the node/counter/subsecond time stuff for version 7 now? I'm not opposed to this (in fact, I rather like the new format... a lot), but this sweeps a lot of previous discussion aside. I feel like I missed something.

Yeah this is one of the reasons I wanted to get this new draft being looked at.

Based on our discussion about typical clock resolutions, and my experience just trying to explain the concept to people, I changed my position on this idea of variable time length encoding to be not worth the effort. Such a scheme is still possible in UUIDv8 if it's really needed. But basically most of the people I talked to about it sort of just scratched their heads and went "hm, uh I guess so, I sort of think I get it". Neither my explanation, nor the need for such a system seemed to proove useful after the shopping the idea around. So I'm okay to just get it over it and go with a milliseconds timestamp. It will be a lot easier for folks to implement, and I think serves the purpose quite well.

LiosK commented 2 years ago

I kinda miss and would appreciate the smart subsec fraction idea, but the choice of timestamp encoding might be a trivial issue because, for a given size of binary timestamp field, the same information can be placed regardless of encoding techniques. R.I.P.

broofa commented 2 years ago

I'm okay to just get it over it and go with a milliseconds timestamp

At the risk of horsing a dead beat...

Is there a reason unix_ts_ms is 48 bits instead of 42 or 44? 44 bits would suffice for dates to year 2527. I only mention this because the current verbiage ("big endian") means the first 6 high-order bits won't be used for nearly a century. The first 4 for 500 years.

Seems like a waste. E.g. a 44 bit unix_ts_ms field layout...

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      unix_ts_ms       |                  rand_a               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    var_ver    |            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

fabiolimace commented 2 years ago

A 44 bits millisecond field gives 500 years of lifespan. It's more than enough in my opinion. At the end of this lifespan, we probably won't be using UUIDs anymore, unless something goes wrong along the way (Madmax, Skynet, etc).

The remaining 20 bits in rand_a can be used for:

a counter up to 2^20 (1048576) (monotonicity); or
submillisecond up to nanosecond (precision); or
just random bits (entropy).

So I agree that maybe 48 bits is too much.

P.S. a 20-bit counter can gives us 1 billion monotonic UUIDs per second instead of just 65k. UUIDv1 gives us 10 million.

kyzer-davis commented 2 years ago

@broofa

Is there a reason unix_ts_ms is 48 bits instead of 42 or 44?

This is just where the logical previous ver bits would start. 48 was a good value when we had to splice in the 4 bit version but I agree, if we are going with the new variant+version position then we can do whatever we want with the first 64 bits.

I am totally fine to drop this down to 44 bits and give some bits back to rand_a though. I will address this in the next commit.

Edit: and ULID uses 48 bits for timestamp.

kyzer-davis commented 2 years ago

To Do's are dwindling... I can see light at the end of the editing tunnel!

Work from "Twosday" 2-22-2022

Reverted back to Draft 02 Abstract
Removed Format and Alternate Encoding references
Moved variant and version bit sections up to format section since these are important to understanding the bit layouts. They where out of place way at the end which does not promote readability.
UUIDv6 now only a 1:1 reshuffle + moved some text from Draft 02 to Draft 03 including the ASCII diagram and table. Should sufficiently cover uuid6/uuid6-ietf-draft#52
Changed UUIDv7 to 44 bit timestamp
Scoped UUIDv8 back to time-specific + moved some text from Draft 02 to drive the points home. Should be sufficient for uuid6/uuid6-ietf-draft#31
Finished UUIDv1 vs UUIDv6 test vectors in Appendix B.
Finished UUIDv7 test vector in Appendix B.
Finished UUIDv8 test vector in Appendix B.
Changed "UUID Concerns" section to "UUID Best Practices"

Others

uuid6/uuid6-ietf-draft#28 should be covered now in "UUID Best Practices Section, subsection: Storing UUIDs, Opacity"
uuid6/uuid6-ietf-draft#45 covered by current UUIDv7 text (or lack there of)
uuid6/uuid6-ietf-draft#44 is now out of scope for UUIDv7 but still eligible for UUIDv8.

bradleypeabody commented 2 years ago

On some of these recent changes, I think we're loosing some important aspects that work together:

48 bit timestamp - admittedly this is was chosen because it's a clean byte-boundary and is easier to use. If we change it to 44 bits, it just that much more bit swizzling required. This goes along with the var-ver change, moving the version bits out of the way.
This works in tandem with the variable length approach. If you need more bytes for entropy, just add them. Some of the questions brought up around this are valid and I can go through and answer those, and possibly we can limit the scope of when the doc suggests to use longer IDs, but I think there is still a strong case for including them. There has been a huge amount of prior discussion about entropy and "are there are enough bits" for various cases and degrees of collision resistance.
The crockford base32 text format was in there because not having a compact text form defeats of the most important use cases here which is making an ID which is useful for databases. Having an overly long text form for IDs just for legacy reasons is not a good tradeoff for this case of database implementing PKs as UUIDs. If the problem is the poor specification and unclarity of the alphabet, I think we can specify a subset that is compatible with a typical crockford base32 implementation but also strict enough to make the minimum implementation simple.

fabiolimace commented 2 years ago

The only objection I have against moving the version number closer to the variant is that it reserves 2 more bits.

Perhaps it is more beneficial, in terms of entropy gain, to remove the version number altogether and just create a versionless 'E' variant. But it would exclude what we now call UUIDv8.

Alternatively, we could create an 'E' variant plus 1 bit (or 2 bits) to separate the key spaces for UUIDv7 from UUIv8. But it would require other names for them.

EDIT: the metaphor was removed from this comment for not being objective.

kyzer-davis commented 2 years ago

There is a single TODO!

@bradleypeabody I need a C code snippet for UUIDv7 in the appendix!

Work from today:

Reverted UUIDv7 to 48 bits. UUIDv7 is @bradleypeabody's and I defer changing that value to him.
Reverted UUIDv8 back to "anything goes" vs "time-specific" as I missed an email from Brad stating this was the goal in Draft 03.
Re-ordered "UUID Best Practices" sub-sections (Fleshed out text in every one of these sections.) @broofa I saw you reviewing this today, it was not complete but should be in a better place after this commit.
Added uuid6/uuid6-ietf-draft#57 as an example code snippet for UUIDv8 in the appendix.
Fixed var_ver typo in UUIDv8 test vector. Thanks @fabiolimace

Edit: uuid6/uuid6-ietf-draft#53 should now be covered in "Monotonicity, Counters" section. Edit 2: uuid6/uuid6-ietf-draft#36 should also be covered by "Distributed UUID Generation" and "Uniqueness Guarantees" sections

bradleypeabody commented 2 years ago

@kyzer-davis Great!

UUIDv7 generation in C:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

// ...

// csprng data source
FILE *rndf;
rndf = fopen("/dev/urandom", "r");
if (rndf == 0) {
    printf("fopen /dev/urandom error\n");
    return 1;
}

// ...

// generate one UUIDv7
uint8_t u[16];
struct timespec ts;
int ret;

ret = clock_gettime(CLOCK_REALTIME, &ts);
if (ret != 0) {
    printf("clock_gettime error: %d\n", ret);
    return 1;
}

uint64_t tms;

tms = ((uint64_t)ts.tv_sec) * 1000;
tms += ((uint64_t)ts.tv_nsec) / 1000000;

printf("tms: %lld\n", tms);

memset(u, 0, 16);

fread(&u[6], 10, 1, rndf); // fill everything after the timestamp with random bytes

*((uint64_t*)(u)) |= htonll(tms << 16); // shift time into first 48 bits and OR into place

u[8] = 0xE7; // set var-ver field

kyzer-davis commented 2 years ago

Today's changelog:

Added @bradleypeabody's C function for UUIDv7
More polish and formatting to promote readability
Ensured that I put guidance about uuid6/uuid6-ietf-draft#54
Hit topic of "clock sequence" but kept it brief as it stems from uuid6/uuid6-ietf-draft#41
Modified test vectors to have random data for UUIDv1 and v6 clock sequence.
Removed Documentation section. This is an implementation detail and it seemed a bit out of place in the document.

At this point I do not have any additional items to change and all of the TO DOs are gone. I have been combing through the issue tracker and do not see any other additional items from the threads to hit in this PR. Please bring them to my attention if I have missed something important.

I will return Monday to check this thread and work on any required items called out to me.

Cheers,

Edit: I hope the IETF folks don't ask us bring back the RFC4122 style "basic algorithm" sections. I am a fan of omitting them entirely as we have in Draft 03 but if somebody feels otherwise please let me know.

broofa commented 2 years ago

At this point I do not have any additional items to change and all of the TO DOs are gone... Please bring them to my attention if I have missed something important.

Where are we at with uuid6/uuid6-ietf-draft#26? I'm concerned that our rationale for changing the variant won't hold up to scrutiny once this spec is published.

BTW, have we reached out to any of the original RFC authors for feedback on all of this? (mmealling and richsalz appear to be on github). I'm curious what they intended for "future uses" of the 0b111 variant.

bradleypeabody commented 2 years ago

BTW, have we reached out to any of the original RFC authors for feedback on all of this?

This is a good point. I think richsalz was involved in some of the early discussion on the IETF mailing list, IIRC, but that discussion was years ago now and the idea has of course changed a lot since.

Getting the 03 draft to the IETF is a good way to get feedback. Most of the early IETF mailing list discussion culminated in (to parapharse) "just make a draft, so we can see what you're talking about in detail".

kyzer-davis commented 2 years ago

Thanks to everybody for helping review the current Draft 03!

I may be the one editing the XML file but the groups feedback is beyond invaluable!

Today's changelog:

Fixed @LiosK's suggestions on typos
Removed Text Format section since it served little purpose at the moment.
Fixed some other typos
Added more acknowledgements
MUST to SHOULD based on uuid6/uuid6-ietf-draft#60

ben221199 commented 2 years ago

https://github.com/uuid6/uuid6-ietf-draft/issues/26#issuecomment-1055733006

uuid6 / uuid6-ietf-draft

Draft 03 PR #58