uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

v6: Retain definition and behavior of v1 fields (avoids confusion, allows for symmetric encoding.) #52

Closed broofa closed 2 years ago

broofa commented 2 years ago

Version 6 changes how the node and clockseq fields behave. This is confusing, and also compromises one of the most important traits version 6 can have: symmetric encoding with version 1.

It's confusing because these fields behave in ways that are contradictory and overlapping between the two versions:

Of more concern, however, is the loss of the ability to deterministically encode a v6 id from any v1 id, and vice versa. Imagine a distributed legacy system where different components are being migrated from version 1 to version 6 over time. Not being able to deterministically encode ids means there must be a centralized authority that holds the mapping from each v1 id to its corresponding v6 id. This is fundamentally counter to the entire thesis on which UUIDs are based, and in all likelihood will be (if you'll pardon the expression) an enormous pain in the ass for developers, who very likely chose UUIDs to avoid the need for a central authority.

I believe the community as a whole will be much better served if the version 6 spec consists of nothing more than a description of how version 1 fields are to be laid out, while ensuring version 1 and version 6 ids may be symmetrically encoded.
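To make the symmetric-encoding idea concrete, here is a minimal Python sketch of a deterministic round trip. It assumes version 6 only reorders the 60-bit version 1 timestamp (most significant bits first) and leaves the variant, clock sequence, and node bits untouched. The function names `v1_to_v6` and `v6_to_v1` are hypothetical and not taken from the draft.

```python
import uuid

LOW_64 = 0xFFFF_FFFF_FFFF_FFFF  # variant + clock sequence + node


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Re-lay out a v1 UUID so the timestamp is stored high bits first."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    # Reassemble the 60-bit timestamp from the v1 fields.
    ts = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
    # v6 high half: top 48 timestamp bits, version nibble, low 12 timestamp bits.
    high = ((ts >> 12) << 16) | (6 << 12) | (ts & 0x0FFF)
    return uuid.UUID(int=(high << 64) | (u.int & LOW_64))


def v6_to_v1(u: uuid.UUID) -> uuid.UUID:
    """Inverse of v1_to_v6 under the same layout-only assumption."""
    if u.version != 6:
        raise ValueError("expected a version 6 UUID")
    high = u.int >> 64
    ts = ((high >> 16) << 12) | (high & 0x0FFF)
    time_low = ts & 0xFFFF_FFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi = (ts >> 48) & 0x0FFF
    v1_high = (time_low << 32) | (time_mid << 16) | (1 << 12) | time_hi
    return uuid.UUID(int=(v1_high << 64) | (u.int & LOW_64))


v1 = uuid.UUID("69d2095c-154f-11ec-8a9b-d9086378f195")
v6 = v1_to_v6(v1)
print(v6)                  # 1ec154f6-9d20-695c-8a9b-d9086378f195
print(v6_to_v1(v6) == v1)  # True
```

Because both directions are pure functions of the 128 bits, any component can convert ids at a subsystem boundary without consulting a shared mapping table.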

fabiolimace commented 2 years ago

The symmetric encoding argument is fundamental in my opinion. V1 and V6 must be deterministically and uniquely mapped to each other through a conversion function.

I created an example (with code) to illustrate a migration from UUID v1 to UUID v6. The example converts the v1 keys to v6 keys in the migration tables.

Example converting UUID v1 to UUID v6

In this example, the primary keys and foreign keys are converted into UUID v6 in migration tables.

Tables using UUID v1

Costumers table:

COSTUMER_ID                           COSTUMER_NAME
69d2095c-154f-11ec-8a9b-d9086378f195  Kenith Dayton
d2e18561-1621-11ec-8a9b-d9086378f195  Brett Naomi
dfa932ee-869a-11ec-8a9b-d9086378f195  Lyndon Burke

Addresses table:

ADDRESS_ID                            COSTUMER_ID                           ADDRESS_TEXT
3c6fd97d-b201-11ec-8a9b-d9086378f195  69d2095c-154f-11ec-8a9b-d9086378f195  8105 Timber Valley Ct, Dunn Loring, Virginia(VA), 22027
1432c665-2764-11ec-8a9b-d9086378f195  d2e18561-1621-11ec-8a9b-d9086378f195  1370 N Archusa Ave, Quitman, Mississippi(MS), 39355
42148ab7-d227-11ec-8a9b-d9086378f195  dfa932ee-869a-11ec-8a9b-d9086378f195  6553 Hallissey Ct, Centreville, Virginia(VA), 20120

Tables using UUID v6 (after migration)

Costumers table:

COSTUMER_ID                           COSTUMER_NAME
1ec154f6-9d20-695c-8a9b-d9086378f195  Kenith Dayton
1ec1621d-2e18-6561-8a9b-d9086378f195  Brett Naomi
1ec869ad-fa93-62ee-8a9b-d9086378f195  Lyndon Burke

Addresses table:

ADDRESS_ID                            COSTUMER_ID                           ADDRESS_TEXT
1ecb2013-c6fd-697d-8a9b-d9086378f195  1ec154f6-9d20-695c-8a9b-d9086378f195  8105 Timber Valley Ct, Dunn Loring, Virginia(VA), 22027
1ec27641-432c-6665-8a9b-d9086378f195  1ec1621d-2e18-6561-8a9b-d9086378f195  1370 N Archusa Ave, Quitman, Mississippi(MS), 39355
1ecd2274-2148-6ab7-8a9b-d9086378f195  1ec869ad-fa93-62ee-8a9b-d9086378f195  6553 Hallissey Ct, Centreville, Virginia(VA), 20120
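One property of the migrated keys is worth checking: compared as text, the v6 keys sort in creation order, while the v1 keys do not. The Python sketch below uses only the six ids from the tables above. The helper names `timestamp` and `v1_to_v6_text` are hypothetical, and the string slicing assumes lowercase, hyphenated UUID text.

```python
v1_ids = [
    "69d2095c-154f-11ec-8a9b-d9086378f195",
    "d2e18561-1621-11ec-8a9b-d9086378f195",
    "dfa932ee-869a-11ec-8a9b-d9086378f195",
    "3c6fd97d-b201-11ec-8a9b-d9086378f195",
    "1432c665-2764-11ec-8a9b-d9086378f195",
    "42148ab7-d227-11ec-8a9b-d9086378f195",
]


def timestamp(v1: str) -> int:
    """60-bit v1 timestamp: time_hi, then time_mid, then time_low."""
    return int(v1[15:18] + v1[9:13] + v1[0:8], 16)


def v1_to_v6_text(v1: str) -> str:
    """Reorder the timestamp digits and set the version digit to 6."""
    t = v1[15:18] + v1[9:13] + v1[0:8]
    return f"{t[0:8]}-{t[8:12]}-6{t[12:15]}-{v1[19:]}"


chronological = sorted(v1_ids, key=timestamp)
v6_ids = [v1_to_v6_text(s) for s in v1_ids]

# Plain text ordering of v1 keys does not follow creation time ...
print(sorted(v1_ids) == chronological)  # False
# ... but plain text ordering of the converted keys does.
print(sorted(v6_ids) == [v1_to_v6_text(s) for s in chronological])  # True
```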

Example code for PostgreSQL

The code in this section uses PostgreSQL SQL and procedural language.

The example tables use char(36) instead of the native type uuid because it is easier to understand and easier to implement.

------------------------------------------
-- CREATE TABLES
------------------------------------------

CREATE TABLE public.costumers (
    costumer_id char(36) NOT NULL,
    costumer_name varchar(50) NOT NULL,
    CONSTRAINT costumers_pkey PRIMARY KEY (costumer_id)
);

CREATE TABLE public.addresses (
    address_id char(36) NOT NULL,
    costumer_id char(36) NOT NULL,
    address_text varchar(100) NOT NULL,
    CONSTRAINT addresses_pkey PRIMARY KEY (address_id),
    CONSTRAINT addresses_costumer_id_fkey FOREIGN KEY (costumer_id) REFERENCES costumers(costumer_id)
);

------------------------------------------
-- INSERT ROWS INTO TABLES
------------------------------------------

-- the costumer names are fake values generated by [Behind the name](https://www.behindthename.com/random/)
insert into public.costumers values ('69d2095c-154f-11ec-8a9b-d9086378f195', 'Kenith Dayton');
insert into public.costumers values ('d2e18561-1621-11ec-8a9b-d9086378f195', 'Brett Naomi');
insert into public.costumers values ('dfa932ee-869a-11ec-8a9b-d9086378f195', 'Lyndon Burke');

-- the costumer addresses are fake values generated by [BestRandoms](https://www.bestrandoms.com/random-address)
insert into public.addresses values ('3c6fd97d-b201-11ec-8a9b-d9086378f195', '69d2095c-154f-11ec-8a9b-d9086378f195', '8105 Timber Valley Ct, Dunn Loring, Virginia(VA), 22027');
insert into public.addresses values ('1432c665-2764-11ec-8a9b-d9086378f195', 'd2e18561-1621-11ec-8a9b-d9086378f195', '1370 N Archusa Ave, Quitman, Mississippi(MS), 39355');
insert into public.addresses values ('42148ab7-d227-11ec-8a9b-d9086378f195', 'dfa932ee-869a-11ec-8a9b-d9086378f195', '6553 Hallissey Ct, Centreville, Virginia(VA), 20120');

------------------------------------------
-- CREATE CONVERSION FUNCTION
------------------------------------------

create or replace function from_v1_to_v6(uuid_v1 varchar) returns varchar as $$
declare

time_1 varchar;
time_2 varchar;
time_3 varchar;

time_v6 varchar;

clockseq_and_node varchar;

uuid_v6 varchar;

begin

    -- split the v1 timestamp: time_low (8 hex digits), time_mid (4),
    -- time_hi (3; the version digit at position 15 is skipped)
    time_1 := substring(uuid_v1, 1, 8);
    time_2 := substring(uuid_v1, 10, 4);
    time_3 := substring(uuid_v1, 16, 3);

    -- reassemble the 60-bit timestamp, most significant digits first
    time_v6 := time_3 || time_2 || time_1;

    -- cut it into the v6 groups and insert the version digit '6'
    time_1 := substring(time_v6, 1, 8);
    time_2 := substring(time_v6, 9, 4);
    time_3 := '6' || substring(time_v6, 13, 3);

    -- clock sequence and node are copied unchanged
    clockseq_and_node := substring(uuid_v1, 20);

    uuid_v6 := time_1 || '-' || time_2 || '-' || time_3 || '-' || clockseq_and_node;

    return uuid_v6;

end $$ language plpgsql;

-- FOR TESTS
-- select from_v1_to_v6('69d2095c-154f-11ec-8a9b-d9086378f195') = '1ec154f6-9d20-695c-8a9b-d9086378f195';
-- select from_v1_to_v6('d2e18561-1621-11ec-8a9b-d9086378f195') = '1ec1621d-2e18-6561-8a9b-d9086378f195';
-- select from_v1_to_v6('dfa932ee-869a-11ec-8a9b-d9086378f195') = '1ec869ad-fa93-62ee-8a9b-d9086378f195';

------------------------------------------
-- CREATE MIGRATION TABLES
------------------------------------------

create table public.costumers_migration as select * from public.costumers;
create table public.addresses_migration as select * from public.addresses;

------------------------------------------
-- CONVERT KEYS IN MIGRATION TABLES
------------------------------------------

update public.costumers_migration set costumer_id = from_v1_to_v6(costumer_id) where costumer_id ~ '^[0-9a-f-]{14}1';
update public.addresses_migration set address_id = from_v1_to_v6(address_id) where address_id ~ '^[0-9a-f-]{14}1';
update public.addresses_migration set costumer_id = from_v1_to_v6(costumer_id) where costumer_id ~ '^[0-9a-f-]{14}1';

------------------------------------------
-- CHECK IF THE MIGRATION WAS SUCCESSFUL
------------------------------------------

select * from public.costumers c, public.costumers_migration cm
where cm.costumer_id = from_v1_to_v6(c.costumer_id);

select * from public.addresses a, public.addresses_migration am
where am.address_id = from_v1_to_v6(a.address_id)
and am.costumer_id = from_v1_to_v6(a.costumer_id);

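Footnote: names and addresses are fake values created by these online generators: Behind the Name (https://www.behindthename.com/random/) and BestRandoms (https://www.bestrandoms.com/random-address).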

kyzer-davis commented 2 years ago

@broofa, I did a pass on Section 4.3, "UUIDv6 Layout and Bit Order", for this exact reason in Draft 02, which changed the clock sequence back to the original RFC 4122 usage.

Draft 02: The clock sequence bits remain unchanged from their usage and position in [RFC4122], Section 4.1.5.

Are you referencing Section 4.3.1. UUIDv6 Basic Creation Algorithm? If so I agree, we can add text to make it more clear.

As for node, it was a SHOULD just to be consistent with modern practices of using pseudo-random vs MAC addresses if possible just in case somebody is creating a net-new UUIDv6. But this does also detail that it can be the same as v1 in Section 4.3 for symmetric conversion purposes.

Draft 02: The 48 bit node SHOULD be set to a pseudo-random value however implementations MAY choose to retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5
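For illustration, a pseudo-random node in the style of RFC 4122, Section 4.5 could be produced as in the Python sketch below. It sets the multicast bit (the least significant bit of the first octet) so the value cannot collide with a real IEEE 802 address. The name `random_node` is hypothetical, and using `secrets` as the random source is an assumption rather than something the draft prescribes.

```python
import secrets


def random_node() -> int:
    """48-bit random node id with the multicast bit set (RFC 4122, 4.5)."""
    # Bit 40 is the least significant bit of the first (most significant) octet.
    return secrets.randbits(48) | (1 << 40)


node = random_node()
print(f"{node:012x}")
```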

Back to Section 4.3.1. UUIDv6 Basic Creation Algorithm:

Otherwise, I know the IETF members explicitly wanted the "basic creation algorithm" sections since RFC 4122 has them.

So the Draft 03 Section 4.3.1 change would be:

Draft 03: The following implementation algorithm is based on RFC [RFC4122], 4.2.1 but with changes specific to UUIDv6: [..truncated..]

  1. Generate a 48 bit node value following [RFC4122], 4.1.6 or [RFC4122], 4.5

Let me know what you think!

broofa commented 2 years ago

Are you referencing Section 4.3.1. UUIDv6 Basic Creation Algorithm?

@kyzer-davis: Yes. Unfortunately the steps you mention don't actually match one another, not exactly...

On initializing clockseq: (Typo? Step 8 starts with, "If the state is not available", so there won't be a previous timestamp to compare to.)

On incrementing clockseq: (another typo?)

On setting node:

Regardless of whether the first two are typos, the problem is that a different creation algorithm introduces semantic differences. Different initialization conditions, different behavior in how values do / don't change, etc. While these differences make sense given the conversations we've had around topics such as removing MAC addresses, the bottom line is the behavior changes.

...and behavior changes make crossing the boundary from a UUIDv1 subsystem to a UUIDv6 subsystem (such as occurs while incrementally porting a complex system) problematic. You can't just convert between formats at the boundary layer. You must also concern yourself with how subsystems interpret the fields.

Thus, I suggest we restrict version 6 to layout changes only. This will address the DB locality issues that were the main impetus for the new spec, right? We should deal with everything else in version 7.

Basically I'm saying the outline for §4.3 (version 6) should look like this:

§4.3 - "Version 6 is Version 1 with a different field layout. See RFC4122 §4.1.3-§4.1.6 for field definitions"
§4.3.1 "Version 6 Layout" - (details of how fields get laid out)
§4.3.2 "Creation Algorithm" - "See RFC4122 §4.2.1, with layout as described above"
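Under that outline, a version 6 generator reduces to "produce the version 1 fields, then lay them out differently". The Python sketch below assumes exactly that. `layout_v6` is a hypothetical name, and the standard library's `uuid.uuid1()` stands in for the RFC 4122 §4.2.1 creation algorithm.

```python
import uuid


def layout_v6(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Place unchanged v1 field values into the v6 bit layout."""
    # High 64 bits: top 48 timestamp bits, version 6, low 12 timestamp bits.
    high = ((timestamp >> 12) << 16) | (6 << 12) | (timestamp & 0x0FFF)
    # Low 64 bits: variant '10', 14-bit clock sequence, 48-bit node.
    low = (0b10 << 62) | ((clock_seq & 0x3FFF) << 48) | (node & 0xFFFF_FFFF_FFFF)
    return uuid.UUID(int=(high << 64) | low)


# Creation: run the v1 algorithm, then apply the v6 layout to its fields.
u1 = uuid.uuid1()
u6 = layout_v6(u1.time, u1.clock_seq, u1.node)
print(u1)
print(u6)
```

The clock sequence and node keep their version 1 meaning here, so a consumer only needs to know the layout to interpret either format.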

ben221199 commented 2 years ago

I think this issue is similar to #13.