ulid / spec

The canonical spec for ulid
GNU General Public License v3.0

Request clarification in readme on the nature of 128-bit compatibility with UUID #85

Open mofosyne opened 1 year ago

mofosyne commented 1 year ago

For example, what is the canonical way to convert between the UUID and ULID forms? Is it possible to losslessly encode a ULID as a UUID by storing the timestamp in an extra field?

mofosyne commented 1 year ago

Also, is there a representation for Nil or Max, as in UUID?

mofosyne commented 1 year ago

Okay, having a look at "Database Auto increment vs UUID - Which is Right for You?", I saw this visual comparison:

[image: visual comparison from the article]

So I can see that, while the timestamp can be transferred losslessly, UUIDv7 can store only 74 bits of random data versus ULID's 80 bits of random payload.


Ramsey's documentation is even more explicit (https://uuid.ramsey.dev/en/stable/rfc4122/version7.html):

> Both use a 48-bit timestamp in milliseconds since the Unix Epoch, filling the rest with random data. Version 7 UUIDs then add the version and variant bits required by the UUID specification, which reduces the randomness from 80 bits to 74. Otherwise, they are identical.

So this effectively means you need to store the extra 6 bits somewhere else, or truncate them.


It would still be important to at least come to a common standard for truncating these 6 bits, so that if you convert from UUIDv7 to ULID and back and forth a few times between various implementations, we do not lose randomness data due to erroneous truncation and padding.
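A minimal Python sketch of the "store the extra 6 bits somewhere else" option, assuming the UUIDv7 layout of a 48-bit timestamp, 4 version bits, 12 random bits, 2 variant bits and 62 random bits. The helper names `split_for_uuid7` and `restore_ulid` are hypothetical and not part of the ULID spec or any library; the ID is handled as a plain 128-bit integer.

```python
# Hypothetical helpers; bit positions assume the UUIDv7 layout:
# 48-bit timestamp | 4-bit version | 12 random | 2-bit variant | 62 random.
VERSION_SHIFT = 76            # version occupies bits 79..76 of the 128-bit value
VARIANT_SHIFT = 62            # variant occupies bits 63..62
FIXED_MASK = (0xF << VERSION_SHIFT) | (0x3 << VARIANT_SHIFT)
UUID7_FIXED = (0x7 << VERSION_SHIFT) | (0b10 << VARIANT_SHIFT)


def split_for_uuid7(ulid_int):
    """Return (uuid7_int, spare), where spare keeps the 6 displaced ULID bits."""
    version_bits = (ulid_int >> VERSION_SHIFT) & 0xF
    variant_bits = (ulid_int >> VARIANT_SHIFT) & 0x3
    spare = (version_bits << 2) | variant_bits
    uuid7_int = (ulid_int & ~FIXED_MASK) | UUID7_FIXED
    return uuid7_int, spare


def restore_ulid(uuid7_int, spare):
    """Put the 6 saved bits back, recovering the original ULID value."""
    version_bits = (spare >> 2) & 0xF
    variant_bits = spare & 0x3
    return ((uuid7_int & ~FIXED_MASK)
            | (version_bits << VERSION_SHIFT)
            | (variant_bits << VARIANT_SHIFT))
```

With the spare value stored in a separate column or field, the round trip is lossless; without it, the 6 bits are gone after the first conversion.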

techdragon commented 1 year ago

I'm looking at this issue as a kind of "one-way trip" where I go from ULID to UUIDv7 and just stop using ULID completely. Unless ULIDv2 is just UUIDv7 with a 26-character Base32 representation instead of hex, there is not going to be a benefit to using an alternative implementation like ULID everywhere, compared to having databases and languages ship built-in support for the newer RFC-standard UUIDv7. I just don't need those extra 6 bits; a sortable UUID-like value was all I wanted, and the extra niceties like the string format and having plenty of libraries supporting it were just icing on the cake.

I suppose this is kind of an existential issue now. With the new UUIDs looking like they will get standardized, has ULID served its purpose? Should the spec be brought into line with a v2? Are the representational niceties of a 26-character Base32 string even worth keeping ULID as a wrapper on UUIDv7? Or would it be better as a well-accepted custom format of UUIDv8, in order to preserve more timestamp bits and fewer randomness bits?

booch commented 1 year ago

@techdragon I've proposed deprecating ULID in favor of UUIDv7 in #91.

arokettu commented 6 months ago

> It would still be important to at least come to a common standard for truncating these 6 bits, so that if you convert from UUIDv7 to ULID and back and forth a few times between various implementations, we do not lose randomness data due to erroneous truncation and padding.

It's always the same 6 bits. I just overwrite them; after that, the ID can be safely interpreted as either UUIDv7 or ULID.

https://github.com/arokettu/php-uuid/blob/master/src/Ulid.php#L34
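The overwrite described here could look roughly like the following in Python. This is a sketch under the assumption of the UUIDv7 bit layout (version in bits 79..76, variant in bits 63..62 of the 128-bit value); it is not taken from the linked PHP source, and `force_uuid7_bits` is a hypothetical name. The sample value is the hex form of a ULID quoted later in this thread.

```python
import uuid

FIXED_MASK = (0xF << 76) | (0x3 << 62)     # version nibble + variant bits
UUID7_FIXED = (0x7 << 76) | (0b10 << 62)   # version 7, variant 10


def force_uuid7_bits(value):
    """Overwrite the 6 fixed bits; every other bit is left untouched."""
    return (value & ~FIXED_MASK) | UUID7_FIXED


once = force_uuid7_bits(0x0190b7f8bb8582b5c9428533325b988a)
twice = force_uuid7_bits(once)
assert once == twice                         # further round trips change nothing
assert uuid.UUID(int=once).version == 7      # reads as a version 7 UUID
```

Because the operation is idempotent, only the first conversion can lose information; any later conversions between the two forms leave the value unchanged.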

mofosyne commented 6 months ago

> > It would still be important to at least come to a common standard for truncating these 6 bits, so that if you convert from UUIDv7 to ULID and back and forth a few times between various implementations, we do not lose randomness data due to erroneous truncation and padding.
>
> It's always the same 6 bits. I just overwrite them; after that, the ID can be safely interpreted as either UUIDv7 or ULID.
>
> https://github.com/arokettu/php-uuid/blob/master/src/Ulid.php#L34

Then it would certainly be worth noting that in the spec, so other implementers remember to do the same.

hholst80 commented 6 months ago

An interoperability guide would be good here, describing how to convert between a well-defined subset of ULID and UUIDv7.

mofosyne commented 4 months ago

ramsey.dev has written up something about converting back and forth at https://uuid.ramsey.dev/en/stable/rfc4122/version7.html#convert-a-version-7-uuid-to-a-ulid but I still don't really get the bit pattern that's transferred between the two forms.

arokettu commented 4 months ago

@mofosyne No bits are changed when converting UUIDv7 to ULID, because a UUIDv7 is already a valid ULID with a valid timestamp. You only convert the string representation from base16 to base32; the binary form does not change.
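To make the "only the string representation changes" point concrete, here is a small Python sketch. It assumes the ULID text form is the 128-bit value read as one big-endian integer and written as 26 Crockford Base32 characters (130 bits, so the top two bits are always zero). `uuid_to_ulid_str` is a hypothetical helper, not a library function.

```python
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def uuid_to_ulid_str(u: uuid.UUID) -> str:
    """Re-encode the same 128 bits as a 26-character ULID string."""
    value = u.int                      # the binary value is used as-is
    chars = []
    for _ in range(26):                # emit 5 bits per character, lowest first
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


# Hex/ULID pair quoted elsewhere in this thread:
print(uuid_to_ulid_str(uuid.UUID(hex="0190b7f8bb8582b5c9428533325b988a")))
# prints 01J2VZHEW5GATWJGM56CS5Q64A
```

Note that the value is treated as an integer, so the padding bits sit at the front of the string rather than at the end as in byte-oriented Base32 encoders.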

mofosyne commented 4 months ago

I see. Well, this diagram I drew points out that it's a one-way operation from UUID to ULID:

# Generated UUID example

             Ver  Var
              |    |
0190b5e3-a615-7365-94ab-b6727c819ba4
|------------|----------------------|
 Timestamp      Ver+Randomness+Var
   48bits             80bits

# Above UUID encoded as a ULID (Crockford's Base32 of the 128-bit value, two leading zero bits):

 01J2TY79GN      EDJS9AXPE9Y836X4
|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

Note that the UUID version and variant bits are part of the randomness area and are treated as an opaque random field. The implication is that such a ULID has slightly less randomness and a slightly higher collision probability, but in practice this should not be an issue for most applications.

Also, moving from UUID to ULID is possible, but I'm not sure about moving backward. Even if the version and variant bits match, how do you know it's not by luck?

If this is a good explanation, then maybe we could include it in the readme.

FYI: I used https://cryptii.com/pipes/crockford-base32 as a starting point for converting between the two forms. Note that a generic byte-oriented Base32 encoder packs bits from the left and pads at the end, whereas ULID encodes the 128-bit value as an integer with two leading zero bits, so the two can produce different strings for the same hex input.
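On the "how do you know it's not luck?" question, a short Python sketch may help. It assumes the UUIDv7 layout (version in bits 79..76, variant in bits 63..62) and treats a ULID's randomness as uniformly random; `looks_like_uuid7` is a hypothetical helper name.

```python
def looks_like_uuid7(value: int) -> bool:
    """True if the version nibble is 7 and the variant bits are 10."""
    version = (value >> 76) & 0xF
    variant = (value >> 62) & 0x3
    return version == 0x7 and variant == 0b10


# Enumerate every combination of the 6 bits in question.
matches = sum(
    looks_like_uuid7((version << 76) | (variant << 62))
    for version in range(16)
    for variant in range(4)
)
total = 16 * 4
print(matches, "of", total)   # prints: 1 of 64
```

So, under the uniformity assumption, about 1 in 64 ordinary ULIDs would pass this check by chance, which means the bit pattern alone cannot prove that an ID was generated as a UUIDv7.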

arokettu commented 4 months ago

@mofosyne The other way is problematic.

Let's use an example ULID that I just generated:

01J2VZHEW5GATWJGM56CS5Q64A

convert it to hex:

0x0190b7f8bb8582b5c9428533325b988a

Write as UUID:

0190b7f8-bb85-82b5-c942-8533325b988a

Overwrite version bits (13th hex digit):

0190b7f8-bb85-72b5-c942-8533325b988a

Overwrite variant bits (17th hex digit; its top two bits must become 10, i.e. 10xx):

'c' => 1100 => 10|00 => 8

0190b7f8-bb85-72b5-8942-8533325b988a

Problems:

1) It's a different ID now.
2) You may create a retroactive collision (but the chance is negligible).
3) If it's a public ID, you have to do this conversion on every user request with the old ID.
4) Even if you think that 3 is OK, if you accept UUID/ULID in binary form, you may not be able to distinguish between a ULID and a valid UUID of some other version if the bits happen to align.
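The walkthrough above can be reproduced with a few lines of Python. This is a sketch, assuming canonical uppercase Crockford Base32 input (the aliases I, L and O that Crockford's scheme allows are not handled) and the UUIDv7 bit positions used in the steps above. The function names are hypothetical.

```python
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_str_to_int(text: str) -> int:
    """Decode a 26-character ULID string into its 128-bit integer value."""
    if len(text) != 26:
        raise ValueError("a ULID string has exactly 26 characters")
    value = 0
    for ch in text.upper():
        value = (value << 5) | CROCKFORD.index(ch)   # ValueError on bad chars
    if value >> 128:
        raise ValueError("ULID overflows 128 bits")
    return value


def ulid_to_uuid7_lossy(text: str) -> uuid.UUID:
    """Overwrite version and variant bits; the original 6 bits are discarded."""
    value = ulid_str_to_int(text)
    value &= ~((0xF << 76) | (0x3 << 62))    # clear version nibble and variant bits
    value |= (0x7 << 76) | (0b10 << 62)      # set version 7, variant 10
    return uuid.UUID(int=value)


print(ulid_to_uuid7_lossy("01J2VZHEW5GATWJGM56CS5Q64A"))
# prints 0190b7f8-bb85-72b5-8942-8533325b988a
```

The result matches the final UUID in the walkthrough, and it illustrates problem 1: the returned value is no longer the same 128 bits as the input ULID.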

Alternative:

Store ULID in a UUID field and hope for the best. You may get problems from libraries or software that treat UUID strictly. For example, Postgres will gladly store an invalid UUID but MariaDB will give an error.
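As one illustration of a lenient consumer, Python's standard `uuid` module accepts any 128-bit value. The sketch below uses the example ULID hex from above; `describe` and `is_strict_uuid7` are hypothetical helpers showing what a stricter check might look like, not how Postgres or MariaDB validate.

```python
import uuid


def describe(hex_digits: str):
    """Parse any 32 hex digits and report how the uuid module classifies them."""
    u = uuid.UUID(hex=hex_digits)
    return u.variant, u.version


def is_strict_uuid7(hex_digits: str) -> bool:
    """A stricter check: RFC variant and version 7 are both required."""
    u = uuid.UUID(hex=hex_digits)
    return u.variant == uuid.RFC_4122 and u.version == 7


raw_ulid_hex = "0190b7f8bb8582b5c9428533325b988a"    # unmodified ULID bits
print(describe(raw_ulid_hex))          # parses fine, but variant is not RFC 4122
print(is_strict_uuid7(raw_ulid_hex))   # prints False
```

Whether a given database or library behaves like the lenient parse or the strict check has to be verified per product and version.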

As for me, I abandoned the migration idea for now, but I now generate v7-compatible ULIDs. Basically, I generate a UUIDv7 and store it in Base32. This will make at least the newer IDs compatible for conversion (in my case, old IDs become irrelevant in a year or so).
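A rough Python sketch of generating such a v7-compatible ULID, assuming the UUIDv7 layout (48-bit millisecond timestamp, version 7, variant 10, the rest random) and the 26-character Crockford Base32 text form. `new_uuid7_compatible_ulid` is a hypothetical name, and this sketch makes no attempt at monotonic ordering within the same millisecond.

```python
import secrets
import time
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_uuid7_compatible_ulid(now_ms: Optional[int] = None) -> str:
    """Build a UUIDv7-shaped 128-bit value and render it as a ULID string."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000       # Unix time in milliseconds
    value = (now_ms & ((1 << 48) - 1)) << 80        # 48-bit timestamp on top
    value |= secrets.randbits(80)                   # 80 random bits below it
    value &= ~((0xF << 76) | (0x3 << 62))           # clear version + variant bits
    value |= (0x7 << 76) | (0b10 << 62)             # version 7, variant 10
    chars = []
    for _ in range(26):                             # 5 bits per character
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


print(new_uuid7_compatible_ulid())
```

IDs produced this way carry 74 random bits instead of 80, but the same 128 bits can later be written out in UUID hex form without changing any of them.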