rmalayter / ulid-mssql

Implementation of ULID generator For Microsoft SQL Server
Apache License 2.0
46 stars 4 forks source link

Crockford Encoding and Byte Order Troubles #2

Open WorldMaker opened 3 years ago

WorldMaker commented 3 years ago

Summary: [dbo].[ulidStr]() is generating incorrect ULIDs which seems to imply that there is a problem in your Crockford encoding. Additionally, I also found some issues in casting uniqueidentifier to binary(16).

Calling the [dbo].[ulidStr]() function just recently returned 05WHN5W2WHFXQM5W1PY3EN9XNR, which according to the NUlid library in C# Interactive is giving me a timestamp for [4/19/2175 5:47:10 PM +00:00], which is "weirdly close" but also more than a hundred years wrong to my system time (I'm using a local DB for testing). The time component from casting the uniqueidentifier to binary substring seems to be the correct portion of the uniqueidentifier which leads me to believe it is an error in Crockford encoding.

I discovered this in testing if I can round-trip the uniqueidentifier ULIDs through a version of [dbo].[ulidStr](@guid) that takes the uniqueidentifier instead of calling [dbo].[ulid]() to generate a new one. The other thing I discovered was the random bytes were even more scrambled than the timestamp. That lead me to discovering that CAST(uniqueidentifier as binary(16)) appears to include a "little endian" issue with the first three groups of the uniqueidentifier bytes getting swapped in binary order versus GUID string order. The substring for the random section of the uniqueidentifier needs to be more complex to take that into account to round-trip them properly.

rmalayter commented 1 year ago

I am sorry for the massively delayed response to this issue; I never got an email notification from Github and stumbled across this. Can you produce a test case script which fails in T-SQL so I can better understand exactly what is happening? The Crockford encoding function used in this library has been tested in another production system for many years so I am trying to get a handle on the potential error case.

Note that this implementation does intentionally use MSSQL little-endian byte-order sorting when converting to/from UUID variants. That is a quirk that we can't avoid if we want the UUID form to be "B-Tree friendly" and performant on MSSQL.

This library has been in production at my employer for many years without issue, although the environment is all-MSSQL so interop issues with other implementations of ULID would likely not be uncovered.

WorldMaker commented 1 year ago

Same with GitHub email notifications in my case as well, it seems.

As far as I remember of what I was trying to do at the time, it was a very simply modified ulidStr implementation:

ALTER FUNCTION [dbo].[ulidStr] (@ulid uniqueidentifier)
RETURNS VARCHAR(100)
AS
BEGIN
    DECLARE @temp BINARY (16)

    SET @temp = CAST(@ulid AS BINARY (16))

    RETURN [dbo].[base32CrockfordEnc](SUBSTRING(@temp, 11, 6) + SUBSTRING(@temp, 1, 10), 0)
END
GO

Calling that Select ulidStr(someDbId) from someTable and even something like Select ulid() as NewId, ulidStr(NewId) was returning badly Crockford-encoded timestamps. (In the example I gave it looks like 05… rather than 01… expected as the start of that string. It seemed like some sort of carry issue in the encoder function, but I didn't dig very deep into debugging it because I had working round-trips in my C# implementation and this [ulidStr] implementation was just for a nice-to-have report view. (The C#-first focus may have exacerbated the little-endian issues in the random section of the ULIDs that I was seeing, but I do know that I was round-tripping the timestamps just fine with ones made from the SQL ulid() function here.)