mareek / UUIDNext

A fast and modern .NET library to generate UUID/GUID that are either sequential and database friendly (versions 7), name based (versions 5) or random (version 4).
BSD Zero Clause License
240 stars 13 forks source link

UUIDs are not sorted properly in a MSSQL DB at scale #2

Closed jeffward01 closed 1 year ago

jeffward01 commented 1 year ago

Context is here Link to code that can be reproduced

Using UUID version 7:

var values = Enumerable.Range(1, 10) .Select(i => { Thread.Sleep(TimeSpan.FromMilliseconds(10)); // UUIDNext uses millisecond-precision, let's sleep for a bit return $"({i}, '{Uuid.NewDatabaseFriendly():D}')"; }) .ToList();

Console.WriteLine($"{string.Join(", ", values)}");

Then check that ORDER BY sorts them as expected:

CREATE TABLE #testing_uuids ( ExpectedPosition int PRIMARY KEY, UUID uniqueidentifier NOT NULL );

INSERT INTO #testing_uuids VALUES (1, '01855978-effc-758f-8fc1-6b755429799c'), (2, '01855978-f00f-74c2-a6f0-8e8c5445ec9a'), (3, '01855978-f019-7104-bab4-79f068bc8150');

SELECT * FROM #testing_uuids ORDER BY UUID ASC;

Unfortunately, on MSSQL I got 1, 3, 2.

ByronAP commented 1 year ago

this is because sql server stores data using little-endian

jeffward01 commented 1 year ago

Thats very interesting, thanks for this! I did not realize that.

Will this result be problematic? Is there anything you suggest we do with to prevent this, or its not a big deal?

Thanks

peschkaj commented 1 year ago

It's even worse: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=102450

Depending on your ORM, you can create an extension to rearrange the UUID bytes on the way in and out, but you'll likely want a CLR function to also make it easy to search via TSQL.

ByronAP commented 1 year ago

It's problematic in the sense that this library doesn't account for the way sql server stores data. What you perceive and would assume would make an ordered index doesn't.

peschkaj commented 1 year ago

We have used base62 encoding, left padding, and CHAR(22) to work around the limitations of SQL Server. It's important to note that this is a SQL Server issue, not a UUID library issue.

ByronAP commented 1 year ago

Agreed it's not the libraries' fault

jeffward01 commented 1 year ago

We have used base62 encoding, left padding, and CHAR(22) to work around the limitations of SQL Server. It's important to note that this is a SQL Server issue, not a UUID library issue.

How can we be aware to correct this in our own implementations? Such as, many of us are C# developers, not SQL experts, perhaps we know MongoDB, Postgres, MSSQL, and can write some queries, and scripts, but beyond that its a bit of a black box.

What can we do to 'fix' this? Or ensure we use this library property in conjunction with MSSQL?

@ByronAP @peschkaj

ByronAP commented 1 year ago

I think the best way to do this is by using a bigint in the same or similar fashion as twittter snowflake.

Snowflake_ID Wikipedia

Snowflake Repository

Snowflake IdGen Repository

peschkaj commented 1 year ago

@jeffward01 your options are:

jeffward01 commented 1 year ago

Thanks for this! you guys are great! I appreciate all the help and direction, I will check out https://github.com/peschkaj/rustflakes and if it does not suit my needs (my needs are super super basic, just a performant UUID that I can use on SqlServer (sadly a requirement) and other databases.

I really appreciate you guys making this less of a 'black box'.

When I get my config's up and going, is there anything I can do to help you guys? I plan on writing a post about this very nuanced area and posting it on reddit or something (I don't have a blog)

@peschkaj - I just read the readme:

When you want time-based ordered IDs generated in many locations and sent to many locations. When you can't generate sequential identifiers yourself (Windows Azure SQL Database, I'm looking at you).

hahaha, thats great

mareek commented 1 year ago

Hello guys, Library author here. So I did some investigation and testing to implement a SQL Server friendly UUID Generator. Fortunately, the new standard introduces UUID Version 8 which basically a "Bring your own algorithm" version that only require the presence of the variant an version bits.

So I've created a new UUID V8 generator named UuidV8SqlServerGenerator (which is basically UuidV7Generator with some byte shuffling) that produces unique UUID that are naturally sorted on SQL Server. I've also added a parameter to the static method Uuid.NewDatabaseFriendly to specify the database in which the UUID will be stored so that the correct version is used depending on the sorting method of each database.