vanbukin / Uuids

Fast C# UUID implementation for .NET 6 & 7
MIT License
69 stars 3 forks source link

Isn't Uuid.NewTimeBased more of a COMB GUID? #7

Open dotneutron opened 10 months ago

dotneutron commented 10 months ago

It seems Uuid.NewTimeBased respects the structure of a UUIDv1 from RFC 4122 for the first 8 bytes (timestamp + version 1). The other 8 bytes are just randomly generated as in the case of a UUIDv4. As the clock sequence and node ID are missing, isn't this technically a variation of a COMB GUID? Instead of first 10 bytes of a UUIDv4 + 6 bytes of a timestamp (SQL Server specific), this is first 8 bytes of a UUIDv1 + last 8 bytes of a UUIDv4.

I was expecting this to be like UuidCreateSequential, except the internal representation is big-endian.

See also RT.Comb.

vanbukin commented 10 months ago

Isn't Uuid.NewTimeBased more of a COMB GUID?

The short answer is - maybe.

The long answer - things are a bit more complex. Initially, this package was developed to work with MySQL. This is because its behavior when working with System.Guid depends on the parameter of the connection string, which can have 7 (sic!) different values.

The main task I was solving is the creation of a structure that has

The generation algorithm is secondary and is not the main focus.

Technically, for MySQL, the value of the first 8 bytes is important. They need to form a monotonically increasing sequence. That's exactly why in the current implementation a unix timestamp is stored there, which ensures a monotonically increasing sequence. And all this works because MySQL uses binary(16) to work with uuid. Therefore, we can generate a Uuid using the appropriate method (Uuid.NewTimeBased()), then call the ToByteArray() method, passing its result as a parameter in the query.

The situation is different with PostgreSQL.

PostgreSQL has a built-in data type for handling uuid (which is called - uuid), and its driver (Npgsql) does not allow out-of-the-box writing of a byte array there. Only System.Guid can be used. At the same time, Npgsql relies on a specific layout of data within System.Guid.

For example, the code for generating a Uuidv7 might look approximately like this, packing its contents into a System.Guid, in such a way that Npgsql correctly conveys its value to PostgreSQL. Then, at the database level, it would look like a monotonically increasing sequence without time insertion drop-offs.

public static Guid Generate()
{
    const ushort bits48To63ResetVersionMask = 0xFFF;
    const ushort bits48To63SetVersionMask = 0x7000;
    const byte bits64To71ResetVersionMask = 0x3F;
    const byte bits64To71SetVersionMask = 0x80;

    Span<Guid> guidBuffer = stackalloc Guid[1];
    guidBuffer[0] = Guid.NewGuid();
    var buffer = MemoryMarshal.AsBytes(guidBuffer);
    var temp48To63 = (ushort) ((ushort) (BinaryPrimitives.ReadUInt16LittleEndian(buffer[6..]) & bits48To63ResetVersionMask) | bits48To63SetVersionMask);
    buffer[8] = (byte) ((byte) (buffer[8] & bits64To71ResetVersionMask) | bits64To71SetVersionMask);
    var unixTimeMilliseconds = (ulong) DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
    BinaryPrimitives.WriteUInt64BigEndian(buffer, (unixTimeMilliseconds << 16) | temp48To63);
    Span<byte> result = stackalloc byte[16];
    result[0] = buffer[3];
    result[1] = buffer[2];
    result[2] = buffer[1];
    result[3] = buffer[0];
    result[4] = buffer[5];
    result[5] = buffer[4];
    result[6] = buffer[7];
    result[7] = buffer[6];
    result[8] = buffer[8];
    result[9] = buffer[9];
    result[10] = buffer[10];
    result[11] = buffer[11];
    result[12] = buffer[12];
    result[13] = buffer[13];
    result[14] = buffer[14];
    result[15] = buffer[15];
    var guid = Unsafe.As<byte, Guid>(ref result.GetPinnableReference());
    return guid;
}

Why this order? Because Npgsql reads

However, storing uuid as binary(16) with such an algorithm would lead to performance drops, while Uuid.NewMySqlOptimized() would not. In this case, you can write the uuid as a byte array (when the column has binary(16) type).

vanbukin commented 10 months ago

In Microsoft SQL Server, the situation is different. Its driver for uuid uses the uniqueidentifier data type, just like Npgsql, it's designed to be used specifically with System.Guid. It takes Guid as is, but at the same time, it contains a small hint about the exact order in which SQL Server sorts bytes when working with this data type. https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Data.Common/src/System/Data/SQLTypes/SQLGuid.cs#L116

ReadOnlySpan rgiGuidOrder = new byte[SizeOfGuid] { 10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3 };

There is a single article about this in the entire internet, and it can only be found in the web archive. https://web.archive.org/web/20120628234912/http://blogs.msdn.com/b/sqlprogrammability/archive/2006/11/06/how-are-guids-compared-in-sql-server-2005.aspx

In general, equality comparisons make a lot of sense with uniqueidentifier values. However, if you find yourself needing general ordering, then you might be looking at the wrong data type and should consider various integer types instead.

If, after careful thought, you decide to order on a uniqueidentifier column, you might be surprised by what you get back.

Given these two uniqueidentifier values:

@g1= '55666BEE-B3A0-4BF5-81A7-86FF976E763F' @g2 = '8DD5BCA5-6ABE-4F73-B4B7-393AE6BBB849'

Many people think that @g1 is less than @g2, since '55666BEE' is certainly smaller than '8DD5BCA5'. However, this is not how SQL Server 2005 compares uniqueidentifier values.

The comparison is made by looking at byte "groups" right-to-left, and left-to-right within a byte "group". A byte group is what is delimited by the '-' character. More technically, we look at bytes {10 to 15} first, then {8-9}, then {6-7}, then {4-5}, and lastly {0 to 3}.

In this specific example, we would start by comparing '86FF976E763F' with '393AE6BBB849'. Immediately we see that @g2 is indeed greater than @g1.

Note that in .NET languages, Guid values have a different default sort order than in SQL Server. If you find the need to order an array or list of Guid using SQL Server comparison semantics, you can use an array or list of SqlGuid instead, which implements IComparable in a way which is consistent with SQL Server semantics.

Therefore, generating ordered (sequential) Guids for Microsoft SQL Server would look something like this (code is simplified, version and flag setting are not performed, and it will have a quite small fill factor - around less than 0.2% and won't require index rebuild).

public static Guid Generate()
{
    Span<Guid> guidBuffer = stackalloc Guid[1];
    guidBuffer[0] = Guid.NewGuid();
    var gBuffer = MemoryMarshal.AsBytes(guidBuffer);
    var unixTimeTicks = (ulong) (DateTimeOffset.UtcNow - DateTimeOffset.UnixEpoch).Ticks;
    Span<byte> tBuffer = stackalloc byte[8];
    BinaryPrimitives.WriteUInt64BigEndian(tBuffer, unixTimeTicks);
    Span<byte> result = stackalloc byte[16];
    result[0] = gBuffer[12];
    result[1] = gBuffer[13];
    result[2] = gBuffer[14];
    result[3] = gBuffer[15];
    result[4] = gBuffer[2];
    result[5] = gBuffer[3];
    result[6] = gBuffer[0];
    result[7] = gBuffer[1];
    result[8] = tBuffer[6];
    result[9] = tBuffer[7];
    result[10] = tBuffer[0];
    result[11] = tBuffer[1];
    result[12] = tBuffer[2];
    result[13] = tBuffer[3];
    result[14] = tBuffer[4];
    result[15] = tBuffer[5];
    var guid = Unsafe.As<byte, Guid>(ref result.GetPinnableReference());
    return guid;
}
vanbukin commented 10 months ago

That's exactly why the method is named as it is - NewMySqlOptimized(); Because MS SQL and PostgreSQL require

  1. to use System.Guid as a container
  2. a specific byte order within System.Guid.

The NewTimeBased() method could be used as a basis for generating such Guids (in theory, it would be enough to simply shuffle the bytes in a certain way), but I would likely use my own generation algorithms for each of the RDBMS (referring to MS SQL and PostgreSQL), which would immediately output a ready-to-use System.Guid.

vanbukin commented 10 months ago

@dotneutron I hope I've answered your questions. May I close the issue?