laravel / framework

The Laravel Framework.
https://laravel.com
MIT License
32.21k stars 10.9k forks source link

Incorrect Sorting with UUID Data Type in MariaDB #51883

Closed Karem-sobhy closed 2 months ago

Karem-sobhy commented 3 months ago

Laravel Version

11.11.1

PHP Version

8.3.8

Database Driver & Version

MariaDB 11.4.2

Description

When using the new UUID data type in MariaDB, the sorting is incorrect due to byte swapping performed by MariaDB to optimize UUIDv1 storage and indexing.

Details:

MariaDB stores UUID in an index-friendly manner. A UUID of the form llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn is stored as: nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll

This format provides a sorting order assuming a UUIDv1 (node and timestamp) is used, where the node is followed by the timestamp. The key aspect is that the timestamps are sequential.

MariaDB does not perform this byte-swapping for UUID versions >= 6:

MariaDB starting with 10.10 Starting from MariaDB 10.10.6 and MariaDB 10.11.5, considering that UUIDv7 and other versions are designed around time ordering, UUIDs version >= 6 are now stored without byte-swapping. Additionally, UUIDs with version >= 8 and variant=0 are now considered invalid (as the standard expects).

Problem:

When Laravel uses Str::orderedUuid() to generate a new ID for a model and insert it, MariaDB sees the version as < 6 and inserts it swapped. Consequently, the UUID is not ordered and not index-friendly, even though the purpose of the ordered UUID function is to ensure order and End of the table insert (Index friendly insert).

Conclusion:

Steps To Reproduce

  1. in .env DB_CONNECTION=mariadb
  2. make model with migration that contains $table->uuid('id')->primary();
  3. use the HasUuids trait inside the model
  4. create some of that model and observe the wrong order that it's stored in the table
driesvints commented 2 months ago

Thanks @Karem-sobhy. Pinging you @staudenmeir as you were the author of https://github.com/laravel/framework/pull/50192. Seems UUID columns aren't compatible with Laravel? I think we'll need more extensive tests in our MariaDB test suite for this one.

github-actions[bot] commented 2 months ago

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

staudenmeir commented 2 months ago

Interesting issue. I don't order by UUID columns in my apps and so haven't run into it.

This issue makes the new UUID data type in MariaDB useless and not compatible with Laravel.

Not a very constructive statement. The issue is not that Laravel supports a native database data type. When you don't need your UUIDs to be sortable or already use v7 UUIDs, the native type is a great benefit. MariaDB added the type for a reason.

IMO, the best solution would be using v7 UUIDs instead the current of the v4 ordered UUIDs, but altering the HasUuids trait's default behavior is basically an unsolvable breaking change.

Instead, I would say that this needs to be addressed by MariaDB users in their applications:

driesvints commented 2 months ago

Thanks @staudenmeir. I do feel we need some documentation around this and would be grateful if anyone could provide a PR.

@Karem-sobhy please see @staudenmeir's reply around this topic.

Karem-sobhy commented 2 months ago

@staudenmeir using the old char(36) may be a fix but using the orderedUuid with mariadb is just bad for indexing and performance of the insertion and also non informed users maybe don't know all this because the documentation says clearly the uuid will be ordered which is not true when using mariadb default uuids

I can help with a pr but don't know what will be best action to do so can you help me to chose from:

driesvints commented 2 months ago

I honestly don't know. We can't make the breaking change and having a HasV7Uuid which should technically be the better default sits a bit ill with me. I just wouldn't change anything atm.

staudenmeir commented 2 months ago

Thanks for the PR @Karem-sobhy. IMO, the HasV7Uuid trait is the best solution to the MariaDB issue right now.

For Laravel 12, I think we should consider again to switch to UUID v7 by default. Just last week, there was a tweet about the performance gains from v7 on PostgreSQL.

@driesvints Do you remember why https://github.com/laravel/framework/pull/44210 got reverted in the end?

driesvints commented 2 months ago

Hey @staudenmeir. I guess it's mainly this remark from @tillkruss:

IMO it makes no sense to change the default Str::uuid() to v7, which will always reveal the timestamp (minor data leak),

Seems v7 has a data leak that's unwanted?

staudenmeir commented 2 months ago

Thanks @driesvints. I was thinking about "only" switching the HasUuids trait to UUID v7 while leaving Str::uuid() unchanged. This would improve performance (at least with the DBMS we've checked so far) but not introduce any data leaks.

When upgrading to Laravel 12, existing users of the HasUuids trait that rely on the sortability would have to override newUniqueId() and continue using Str::orderedUuid() to avoid the breaking change.

driesvints commented 2 months ago

@staudenmeir thanks for explaining. Right now I don't have the time to dig deeper into this matter. The best way forward is that you attempt a PR with whatever you want to try so Taylor can review it. Thanks 👍

Karem-sobhy commented 2 months ago

@staudenmeir exactly what I was thinking about when creating the PR we leave Str::Uuid as is and have Str:Uuid7 instead but like you i think that HasUuid trait should use uuid7 by default because that if uuid7 leaks date so ordered uuid4 leaks the same data and is not a standard uuid also.

hafezdivandari commented 2 months ago

@staudenmeir @Karem-sobhy I just saw PR #52029 with the new HasVersion7Uuids trait and I wonder how it's gonna solve the ordering issue? I mean assuming a model with HasUuid now switches to the new HasVersion7Uuids trait, the records will have wrong order because Str::uuid7() generates lower values than Str::orderedUuid right?

I mean this test is failing on MySQL / MariaDB:

public function testUuidOrder()
{
    Schema::create('orders', function (Blueprint $table) {
        $table->id();
        $table->uuid();
    });

    // we were using orderedUuid
    DB::table('orders')->insert(['uuid' => (string) Str::orderedUuid()]);

    // now switching to uuid7
    DB::table('orders')->insert(['uuid' => (string) Str::uuid7()]);

    // trying to get records in order, the actual value is [2, 1] and the test fails!
    $this->assertEquals([1, 2], DB::table('orders')->orderBy('uuid')->pluck('id')->toArray());
}

What am I missing here?

staudenmeir commented 2 months ago

@hafezdivandari The new trait can only be used for models that don't have any records yet. You can't switch to it.

hafezdivandari commented 2 months ago

@staudenmeir thank you for explaining, I was so confused.

Karem-sobhy commented 2 months ago

@hafezdivandari as @staudenmeir said The uuid7 is not compatible with the "ordered" uuid4 They can't be used interchangeabley but if you really want to you can change all you old model ids first You can get them all chunked and oredered by created at and assign a new id for them by the new Str::uuid7 method then you can use the new trait P.S you should be careful with the relationship have the old ordered uuid4 as the foreign key

hafezdivandari commented 2 months ago

Thanks @Karem-sobhy, actually I was looking for a better solution to be used here on upcoming Passport 13.x. However this model's sortability doesn't rely on it's ID and I think we can use new uuid7 instead!

Karem-sobhy commented 2 months ago

@hafezdivandari UUID7 is the standard ordered UUID implementation, making it more advantageous than the non-standard ordered UUID4. This is particularly beneficial when used with the new MariaDB UUID datatype, as it is stored as BIN(16) instead of CHAR(36). Although ordered UUID4 will provide similar advantages in terms of storage efficiency with MariaDB, it does not offer optimized ordering. Consequently, every insert will be out of order, adding a slight overhead, although not significant.

If this approach is applied to Passport, it won't significantly impact current users because the structure remains the same, and it is not used for ordering. However, new Passport users will benefit from optimized order insertion and the ability to sort by ID when using MariaDB.

staudenmeir commented 1 month ago

I created a PR for Laravel 12: #52433