Closed tparvi closed 1 month ago
@tparvi thanks for posting this; this question actually came up several times in some other contexts as well, and I've been meaning to look into it.
I do think there's probably something here, and am reopening to consider re-introducing some form of client-side sequential GUID generation for better perf. I've done a very quick test using the built-in EF SequentialGuidValueGenerator, following the steps in the above blog post; it did not improve performance in any way (i.e. the number of buffers was the same). I don't remember the details right now, but SequentialGuidValueGenerator was specifically crafted to match SQL Server sequential GUID generation, and so probably doesn't make sense for PG. But we should try to use a .NET UUIDv7 generator instead (e.g. based on the above-suggested UUIDNext), and at that point we should be able to get the same performance benefits as in the post.
If anyone wants to take a stab at this, please do!
In the meantime, remember that you can simply do server-side UUID generation; this should work just as well, except for scenarios where you're inserting both principal and dependent in the same SaveChanges, at which point an extra roundtrip is needed to fetch the principal's UUID (in order to send it as the foreign key of the dependent).
IMPORTANT: https://github.com/dotnet/efcore/issues/30753 seems to indicate that for SQL Server, client-generation of GUIDs may not be ideal for perf in any case. I'm not sure if that would apply to PG as well (would need to investigate), but users always have the option of installing the pg_uuidv7 extension and setting up uuid_generate_v7()
as the column default value (for server-side UUIDv7 generation).
There has been some buzz around this in recent months.
Benchmarks: https://ardentperf.com/2024/02/03/uuid-benchmark-war/
Lingering v7 support patch: https://commitfest.postgresql.org/47/4388/
Thanks @ProTip, that's useful!
We could take a dependency on https://github.com/stevesimmons/uuid7-csharp (thanks @Brar), or https://github.com/mareek/UUIDNext. https://github.com/dotnet/runtime/issues/88290 tracks adding UUIDv7 generation in .NET itself.
One possible thing to keep in mind is that UUIDv7 isn't yet a finalized standard (still in draft). But if the generation functions make it into PG itself, that should be good enough for us too.
In the meantime, anybody who wants UUIDv7 can easily wrap one of the above .NET libraries with an EF value generator.
dotnet/runtime#88290 tracks adding UUIDv7 generation in .NET itself.
This has been superseeded by https://github.com/dotnet/runtime/issues/103658 which was completed recently. .NET 9 will contain Guid.CreateVersion7()
.
Thanks @cremor - that's indeed perfect for our needs. We should switch to this for 9.
Will Npgsql.EntityFrameworkCore.PostgreSQL v9 actually target/require .NET 9? It looks like EF Core 9 will only target .NET 8 (just like EF Core 7 only targeted .NET 6).
EF 9 (and therefore EFCore.PG 9 as well) will indeed target net8.0, not net9.0; this means we can't yet use the new BCL API for generating UUIDv7. So we can either:
Is someone interesting in tacking (1), possibly @Timovzl?
@roji How strong is the need to avoid changes between 9 and 10? The reason I ask is that it is very possible that .NET 10 will see improvements to the UUIDv7 generator, especially to make it monotonic intra-millisecond. In the last paragraph of his comment, Tanner Gooding states:
These options can help guarantee uniqueness for a given UUID generator, but they are more advanced and were considered out of scope for the BCL to provide in .NET 9 given the limited timeframe before we lock down for RC1.
Under the same issue, see also my own recommendation for the implementation. Personally, I'm especially concerned with intra-millisecond monotonicity and (limited) resilience to clock adjustments, both of which I'll be pushing for.
Does this affect the direction the EF team would like to go in for now?
How strong is the need to avoid changes between 9 and 10?
Let's put it this way... If EFCore.PG targeted net9.0 today, I'd just use the new .NET API; if it then changed between 9 and 10, that would obviously affect the GUIDs produced by EFCore.PG. If I understand things correctly, adding intra-millisecond monotonicity in 10 wouldn't "invalidate" or "obsolete" the GUIDs produced by the 9 implementation - it would simply improve their quality.
Or put another way, it's probably better to start producing whatever .NET 9 produces in EFCore.PG 9, rather than continuing to produce random GUIDs as currently, no?
@roji I've just done something like this for EFCore.Jet. See CirrusRedOrg/EntityFrameworkCore.Jet#254
Used the original SequentialGuidValueGenerator
together with the stuff they did for CreateVersion7
at https://github.com/tannergooding/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Guid.cs#L304
It looks right and I finally have the test Can_use_sequential_GUID_end_to_end_async
working
Thanks @ChrisJollyAU! Is this copy-pasted from the runtime code, or your own implementation?
A bit of both. If you look at the CreateVersion7
function it actually references some private readonly variables in an internal struct
(_a
,_b
,_c
,_d
). Since I just have bytes (rather than the int
, short
etc that the internal variables they use).
You can see for _c
and _d
the code is basically the same. With _c
the main difference is I query the 2 bytes from the byte array, do the modification like they do, then write it back to my array. Compared to them having _c
in the struct already as a short
than can be used directly
The unix timestamp part should end up the same but just done with a byte array (versus an int
and short
in a struct
- same length 48 bits = 6 bytes)
The top section mostly deals with the counter to create the timestamp. Bascially comes from the original SequentialGuidValueGenerator
in EFCore.
By the looks of it you do need the _counter
. Its later used to create the timestamp needed. You don't get a sequential guid otherwise
I have tried setting up a value generator for the above mentioned test like
public class UUIDv7Generator : ValueGenerator
{
protected override object NextValue(EntityEntry entry) => Guid.CreateVersion7(DateTimeOffset.UtcNow);
public override bool GeneratesTemporaryValues { get; }
}
And adding this to the context (BronieContext
)
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
base.OnModelCreating(modelBuilder);
modelBuilder.Entity<Pegasus>().Property(e => e.Id).HasValueGenerator<UUIDv7Generator>().ValueGeneratedOnAdd();
}
As the tests are running on .Net 9 this is available. Unfortunately it didn't produce fully sequential
Nice work, @ChrisJollyAU.
I just wanted to clarify that the need for the generator to be monotonic (intra-millisecond) is not being addressed by the initial .NET 9 implementation of UUIDv7. While it is possible to add logic on top of that, that's equivalent to rolling a custom implementation.
Do we prefer a custom implementation, or the .NET 9 implementation with its not-entirely-sequential nature?
Do we prefer a custom implementation, or the .NET 9 implementation with its not-entirely-sequential nature?
@Timovzl as I wrote above, if we targeted net9.0 today, I think we'd almost certainly use the .NET API - unless you think intra-millisecond is particulaly important in the context of PG, justifying a diverging implementation; I'd just rather not have to maintain this kind of thing within an EF provider. Similarly, in 10 we're almost sure to target net10.0, at which point we'd switch to the runtime implementation for the same reason.
So I'm leaning towards just using whatever .NET 9 does, and then switching to the runtime API next year. But I'm open to a custom implementation if you really believe that's important, and if there's a good chance that would also go into the runtime soon, allowing us to once again remove our custom implementation at some point.
I guess I'll defer to you guys overall - a PR for this would also be very helpful, as there's tons of other stuff happening at the moment.
I'll have to reason out loud and get some input on this, as I'm not a user of the feature. I like to have my IDs on construction. :smile:
Starting from the ideal of not having to maintain an ID generator in an EF provider, would the .NET 9 implementation suffice? The latest discussions color it unlikely that .NET 10 will see built-in intra-timestamp monotonicity either, so we'd have to be prepared to live without it.
With timestamp-based IDs that are shuffled intra-millisecond, all insertions will generally go into the last or second-to-last page, because we're always inserting something at least close to the greatest existing ID. We would have most of our performance gains.
Points of attention:
If you Add
or AddRange
10 invoice lines, under the assumption of a sequential ID generator, you'd expect them to receive IDs that match the input order. This would be missing. How bad is that? To me, it is unacceptable, but I'm a stickler for this sort of least astonishment.
If you Add or AddRange 10 invoice lines, under the assumption of a sequential ID generator, you'd expect them to receive IDs that match the input order. This would be missing. How bad is that? To me, it is unacceptable, but I'm a stickler for this sort of least astonishment.
I don't think that's a valid assumption; the sequential nature of the GUIDs we'd produce here is for optimizing indexing in PostgreSQL, and not any sort of guarantee as to the contents of the actual GUID contents as seen by the user. Users should generally not care about the GUID contents (and in any case, nobody has complained so far about about the totally random GUIDs we're currently producing).
Other than that, it sounds like we're fine going with the current .NET 9 implementation? If so, does either one of you want to submit a PR?
I've just been playing around with a couple of things. Wrote myself a test to see how my function compares to the .Net 9 function
public void mynet9guidtest()
{
DateTimeOffset dtoNow = DateTimeOffset.UtcNow;
Guid net9internal = Guid.CreateVersion7(dtoNow);
Guid custom = Next(dtoNow);
var bytenet9 = net9internal.ToByteArray().AsSpan(0, 6);
var bytecustom = custom.ToByteArray().AsSpan(0,6);
Assert.Equal(bytenet9,bytecustom);
Assert.Equal(net9internal.Version,custom.Version);
Assert.Equal(net9internal.Variant,custom.Variant);
}
Under the assumption that given the same timestamp, the timestamp part should be the same and the version and variant as well. Main thing was the order of bytes for the timestamp part was a bit off. Easy fix though. Version is correct. Variant is weird. Feels like I'm looking at Schrodingers Cat. Run the test normally as release. Passes every time. Run in Debug and it fails.
Any thoughts on going with the standard .Net 9 implementation as default but have an option to enable the intra-millisecond functionality?
While you can write tests like Can_use_sequential_GUID_end_to_end_async
and have the above scenario with adding the 10 invoice lines, but outside of that and being somewhat pedantic about the order, is there much real world expectation (or wish) that efcore.pg produces perfectly sequential guid's that align with the order that it was created
@roji I'm looking into creating a PR.
The main challenge seems to be that there is a ton of repetition. NpgsqlModelBuilderExtensions
, NpgsqlPropertyExtensions
, NpgsqlModelExtensions
, NpgsqlAnnotationNames
, and NpgsqlPropertyBuilderExtensions
all need to be changed. Moreover, unrelated methods like UseSerialColumns
explicitly disable specific pieces of competing functionality:
property.SetHiLoSequenceName(null);
property.SetHiLoSequenceSchema(null);
All such places need to account for the existence of the new option, since there no abstraction was used to disable everything before enabling the intended thing. This will take some time, and require careful review. :wink:
@Timovzl Just pushed an update to my PR on EFCore.Jet which fixes the timestamp byte order
If you use the Next
function in the equentialGuidEndToEndTest
that is the one without the counter for intra-millisecond perfectly sequential Guid
aka equivalent to the .Net 9 version.
Test above it passes.
Worked out what was going on with the variant. The .Version
property reads it as a 4 bit value and doesn't mask out the "don't care" bits. See table at https://datatracker.ietf.org/doc/html/rfc9562#variant_field . In the end it was only differeing in the "don't care" bits
Sounds like you have some other stuff disabled for Guid
s. All I was going to do was basically the same as EFCore.Jet: Add the new sequential guid value generator and pdate the FindForType
in NpgsqlValueGeneratorSelector
to use the new generator
All such places need to account for the existence of the new option [...]
I think you may be going in the wrong direction here. Take a look at the SQL Server provider: it uses SequentialGuidValueGenerator by default (which generates SQL Server-specific sequential GUIDs, not UUIDv7) - is there any reason not to do the same, and add a Uuid7ValueGenerator which would be used by default? Otherwise what design do you have in mind, what kind of option?
@roji That's the same as where I was going. See the PR just created
@roji Ahh, you simply wanted to replace the default. Yeah, that's a lot easier. 😆
@ChrisJollyAU, let's swap notes. I did a pure copy of the .NET 9 code, with a minimal change where I reinterpret the Guid
as a custom GuidDoppleganger
struct in order to be able to reach the private fields. Needing the struct is mildly convoluted, but it virtually keeps us from having to change/rewrite the method, staying close to the source material instead. Interesting trade-off.
Gotta go now. Will share code next Monday.
Thanks - @Timovzl can you please take a look at #3249?
In the past there have been few issues related to guid generation:
The issue https://github.com/npgsql/efcore.pg/issues/105 says We're dropping sequential Guid generation because PostgreSQL doesn't do clustered indices
The issue https://github.com/npgsql/efcore.pg/issues/2414 is closed because there really wasn't any proof/documentation provided that using sequential guids is beneficial
I ran into the following article that compared random uuids vs sequential and found an issue with random uuids: https://www.cybertec-postgresql.com/en/unexpected-downsides-of-uuid-keys-in-postgresql/
I guess my question has multiple parts
a) Does that article have any merit? I am not postgres expert so I have no way of knowing is this is a real issue or can I just keep using
Guid.NewGuid()
b) We are using Guid as primary key and generating it withGuid.NewGuid
. There is also thisGuidValueGenerator
but documentation says it can only be used to generate non key values: https://www.npgsql.org/efcore/modeling/generated-properties.html?tabs=13%2Cefcore5#guiduuid-generation Is there a way to use that value generator directly likevar id = GuidValueGenerator().Generate()
. We need to generate the id client side. c) If sequential uuids would be beneficial are there any recommendations on how to generate them? There are of course loads of libraries that can generate e.g. v7 uuid like https://github.com/mareek/UUIDNext