npgsql / efcore.pg

Entity Framework Core provider for PostgreSQL
PostgreSQL License
1.54k stars 226 forks source link

Serialization options for System.Text.Json support #1107

Open brendan-nobadthing opened 4 years ago

brendan-nobadthing commented 4 years ago

Hi,

I'm looking to serialize a model containing Nodatime types into a jsonb data field. Right now this serializes a whole bunch of individual fields from the nodatime type. I see a couple of days ago, jon skeet published a beta for system.text.json serlialisation support for nodatime: https://www.nuget.org/packages/NodaTime.Serialization.SystemTextJson

This looks like just the thing I'm looking for but I'm not sure how to wire this config into the serialization used by npgsql & EF core.

Any ideas / suggestions? Thanks!

roji commented 4 years ago

You're correct - in this initial version of JSON support there aren't any hooks for configuring the System.Text.Json serializer. This isn't trivial to do, but I agree it's definitely important.

brendan-nobadthing commented 4 years ago

Hi @roji Thanks for the reply.

After a bit of digging around, I managed to find a workaround. Once I realised that the actual serialization was being done by JsonHandler way down in NpgSql itself, I was able to refer to how the nodatime handlers and mappers are configured to come up with this:

var origJsonbMapping =
    NpgsqlConnection.GlobalTypeMapper.Mappings.Single(m => m.NpgsqlDbType == NpgsqlDbType.Jsonb);
NpgsqlConnection.GlobalTypeMapper.RemoveMapping(origJsonbMapping.PgTypeName);
NpgsqlConnection.GlobalTypeMapper.AddMapping(new NpgsqlTypeMappingBuilder
{
    PgTypeName = origJsonbMapping.PgTypeName,
    NpgsqlDbType = origJsonbMapping.NpgsqlDbType,
    DbTypes = origJsonbMapping.DbTypes,
    ClrTypes = origJsonbMapping.ClrTypes,
    InferredDbType = origJsonbMapping.InferredDbType,
    TypeHandlerFactory = new JsonbHandlerFactory(new JsonSerializerOptions()
        .ConfigureForNodaTime(DateTimeZoneProviders.Serialization))
}.Build());

Works so far for a simple "save stuff, and get the same values back" Integration test. Will report back here if I get any issues once applied to my project.

roji commented 4 years ago

@brendan-ssw great :) Note that this will change the ADO-level serialization options, so documents will be read and written with these options. However, EF Core also generates JSON property names when traversing into a JSON column in a LINQ query, and that's not taken care of by the above.

sitepodmatt commented 4 years ago

This is definitely a desired feature for those using their own Converters registered via JsonSerializerOptions.

@brendan-ssw solution works accepting the caveat about query EFFunctions, thanks

space-alien commented 4 years ago

Perhaps the Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime plugin should take a dependency on NodaTime.Serialization.SystemTextJson and configure the JSON Serializer for this scenario by default?

This might be a good interim step that could be released sooner than full configuration options for the JSON serializer.

roji commented 4 years ago

@space-alien I don't think that makes sense - not all users of the NodaTime plugin use System.Text.Json, so there shouldn't be a dependency there. More importantly, the NodaTime plugin wouldn't actually be involved in the serialization in any way, only in the configuration, which again doesn't seem right.

Fortunately there's at least a workaround as @brendan-ssw posted above, but other than that the provider simply needs to expose a way to configure the JSON serializer...

space-alien commented 4 years ago

My thinking was more the other way round - or maybe just back to front, depending on one's perspective..!😅

The NodaTime.Serialization.SystemTextJson package can be thought of as the NodaTime "default/official/recommended" config for System.Text.Json, just delivered as a separate package.

All users of json columns are implicitly relying on System.Text.Json, since that's the built-in JSON serializer in efcore.pg.

So, for a user of the NodaTime efcore.pg plugin, who maps a complex model property to a json column, the built-in NodaTime configuration for System.Text.Json is an appropriate and intuitive default.

If their complex property contains NodaTime types, it's highly likely they will want the JSON serializer to adopt the NodaTime default configuration. If this default configuration is not applied, it's all but certain they will need to apply serializer configuration manually, when this could probably have been avoided.

The Nodatime efcore.pg plugin does not take any responsibility for serialization in this setup. I don't think the plugin's dependency on NodaTime.Serialization.SystemTextJson would add anything unwarranted, since we're already reliant on both of that package's sub-dependencies, namely NodaTime and System.Text.Json.

Of course, this is a separate concern to exposing wider configuration options for the JSON serializer, and a user of the NodaTime plugin could still override any default configuration.

TL;DR: Without bringing in any new sub-dependencies, the Nodatime plugin could apply NodaTime "default" config to the built-in JSON serializer.

(I think! 😀)

roji commented 4 years ago

All users of json columns are implicitly relying on System.Text.Json, since that's the built-in JSON serializer in efcore.pg.

That's true, but System.Text.Json is part of the framework and not an additional external dependencies (since netcoreapp3.0). That's one good reason it exists within EFCore.PG and not as a plugin.

TL;DR: Without bringing in any new sub-dependencies, the Nodatime plugin could apply NodaTime "default" config to the built-in JSON serializer.

How would that work? Wouldn't the NodaTime plugin (Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime) need to depend on NodaTime.Serialization.SystemTextJson? If not, how do the methods of the latter get called?

space-alien commented 4 years ago

Ah, so for netcoreapp3.0 this would actually introduce a dependency on System.Text.Json through NodaTime.Serialization.SystemTextJson, as the latter only targets netstandard2.0?

If that's the case, would this be solved if NodaTime.Serialization.SystemTextJson added a target for netcoreapp3.0?

roji commented 4 years ago

I'm not sure I understand... When targeting netcoreapp3.0, there's no dependency (i.e. nuget package) needed to use System.Text.Json - it's in the box, just like System.Text. I thought you were proposing that Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime take a dependency on NodaTime.Serialization.SystemTextJson in order to configure Npgsql's JSON serialization options to use it...

space-alien commented 4 years ago

I thought you were proposing that Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime take a dependency on NodaTime.Serialization.SystemTextJson in order to configure Npgsql's JSON serialization options to use it...

Yes, I am!

I thought that this wouldn't add any other new dependencies because:

Since NodaTime.Serialization.SystemTextJson can be thought of an 'official' NodaTime package, I felt it would be reasonable to add it.

I'm not too hot on this targeting stuff so perhaps I have misunderstood your earlier comments. Would it not be the case that a user would either be targeting netstandard (in which case Npgsql brings in System.Text.Json), or netcoreapp3.0 (in which case it's in the box, as you say)? I'm sorry if I have misunderstood how this works.

roji commented 4 years ago

My problem is with making Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime depend on NodaTime.Serialization.SystemTextJson, since it wouldn't actually need that package or use it for anything except configuring JSON serialization options. Its current role is to change the mappings of of PostgreSQL date/time types (outside of JSON!) to NodaTime CLR types instead of the built-in BCL types - that's quite different and unrelated.

Also, there are other non-NodaTime needs to configuration System.Text.Json serialization, so something would have to be exposed in any case - once that happens, it doesn't seem like much to ask the user to take a dependency on NodaTime.Serialization.SystemTextJson themselves and do the proper setup.

I'm sorry if I have misunderstood how this works.

No worries at all! This is a good conversation.

space-alien commented 4 years ago

Yes, this makes sense! Thanks for taking the time to discuss this.

roji commented 4 years ago

Punting this out of 5.0, I'm hoping to do a big work cycle on cross-database JSON support, this should be part of that.

weitzhandler commented 3 years ago

Is there already a way to have the EF Core provider resolve the JsonSerializerOptions from the service provider? This issue is agnostic to NodaTime. It's a general EF Core JSON handling one.

IMHO GlobalTypeMapper should be deprecated and replaced with the options pattern, or at least some of it, so that it can be easily configured at app startup stage and used across the app with the DI container.

roji commented 3 years ago

@weitzhandler not all applications are ASP.NET applications, or even DI applications, so requiring the options pattern wouldn't really do. There are also internal performance considerations which favor a global/static approach (such as GlobalTypeMapper) over something that would use DI.

What actual issues do you see with calling a GlobalTypeMapper on application startup?

weitzhandler commented 3 years ago

Thanks for your explanation, convinced me.

mg90707 commented 3 years ago

For our application it would be useful to provide options, too. It would especially be nice if we could somehow query case insensitive for jsonb properties, but I don't know how feasible this is.

sreejith-ms commented 3 years ago

Any update on this?

In my case, all the domain entities polluted with JsonPropertyName even though it is an infrastructure concern. I have configured Text.Json with CamelCase policy, but this https://github.com/efcore/EFCore.NamingConventions/issues/65#issuecomment-776023203 suggests that the workaround is not feasible.

roji commented 3 years ago

Not yet. This is something I'd like to get around to, but haven't had the time.

sreejith-ms commented 3 years ago

@roji Is this something up-for-grabs/good-first-issue? Then I can take a look.

roji commented 3 years ago

Not entirely sure... The challenge is that there are two places where JSON field names are generated:

  1. When saving/loading documents (serialization). This is managed by System.Text.Json at the ADO.NET level of Npgsql.
  2. When generating query SQL which contains JSON field names (e.g. Where(x => x.SomeJson.Foo == 'bar'). EF Core would need to apply the naming convention itself in the SQL generation phase, probably.

These two layers need to be aligned around the naming convention - it's probably not trivial.

Note that this is really unrelated to EFCore.NamingConventions - that plugin is about table/column names, whereas here we're discussing fields names within JSON documents (which are themselves stored in a column).

sreejith-ms commented 3 years ago

@roji It looks not trivial at all. I'm curious about the solution, will this be a part of EntityTypeBuilder or a separate fluent style API that can be used by both EFCore and ADO.NET?

roji commented 3 years ago

I don't know yet - designing the solution is part of what needs to be done :) The complexity is unfortunately the main reason I haven't done it yet...

roji commented 3 years ago

Here's a sample for working around this in 6.0.0 (similar to this above):

var options = new JsonSerializerOptions
{
    // Customize based on needs...
};
NpgsqlConnection.GlobalTypeMapper.AddTypeResolverFactory(new JsonOverrideTypeHandlerResolverFactory(options));

await using var conn = new NpgsqlConnection("Host=localhost;Username=test;Password=test");
await conn.OpenAsync();

using var cmd = new NpgsqlCommand("SELECT $1::text", conn);
cmd.Parameters.Add(new()
{
    Value = new Blog
    {
        Title = "foo",
    }
});
Console.WriteLine(cmd.ExecuteScalar());

class JsonOverrideTypeHandlerResolverFactory : TypeHandlerResolverFactory
{
    private readonly JsonSerializerOptions _options;

    public JsonOverrideTypeHandlerResolverFactory(JsonSerializerOptions options)
        => _options = options;

    public override TypeHandlerResolver Create(NpgsqlConnector connector)
        => new JsonOverrideTypeHandlerResolver(connector, _options);

    public override string? GetDataTypeNameByClrType(Type clrType)
        => null;

    public override TypeMappingInfo? GetMappingByDataTypeName(string dataTypeName)
        => null;

    class JsonOverrideTypeHandlerResolver : TypeHandlerResolver
    {
        readonly JsonHandler _jsonbHandler;

        internal JsonOverrideTypeHandlerResolver(NpgsqlConnector connector, JsonSerializerOptions options)
            => _jsonbHandler ??= new JsonHandler(
                connector.DatabaseInfo.GetPostgresTypeByName("jsonb"),
                connector.TextEncoding,
                isJsonb: true,
                options);

        public override NpgsqlTypeHandler? ResolveByDataTypeName(string typeName)
            => typeName == "jsonb" ? _jsonbHandler : null;

        public override NpgsqlTypeHandler? ResolveByClrType(Type type)
            // You can add any user-defined CLR types which you want mapped to jsonb
            => type == typeof(JsonDocument) || type == typeof(Blog)
                ? _jsonbHandler
                : null;

        public override TypeMappingInfo? GetMappingByDataTypeName(string dataTypeName)
            => null; // Let the built-in resolver do this
    }
}

class Blog
{
    public string? Title { get; set; }
}
roji commented 3 years ago

Note: this depends on https://github.com/npgsql/npgsql/pull/4045, which was merged after 6.0.0. Use version 6.0.0-rtm-ci.20211017T140717 or higher from the vNext nuget feed.

llRandom commented 2 years ago

@roji thanks for a workaround. it's working fine when writing values to the database but when building a query like this

var joes = context.CustomerEntries
    .Where(e => e.Customer.Name == "Joe")
    .ToList();

it's converted to

SELECT c.""Id"", c.""Customer""
FROM ""CustomerEntries"" AS c
WHERE c.""Customer""->>'Name' = 'Joe'

instead of

SELECT c.""Id"", c.""Customer""
FROM ""CustomerEntries"" AS c
WHERE c.""Customer""->>'name' = 'Joe'

Is there a way to also apply camelCase when referencing json fields in linq query?

roji commented 2 years ago

@llRandom no - as written above this only works for storing and loading, but not for query SQL generation.

leonkosak commented 2 years ago

@roji, is this issue still on track for EF Core 7? I would like to store c# class object in jsonb using:

new JsonSerializerOptions
            { DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase }

Our use-case (c# class) has a lot of nullable props (100s) and 99% of them are always null. If we can eliminate null props, this would greatly reduce database size.

roji commented 2 years ago

This is unlikely to make it in for 7.0, it also depends on ongoing JSON work being done in EF Core itself.

But if you just need to specify the options for serialization, you can use the workaround I detailed above.

leonkosak commented 2 years ago

Thank you @roji. If I understand correctly, it is also possible to use FromSqlRaw method for insert/update like: ....FromSqlRaw("insert into sales values (1, '{name: "Alice", paying: true, tags: ["admin"]}');") where json string is from c# variable (JsonSerialization.Serialize with JsonSerializationOptions)

For read-only queries (SELECT), they would gor through regular irepository pattern. Would this work?

roji commented 2 years ago

FromSqlRaw is for executing queries, not inserts. But you have context.Database.ExecuteSqlInterpolatedAsync (and Raw) for this.

leonkosak commented 2 years ago

Yes, this is true. But my whole idea is viable to implement (that on read, npgsql would read successfully and correct "trimmed" jsonb data column (json without null props))?

roji commented 2 years ago

I think so - if you take care not to save null properties in your JSON document, deserialization should still work fine.

Note that you don't necessarily have to switch down to raw SQL for your inserts: see this workaround for configuring the JsonSerializationOptions.

wjrogers commented 1 year ago

Here's a slightly more robust work-around while we wait for full support in 8.0. It uses a plugin to apply the configured JsonSerializerOptions to both value (de-)serialization and member access query expressions.

https://www.nuget.org/packages/XO.EntityFrameworkCore.NpgsqlJsonSerializerOptions/

Edit: fixed a bug w/ default options that snuck in while I was tidying up for packaging

markushaslinger commented 1 year ago

@roji at the risk of annoying you: do you think this will make it into 8 (rc) as well?

In #2548 you said that you still plan to ship JSON support with rc1 (which is due in about three weeks). If serializer options support is available then I'll live with the verbose serialization of noda time types for the time being, since it will 'fix itself' in less than a month. However, if you don't see this in 8 I'd invest into applying one of the workarounds since my current project will be stuck on 8 for at least two years.

Don't want to stress you, but I'd appreciate feedback wrt time schedule to make a decision here, thanks.

roji commented 11 months ago

Note that #2548 was done for 8.0, meaning that you will now be able to use the regular EF Core ToJson() mechanism, using owned entities to map JSON documents. This is quite different from Npgsql's current POCO serialization support, where Npgsql and EF don't actually model what's inside the document - the EF ToJson() support provides full, rich modeling, and also allows translating arbitrary querying within the JSON document. This will now be the preferred way to map strongly-typed .NET POCOs to JSON documents in the database.

However, this support also doesn't allow specifying JsonSerializationOptions as-is; this is because EF is the one performing low-level serialization/deserialization, and does not use e.g. JsonSerializer. The issue tracking custom JSON serialization on the EF side is https://github.com/dotnet/efcore/issues/28043, but we need more feedback on exactly what kind of serialization options people are looking to tweak.

In any case, given that Npgsql's POCO JSON serialization will no longer be the best way to strongly-type model JSON documents, it's unlikely that I'll be able to work on implementing JSON serialization options for that. The workarounds published above should be good enough going forward.

So I'll put this issue in the backlog for now to gather feedback, but I'd suggest people take a look at the ToJson() feature for 8.0.

markushaslinger commented 11 months ago

Thanks for the update and managing to squeeze in #2548 for 8.0 @roji!

Just to clarify, given the following simple entity:

public class Student
{
    public int Id { get; set; }
    public required string Name { get; set; }
    public required List<ExamGrade> ExamGrades { get; set; }
}

public sealed class ExamGrade
{
    public Grade Grade { get; set; } // works fine as int
    public required string Subject { get; set; }
    public LocalDate Date { get; set; } // NodaTime type in owned entity
}

public enum Grade
{
    A,
   // ...
}

We currently have to set up something like this:

var student = modelBuilder.Entity<Student>();
// ...
student.OwnsMany(s => s.ExamGrades, builder =>
{
  var options = new JsonSerializerOptions();
  options = options.ConfigureForNodaTime(DateTimeZoneProviders.Tzdb);
  builder.ToJson();
  builder.Property(eg => eg.Date)
         .HasConversion( v => JsonSerializer.Serialize(v, options), 
                                 v => JsonSerializer.Deserialize<LocalDate>(v, options));
});

And live with the property being treated as string rather than date during queries (which for iso format usually works anyway)?

Or am I missing something so that we can specify the 'column type' as date? I added the rtm build for Npgsql.EntityFrameworkCore.PostgreSQL.NodaTime, but this is probably only for actual column mapped properties, not those of owned entities.

It would be similar for other custom types, I just picked NodaTime as a common source for 'primitive' entity properties.

roji commented 11 months ago

When you map something as an owned JSON entity, EF is fully aware of the types of properties stored inside, and knows how to convert them out from their e.g. string JSON serialization format to their actual PG data type. So if you inspect the JSON saved to the database, you'll indeed see an ISO8601 string representation for the local date; but if you reference that property in a query (e.g. context.Students.Where(s => s.Any(e => e.Date > something))), EF will generate SQL that converts that out (by casting that string to a `date).

This unlocks full querying inside JSON documents, with any LINQ operator that EF already supports. That's one of the advantages of the new ToJson() approach, compared to the Npgsql provider's current POCO approach to JSON.

That should generally obviate the need for specifying "serialization options"; in fact, changing anything in how the property is actually stored (i.e. ISO8601) would make it un-queryable, since the provider won't know how to convert it out to a date.

Hope that clarifies things...

markushaslinger commented 11 months ago

When you map something as an owned JSON entity, EF is fully aware of the types of properties stored inside, and knows how to convert them out from their e.g. string JSON serialization format to their actual PG data type. So if you inspect the JSON saved to the database, you'll indeed see an ISO8601 string representation for the local date; but if you reference that property in a query (e.g. context.Students.Where(s => s.Any(e => e.Date > something))), EF will generate SQL that converts that out (by casting that string to a `date).

When I only map it like this without specifying any conversion:

student.OwnsMany(s => s.ExamGrades).ToJson();

I get an Exception when inserting a student entity and calling SaveChanges:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.EntityFrameworkCore.Update.ModificationCommand.WriteJson(Utf8JsonWriter writer, Object navigationValue, IUpdateEntry parentEntry, IEntityType entityType, Nullable`1 ordinal, Boolean isCollection, Boolean isTopLevel)

Doing the same insert with the JSON Serializer set up works. I think I'm still missing a piece of the puzzle here. Will do some more testing/reading over the weekend.

roji commented 11 months ago

Can you please open a new issue with a minimal repro for that NullReferenceException?

roji commented 11 months ago

If it's at all possible to do this quickly, that would be great - 8.0 is being released in the next 2-3 days, and if there's a bug here, it would be great to be able to fix it before then.

markushaslinger commented 11 months ago

Can you please open a new issue with a minimal repro for that NullReferenceException?

Sure, I created a repo with the project. It is just a simple console app I threw together to check out the ToJson support, so pretty sure you'd have caught such a problem long ago and it's just a config issue on my end.

Do you want me to open the issue here or in the EFCore repo?

roji commented 11 months ago

If there's any chance you can test it out with SQL Server or SQLite, that would be great - if it fails on one of those, it would belong on the EF repo. Otherwise if that's not feasible, you can open it here and I'll do the investigating.

markushaslinger commented 11 months ago

If there's any chance you can test it out with SQL Server or SQLite, that would be great - if it fails on one of those, it would belong on the EF repo. Otherwise if that's not feasible, you can open it here and I'll do the investigating.

I did not find an updated NodaTime provider for SQLite, but I tried with SQL Server and got the same exception. So I opened https://github.com/dotnet/efcore/issues/32288 and pinged you there. Hope that helps, if you have any questions let me know,

roji commented 11 months ago

Thank you @markushaslinger! As this is an EF issue, this isn't something that we can fix for 8.0.0 (in fact, unfortunately the window for submitting patches for 8.0.1 just closed as well)... But we'll do our best to get to the bottom of it and hopefully patch it.

sitepodmatt commented 10 months ago

That should generally obviate the need for specifying "serialization options"; in fact, changing anything in how the property is actually stored (i.e. ISO8601) would make it un-queryable, since the provider won't know how to convert it out to a date.

This can certainly be understood in the case of a new greenfield application with just one new 8.0 EF client, but serialization options will still be important for those with existing data, existing non dot net clients, existing reports and views and so on. I for one have little control to re-shape data already in production. I think I'd be happy to opt out of deep querying ability of a.b.c = richType etc.. for now - specifically I need control of how keys are named, enums to strings converters, precision of datetime milliseconds via custom converter (long story - interop issue), and so on. It would be awesome though if we can specify custom revivers of sorts back to standard format to "This unlocks full querying inside JSON documents, with any LINQ operator that EF already supports". I presume however all the ways we've done this in EF 6.0 / Npgsql still work though in EF 8.0 using the old methods (via NpgsqlConnection.GlobalTypeMapper.AddTypeResolverFactory and wrapped JsonSerializationOptions) so perhaps we can just wait until ToJson() catches up

roji commented 10 months ago

@sitepodmatt if all you're looking for is storing some object graph as a JSON string in some column, you can just do that via a value converter, and tweak the serialization options exactly as you wish; that doesn't require any "JSON support" from EF.

Things become more interesting (and complicated) when you want EF itself to actually be aware of what's inside that column, either by supporting querying into it, partial updating (patching), etc. At that point it's simply not possible to "just have JSON serialization options work", since EF e.g. has to generate SQL that queries into the document. In other words, it's very unlikely that ToJson() will ever fully support JSON serialization options.

chrisbbe commented 10 months ago

Storing some object graph as a JSON string in a JSONB column is exactly what we have done in EF Core before V8, using a custom implementation of TypeHandlerResolverFactory as described in https://github.com/npgsql/efcore.pg/issues/1107. TypeHandlerResolverFactory seems to have been removed in Npgsql 8. Moving to an EF Core ValueConverter seems reasonable, my first try fails as the ValueConverter/Npgsql writes the JSON as bytea instead of jsonb. How can we 'hint' to Npgsql to use jsonb?

MessageText: column "data" is of type jsonb but expression is of type bytea

public sealed class MyValueConverter : ValueConverter<Dictionary<string, object?>, byte[]>
{
    public MyValueConverter() : base(
        status =>
            JsonSerializer.SerializeToUtf8Bytes(status, (JsonSerializerOptions?)null),
        value =>
            JsonSerializer.Deserialize<Dictionary<string, object?>>(value, (JsonSerializerOptions?)null) ??
            new Dictionary<string, object?>(StringComparer.OrdinalIgnoreCase))
    {
    }
}
roji commented 10 months ago

@chrisbbe you need to serialize to a simple string, not to UTF8 binary data.