npgsql / efcore.pg

Entity Framework Core provider for PostgreSQL
PostgreSQL License
1.49k stars 215 forks source link

Support LINQ to JSONPATH querying #2570

Open roji opened 1 year ago

roji commented 1 year ago

It seems that JSONPATH support is very limited in other databases, and that PostgreSQL is the only one to support complex patterns, e.g.:

ctx.Entity.Where(e => e.JsonThing.Track.Segments.Any(s => s.HR > 130)
$.track.segments[*].HR ? (@ > 130)

See https://github.com/dotnet/efcore/issues/28616#issuecomment-1320956002 for more details.

We could set up a JSONPATH translation pipeline in EFCore.PG to support advanced querying inside JSON. If other databases develop this ability, we could move that to relational.

/cc @maumar @ajcvickers

Note #28824 which would provide raw JSONPATH string querying and #2669 which would translate JSONPATH APIs in System.Text.Json (if and when these are introduced)

hahn-kev commented 1 year ago

This would be awesome. But I'd also love support for calling jsonb_path_exists and providing my own path, that way if the translation doesn't work for my use case I can fallback to json path and write it myself without having to resort to writing raw sql.

roji commented 1 year ago

@hahn-kev yeah, we definitely need raw JSONPATH APIs as well; this has been discussed. Opened #2668 to make sure this is tracked.

NinoFloris commented 1 year ago

Fyi jsonpath does come with some constant cost perf penalties (due to the parsing and translating in pg), indices also won't be applied unless the entire path expression is a constant so that's a hard cliff to avoid too.

roji commented 1 year ago

jsonpath does come with some constant cost perf penalties (due to the parsing and translating in pg)

Sure, but shouldn't that be taken care of by prepared statements as usual? I.e. is there anything special here compared to just normal SQL queries (beyond possibly a slightly higher constant perf cost)?

Fyi jsonpath does come with some constant cost perf penalties (due to the parsing and translating in pg), indices also won't be applied unless the entire path expression is a constant so that's a hard cliff to avoid too.

Yeah... The alternative to generating a JSONPATH query is to convert the JSON collection to a rowset (with jsonb_to_recordset) and then just use use regular SQL over that (BTW this is currently what I'm implementing in the EF JSON support - that should carry across to PG pretty transparently). Such a query might be a bit better, since it can be more properly parameterized: rather than concatenating some number into the JSONPATH string, you can e.g. use the parameter placeholder directly in SQL as usual. I'm guessing indexes wouldn't work after jsonb_to_recordset anyway, so I'm not sure this all matters.

What do you think?

roji commented 8 months ago

Note that full LINQ querying has been implemented for version 8.0 of the provider, when using the newly supported EF ToJson() mapping (via owned entities). That approach uses jsonb_to_recordset to transform a JSON array into a relational table, and then uses the regular SQL mechanisms to query that table.

So while the need to support JSONPATH querying is now much lower, it may still make sense to convert certain complex queries over JSON to JSONPATH syntax, where that's possible. Beyond simplifying, this could be a sort of optimization in allowing an expression index to be defined for a particular JSONPATH expression - that's not likely to be possible with the complex SQL queries we produce now.

roji commented 2 months ago

Note that while we can definitely "simplify" at least certain translations to JSONPATH (instead of the current jsonb_to_recordset), the disadvantage is that if there's a parameter, we need to concatenate it into the JSONPATH string, which isn't great (and would presumably also prevent any sort of PG optimization, assuming that exists). If PG's JSONPATH supported some sort of parameterization, this would be more interesting.

(discussed with @NinoFloris)