npgsql / efcore.pg

Entity Framework Core provider for PostgreSQL
PostgreSQL License

Support for json_array_elements #2364

Open ahanusa opened 2 years ago

ahanusa commented 2 years ago

Given the following table Things:

id: integer
value: jsonb

With the following data:

id: 1, value: [1, 2, 3]
id: 2, value: [2, 4, 5]
id: 3, value: [0, 1]

We would like to be able to invoke the following function:

            var results = context.Things.Where(thing =>
                thing.Value.EnumerateArray().Any(v => v.GetInt32() >= 2));

And have it return the following records:

id: 1, value: [1, 2, 3]
id: 2, value: [2, 4, 5]

This is a mere sample, but we are after additional functionality that EnumerateArray support would unlock for us.

Please consider adding support for JsonElement.EnumerateArray so that EF queries can be transpiled in this fashion.

Alternatively, does anyone have a suggestion as to how I could get the above query to work in a different fashion?

Thanks much!

roji commented 2 years ago

Originally posted on stackoverflow. The above specifically can simply be performed via EF.Functions.JsonContains - no need for json_array_elements or similar.
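
For reference, a minimal sketch of that containment query (assuming the Thing entity from the original post, with Value mapped to a jsonb column); note that JsonContains only covers containment, not range comparisons such as >=:

    // Keeps rows whose jsonb array contains the element 2.
    // Translates to roughly: WHERE value @> '[2]'
    var results = context.Things
        .Where(t => EF.Functions.JsonContains(t.Value, "[2]"))
        .ToList();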

Of course, there are other scenarios where json_array_elements (and/or json_array_elements_text) could be useful. The challenge with that is that it would need to be a set-returning function.

Another approach here, at least in theory, would be to convert the JSON array into a regular array, instead of to a recordset. Surprisingly, this isn't straightforward (see e.g. this discussion).

roji commented 2 years ago

@ahanusa note that a workaround here is to have a regular PG array instead of a JSON array. Regular arrays are generally better supported than JSON arrays.
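
As a rough illustration of that workaround (entity and property names are hypothetical): the Npgsql provider maps .NET arrays to PostgreSQL arrays, and Contains translates to an array containment check:

    public class Thing
    {
        public int Id { get; set; }
        public int[] Values { get; set; }   // stored as integer[] in PostgreSQL
    }

    // Translates to roughly: WHERE 2 = ANY (values)
    var results = context.Things
        .Where(t => t.Values.Contains(2))
        .ToList();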

ahanusa commented 2 years ago

@roji, I think you may have misinterpreted my query above. We need the ability to see whether any of the elements in an array are greater than or less than a value. Can you do that with JsonContains? I'm not seeing how, as it returns a bool, but please let me know if I'm missing something.

ahanusa commented 2 years ago

You might be confused because the original query in the SO thread was a bit different. That solution solved one problem for us, but we also need the ability to check array contents so further decisions can be made. Thanks again for your help on this!

roji commented 2 years ago

Ah no - for greater than you'd indeed need something else. Have you considered just having a regular PG array instead of a JSON array?

ahanusa commented 2 years ago

We were hoping to allow our clients to save any valid json, whether it be of type number, string, true, false, object, or array. Saving them all to the same field greatly simplifies our architecture, but now I'm seeing that assumption becoming less valid :(

I'll look into PG array. Thanks again!

roji commented 2 years ago

I'm not completely sure whether what you want to do will work with a regular array either - I'd check that before investing too much work.

ahanusa commented 2 years ago

Yeah, I think a better approach is to disallow the storage of json arrays and store each array value in its own record. This will allow our .Any() evaluations to work properly. Thanks for the reassurance in not pursuing the PG array route.

ahanusa commented 2 years ago

I just tried the above approach and it is not ideal. What I now have to do is group like Thing values, assemble a json array, remove all like Things, and replace them with a single Thing with the Value property set to the array. It works, but it's gross. We have to do this because we can't break the client contracts. But, this is a "me" problem, lol.

That said, what is the likelihood of json_array_elements being supported in the future?

roji commented 2 years ago

Yeah, I think a better approach is to disallow the storage of json arrays and store each array value in its own record. This will allow our .Any() evaluations to work properly. Thanks for the reassurance in not pursuing the PG array route.

To be clear, I do think using a PG array should work. For example, your original request above can be expressed via:

WHERE 2 <= ANY (array_column)

That pattern specifically isn't currently generated by the EF provider, but it could be (and would probably be much easier than supporting json_array_elements).

To summarize, I'd get a good idea of exactly what it is you want to do, and then see if that's possible with PG arrays - in SQL. If you find small gaps with the EF provider (such as the above), you can open issues and I'll probably take care of those quite quickly.
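
As an illustrative stopgap for the gap mentioned above (assuming a things table with an integer[] column named value), the comparison can be issued as a raw SQL query:

    // Any element >= 2, expressed directly in SQL with the ANY operator.
    var results = context.Things
        .FromSqlRaw("SELECT * FROM things WHERE 2 <= ANY (value)")
        .ToList();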

ahanusa commented 2 years ago

I don't expect a reply to this, I am posting this in case someone else is facing a similar design issue. Our software supports filtering and in-place editing of values in a grid. Each property has a type, and can be filtered as such. Each property type is associated with different filter operations. This can be seen below:

[screenshot: property types and their associated filter operations]

Each value is stored as a record in a single table in a JsonElement field. It is very similar to the Thing table above. This enables us to allow the client to send a single json value, whether that is a number, string, object, array, true, or false, and store it in the database with zero effort on our part. The incoming request value is of type JsonElement; it is set properly by the ASP.NET model binder and then stored directly to the table with EF, and this all happens for free.

Things become interesting when we allow dynamic filtering. For proper filtering to work, we need to dynamically build WHERE clauses and then invoke the correct JsonElement.GetXXX methods so that we can build proper queries. This has actually proven to be a very easy feat. Below is a sample of a dynamic filter for a BETWEEN decimal operation:

return base.Query.Where(a => a.PropertyValues.Any(pv =>
    pv.PropertyName == this.Property.PropertyName &&
    pv.Value.GetDecimal() >= this._leftOperand &&
    pv.Value.GetDecimal() <= this._rightOperand));

Our entire infrastructure has worked great up until we started adding filters for our properties of type array. Common sense would dictate a dynamic query that looks like the following (assume a query that allows retrieving any property values whose array contains a number greater than the user specified value):

return base.Query.Where(a => a.PropertyValues.Any(pv =>
    pv.PropertyName == this.Property.PropertyName &&
    pv.Value.EnumerateArray().Any(v => v.GetInt32() >= 2)));  // 2 hard-coded for brevity

But this is where things fall apart for us unfortunately.

So while we could handle json arrays differently than all other json elements, it is a less than ideal situation for us.

Hopefully this helps someone else facing the same design issue.

roji commented 2 years ago

Note: EF Core is probably currently lacking the infrastructure to translate this to a table-valued function (TableValuedFunctionExpression).

adamal commented 5 months ago

I am working on a very similar use case to the one @ahanusa described. The latest requirement is to query a form of keyword/tag field, where we need to determine whether the filter's "tags" intersect the tags in a json array at a given path. Luckily for us, calls to EF.Functions.JsonContains can be used here by OR'ing them together dynamically for each filter value.

But reading this thread, it does worry me that changing the operator from = (as implicit in contains) to some other comparison is not possible without falling back to raw SQL.

Of course, there are other scenarios where json_array_elements (and/or json_array_elements_text) could be useful. The challenge with that is that it would need to be a set-returning function. Another approach here, at least in theory, would be to convert the JSON array into a regular array, instead of to a recordset. Surprisingly, this isn't straightforward (see e.g. this discussion).

A third approach to solving this issue would be to have EF.Functions methods for JSONPath queries that translate to @@ and @?. While this would most likely lose type safety for the corresponding WHERE clause, the rest of the query, including other WHERE clauses, could remain typed.
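
To illustrate the idea (no such EF.Functions translation exists in the provider today; the SQL is a hand-written sketch using the PostgreSQL 12+ jsonb @? operator and assumes the Things table from the original post):

    // Keep rows whose jsonb array has an element >= 2, via a JSONPath predicate.
    var results = context.Things
        .FromSqlRaw("SELECT * FROM things WHERE value @? '$[*] ? (@ >= 2)'")
        .ToList();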

roji commented 5 months ago

For anyone looking at this, be sure to read up on the new primitive collection support introduced in EF Core 8.0. On the PostgreSQL provider, this is implemented by using arrays (not JSON arrays), and covers a lot of the intended use cases mentioned - you can have array properties, parameters or inline constants, and use arbitrary LINQ operators over them.
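
A minimal sketch of what that looks like (entity and property names are illustrative), assuming EF Core 8 with the Npgsql provider:

    public class Thing
    {
        public int Id { get; set; }
        public List<int> Values { get; set; } = new();   // mapped to integer[]
    }

    // Arbitrary LINQ over the collection; the provider unpacks the array with
    // unnest and translates to an EXISTS subquery, roughly:
    //   WHERE EXISTS (SELECT 1 FROM unnest(values) AS v WHERE v >= 2)
    var results = context.Things
        .Where(t => t.Values.Any(v => v >= 2))
        .ToList();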

adamal commented 5 months ago

the new primitive collection support introduced in EF Core 8.0

I found this blog post enlightening on the subject: https://devblogs.microsoft.com/dotnet/announcing-ef8-preview-4/

But unless I am mistaken, the "primitive collection" concept only addresses model collections (which may be stored in json or native Postgres arrays in the Postgres case), i.e. collections that are recognized as such at the model level.

It does not directly address the main issue discussed above: querying array values nested in an unstable json schema.

roji commented 5 months ago

EF generally doesn't (currently) provide great support for working with unstable, unmodeled JSON data. The PG provider specifically has its older mechanism which works better with unstable JSON data - and json_array_elements could make sense there.

roji commented 5 months ago


Now, one point I made was that the PG provider currently supports EF (modeled, stable) primitive collections by mapping them to (non-JSON) arrays; then, when arbitrary LINQ operators are composed to top of such collections, the PG unnest function can be used to convert such arrays to regular relational rowsets, and at that point the regular SQL querying operators can be used. There still could be a need to support (stable) collections as JSON arrays - particularly collections within larger JSON documents (so that the nested list is necessarily a JSON list); and at that point I'd expect us to use json_array_elements similarly to how we current use unnest, i.e. to unpack the JSON collection to a rowset.

Then there's indeed still the case of unstable, unmodeled JSON data; that's not currently a big focus (as stable JSON data is generally more requested and support is far from done there). But yeah, it should be possible to allow composing arbitrary LINQ operators over unstable JSON collections, and translate that to json_array_elements (just like we'd do for stable JSON collections).
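
As a hand-written sketch of what such a translation could produce (illustrative only; assumes the Things table from the original post with a jsonb column named value, using a raw SQL fallback in the meantime):

    // Unpack the JSON array to a rowset with jsonb_array_elements_text and filter it.
    var results = context.Things
        .FromSqlRaw(@"
            SELECT t.*
            FROM things AS t
            WHERE EXISTS (
                SELECT 1
                FROM jsonb_array_elements_text(t.value) AS e(elem)
                WHERE e.elem::int >= 2)")
        .ToList();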

adamal commented 5 months ago

it isn't clear to me that what people are asking for in this issue is support for unstable JSON

I can see how that isn't clear at all from the first post alone. However, the linked SO question looks very much like an unstable schema. I think that question was what brought me here. I believe we have a quite similar situation, but rather than speculate on the similarity, I'll expand on what our use case is. I'll try to err on the side of too much rather than too little detail to better convey our perspective. Just take it as a single data point.


We build and maintain a "work management system" that is fairly dynamic. E.g. we have a Task entity. It has a json Properties property and a TaskTypeId that references a TaskType, which has a json Schema property that constrains the Task.Properties.

We have been using ElasticSearch to query and maintain indexes for these dynamic types. That came with a lot of complexity and we have discovered that Postgres, which is used as the main persistence layer, can serve us adequately. So we are in the process of removing Elastic and with that a big chunk of complexity and mental overhead.

We have not used Entity Framework in this system at all previously. But now that we are rebuilding the search functionality, it looked like we would either have to write and maintain a lot of manual SQL or, if refactoring into bite-sized chunks, would end up reimplementing LINQ to SQL, only poorly. Instead we are opting to power the search from EF Core. Configuring the DbContext to link the existing (non-EF) model with the existing (also non-EF) Postgres schema proved surprisingly easy, with much less manual configuration than Dapper, which our existing repositories use.

Like I indicated previously, our requirements can so far be met using EF Core and a few Expression extension methods (Chain, Or and And, pieced together from Stack Overflow answers). The performance implications of this approach (i.e. producing varying SQL queries) have been acceptable so far, and they would not have been any better hand-rolling the SQL.

roji commented 5 months ago

@adamal thanks for the details. Yes, the PG capabilities around JSON have indeed made it quite an attractive alternative to some other dedicated solutions etc.

At the end of the day, EF doesn't currently support unstable ("weakly-typed") JSON mapping, and focus is currently on getting the stable JSON support to be better (there's still quite a bit missing there). The general issue tracking that on the EF side is https://github.com/dotnet/efcore/issues/28871. Just one note - I'd encourage carefully thinking whether unstable JSON really is what you want/need; in many cases, people think they want arbitrary JSON, but the reality is that the data does actually have a schema, which is implicit and expressed in the way that the application interacts with the documents. Actual, fully-unstructured JSON documents are relatively rare in the real world. If that's the case, you may be able to get by with using EF's modeling.
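
As a rough sketch of that kind of modeling (type and property names are hypothetical, loosely based on the Task/Properties example above; assumes EF Core 8 with the Npgsql provider and its ToJson() owned-entity mapping):

    public class WorkItem
    {
        public int Id { get; set; }
        public ItemProperties Properties { get; set; }
    }

    public class ItemProperties
    {
        public string Status { get; set; }
        public List<string> Tags { get; set; } = new();
    }

    // In the DbContext: map the whole Properties object to a single jsonb column
    // while keeping its members queryable through LINQ.
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<WorkItem>()
            .OwnsOne(w => w.Properties, b => b.ToJson());
    }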

However, if that doesn't fit your needs, keep in mind that you can always drop down to raw SQL queries for the places where you need to interact with unstable JSON, until proper support comes along in EF querying.