jOOQ / jOOQ

jOOQ is the best way to write SQL in Java
https://www.jooq.org
Other
6.17k stars 1.21k forks source link

Consider supporting converters within JSON_OBJECT, JSON_ARRAY, and other JSON projecting functions #17597

Open alf opened 1 week ago

alf commented 1 week ago

Expected behavior

While trying to compose a query that searches multiple different tables for data we discovered that using .convertFrom inside a jsonObject does not work. While this would have been nice to have, it's ok that it doesn't work.

The main issue here is that jOOQ doesn't tell us that it's not supported, it just silently ignores our .convertFrom.

Actual behavior

No response

Steps to reproduce the problem

Here's a silly test that demonstrates the issue:

var x = ctx.select(
                jsonObject(
                        key(inline("key")).value(
                        jsonArray(inline(1)).convertFrom(String.class, (it) -> "one"))
                )
        ).fetchOne();

        assertEquals("{\"key\":[\"one\"]}", x.getValue(0).toString());

jOOQ Version

3.19.14

Database product and version

Oracle

Java Version

No response

JDBC / R2DBC driver name and version (include name if unofficial driver)

No response

lukaseder commented 1 week ago

A Converter is a feature of the data type binding org.jooq.Binding, which is applied to result values after reading them from JDBC ResultSet (or prior to writing them to PreparedStatement, of course). A Converter will not apply to your query if it isn't used in such a context. E.g. if you use a Converter on your ID fields (PK / FK), you could write a JOIN .. ON clause like this:

.on(TABLE1.CONVERTED_PK.eq(TABLE2.CONVERTED_FK))

And you wouldn't expect this to have any implication on your query other than the added type safety (e.g. you couldn't compare CONVERTED_PK with UNCONVERTED_FK)

The same is true if you embed any converted field in any expression that doesn't really need the conversion, such as a JSON document. You maybe wouldn't have reported this if you had written:

jsonObject(key("key").value(TABLE1.CONVERTED_PK)).isJson()

Now, it does appear to be a special case to project such a JSON_OBJECT, so maybe, you'd have thought jOOQ would at least have to warn you about things in this case:

select(jsonObject(key("key").value(TABLE1.CONVERTED_PK)))

But it still won't and I don't think it should. Your query isn't really different from this one:

select(TABLE1.OPAQUE_JSON_OBJECT)

I.e. the structure of the JSON document isn't really known to jOOQ.

I'm aware that you're coming from a MULTISET mindset, where jOOQ went further as any mapping / conversion emulation is recursive throughout the MULTISET expression (without which the feature would only be 20% as useful!). But the presence of such a feature set for MULTISET doesn't imply that it must work everywhere for any type of expression. In particular, jOOQ can't possibly know the conversion semantics of your user type U when placed back into the database data structure T (from Converter<T, U>). What if your converter did Converter<JSON, MyType>? How to serialise MyType back into the JSON document?

Likewise, imagine someone thinking that they can do something along the lines of:

listAgg(TABLE1.CONVERTED_PK, ",")

Would that imply that the converter is applied to every element of the aggregation? Should we parse the aggregation result (but what if "," is a SQL expression, what if the data contains ,), or should we emulate the aggregation in Java?

What about:

sum(TABLE1.CONVERTED_PK)

Should we continue to delegate calculating the sum() to SQL, or should we perhaps fetch an array of values and calculate the sum of converted values ourselves? But then, how to implement e.g.

sum(TABLE1.CONVERTED_PK).eq(sum(TABLE2.CONVERTED_FK))

Because if the converters alter the value of the summands, then how can we ensure the behaviour will be consistent between projections / non-projections?

Perhaps, this shows that record mapping and conversion within MULTISET should have used a different set of methods that work only there in order to remove any potential confusion, although it really is the same feature there as it is for top-level projections. It actually is only the top-level projection (which is recursive and aware of its complete structure!). An alternative API design that I considered at the time was to put all the recursive mappings to the top level. I rejected this because it seemed so overwhelming for a complex query to separate the nested MULTISET tree from the nested mapping tree (try debugging a Result<Record17<Result<Record13<Result<Record7<...>>>>>>)

Here's a fun case that shows how your assumptions break in a different context:

The user there is using MULTISET, but it doesn't matter, really. It would be the same without MULTISET. When using different converters in different UNION subqueries, all that jOOQ / javac requires is that the row types be the same. The Converter implementations may be different. But jOOQ cannot possibly know which row belongs to which UNION subquery, so it cannot make the decision which one to apply (other than just always applying the one from the first subquery). Now, you might be tempted to claim that jOOQ should emit a warning, but how to even warn about this? The lambda instances may be different (equals() may return false), the only way to know that they're incompatible is to convert each value with both converters and find disagreeing results, and then throw an exception. But that appears to be prohibitive in terms of performance.

In your particular case, it seems much simpler to just use SQL for these kinds of expressions, instead:

jsonObject(
    key(inline("key")).value(
        jsonReplace(jsonArray(inline(1)), "$[*]", "one"))
    )
)

Obviously, I assume that your real-world converter does more sophisticated logic that is harder to express in SQL. So, the solution then would be to apply a converter to the top level that transforms the entire JSON document, not just the fragment you're interested in.

I'm keeping this issue open as a feature request for the long-term roadmap. JSON documents (as well as XML documents) are sufficiently similar to what we're doing with MULTISET, and we could possibly even re-use some logic, so there's a small potential for finding something viable here, at least if only the "projection-ish" parts of JSON are used (JSON_ARRAY and JSON_OBJECT, maybe the aggregate functions). But again, I don't want to raise expectations too high. There will always be expressions that are inaccessible to jOOQ.

Even with your example, this wouldn't work as you seem to expect:

select(
    jsonGetElementAsText(
        jsonArray(inline(1)).convertFrom(String.class, (it) -> "one"),
        "$[0]"
    ).eq("one")
)

It would return false, not true, because it simply doesn't make sense to expect this converter to be pushed into SQL before the evaluation of the jsonGetElementAsText() function.

alf commented 1 week ago

Thank you for the great explanations. My mental model keeps improving.

My silly test wasn’t very well thought through. The point was to show how easy it is to write something that slips between the cracks of a fractured understanding of how converters work.

What I’m really asking for in this issue is some way to help newer developers to understand why when something doesn’t work as expected. I’m lucky to have you to explain things to me since I’m not afraid to ask questions.

Perhaps it would be possible to create som static analysis/linting tool with rules picking up common gotchas such as these?

Maybe something like this already exists?

lukaseder commented 1 week ago

Perhaps it would be possible to create som static analysis/linting tool with rules picking up

It might be possible, but I would imagine it would be really hard to get right, and the false positives would be overwhelming. I listed quite a few potential false positives. You can think this through and you'll see how many "acceptable" cases of converters being "ignored" (or rather, inconsequential for the query) there are.

Personally, I'm almost never in favour of static code analysis for the above reasons.

common gotchas such as these?

I don't think this is a common gotcha. With the UNION case, it's the second time this was brought up on the issue tracker over all these years (at least as far as I can tell right now). In both cases, the user was a power user. I understand that you both reported on this, and that there's a silent majority not reporting things, but my gut feeling is that there are far bigger difficulties for beginners who won't immediately jump into this topic.

Rather than overengineering the response to this kind of problem, perhaps you could ask the same question again on Stack Overflow for improved visibility ("why doesn't this jOOQ converter apply to this nested expression"). I'll be happy to answer there as well. Or perhaps, you answer in your own words, explaining your now gained understanding, which might better help a new user who follows a similar thought process as you did.

I can definitely add some Javadoc on Binding and Converter to hint at these things only being applicable at the top level of a query, and when interacting with JDBC (it's already there on Binding and implied on Converter, but probably not clear enough).

But really, I don't want to spend too much documentation text on this topic because it would be very confusing for most beginners. With jOOQ, there's always this duality of what's being done in SQL and what's being done in Java. Take this, for example:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.A.gt(10))             // SQL predicate
   .fetch()
   .stream()
   .filter(r -> r.get(T.A) < 100) // Java predicate
   .toList();

Now, I'm aware of some folks being confused by the fact that the same "stream-looking" thing can be filtered twice and the syntactic difference doesn't seem too obvious to the untrained eye. But thinking about this, it's obvious that the second predicate can't be pushed into the SQL query, or the first can't be pulled out of it. Does this require documentation? Or a linting tool? I don't know. I can think of many more viable cases of doing such a thing than the 1-2 cases where this is just an accident.

Granted, with ad-hoc converters, things look a bit different because they're syntactically embedded in your query, which leads to nice types but increases complexity of possibly already complex queries. But ad-hoc converters are still converters, just like converters attached to your fields by the code generator. Again, this question hardly ever appeared on the issue tracker, because converters used by the code generator are far more popular than ad-hoc converters (except for MULTISET usage), and somehow there, the intuition was never for a converter to be somehow interacting with the SQL query itself, or nested parts of the result.

In a way, this "syntactically embedded" thing is at the core of understanding what jOOQ really is. For example, there are tons and tons of users for whom this simply doesn't click:

// This
ctx.select(T.A, T.B)
   .from(T)
   .where(T.A.gt(10));

// ... is more or less the same as this to the Java compiler
var a = T.A;
var b = T.B;
var t = T;
var i = 10;
var c = T.A.gt(i);

ctx.select(a, b)
   .from(t)
   .where(c);

Many users struggle very hard to see how jOOQ is for dynamic SQL, because it just looks like static SQL with added parentheses, and thus they write the most complicated "dynamic SQL" using jOOQ. It never occurs to them that for a dynamic predicate, they could just write this:

ctx.select(T.A, T.B)
   .from(T)
   .where(something ? T.A.gt(10) : T.A.lt(100));

Now, we're entering the same realm of mixing SQL expressions with Java expressions, just as with converters. Here, the something is evaluated first, in Java, before the SQL expression is embedded in the query. With converters, the conversion is evaluated last after the SQL query is rendered (and executed!). The confusion arises solely out of syntactic context, when using ad-hoc converters. If the jOOQ API completely separated converters and query building (e.g. as with the RecordMapper), then this wouldn't be an issue. But again, that would have made MULTISET only 10% as fun to use, so ad-hoc converters were introduced.

The same thing appears with any library that operates on an expression tree, btw. E.g. try implementing an annotation processor. Metaprogramming is a blessing and a curse :)

Anyway, I will take a step back and look again into how MULTISET (and ROW) are special cases that have lead to this confusion twice. Some additional Javadoc on Converter (and Binding) are definitely in order, and possibly some additional Javadoc on the ad-hoc converter API (Field.convertFrom(), etc.).

lukaseder commented 1 week ago

Also, here would be a good place:

lukaseder commented 1 week ago

Some additional Javadoc on Converter (and Binding) are definitely in order

This is the relevant task:

I'm still thinking about how to make it clear that MULTISET and ROW have a special case of allowing this nesting of Converter, and why it can't be reasonably expected to work this way anywhere else.