jOOQ / jOOQ

jOOQ is the best way to write SQL in Java
https://www.jooq.org
Other
6.15k stars 1.21k forks source link

Add DSL.noTable() for optional, dynamic joins #9635

Closed ayush-finix closed 1 year ago

ayush-finix commented 4 years ago

Use case:

We have a query with three variants. You can imagine the pseudocode/query to be like

select * from table a
if (b) join table b on ...
if (c) join table c on ...
where
if (b) b.<fieldb> = 'val_b'
AND if(c) c.<fieldc> = 'val_c'

where b and c are some parameters passed into the method.

Currently, we have to split the query building into multiple lines of code because there's no way to bind the statement join(TABLE).on(...) to something and there's no equivalent of DSL.noCondition for the join part.

Possible solution you'd like to see:

I'd like to be able to write something like

final Join bJoin;
final Condition bCondition;
if (b) {
 bJoin = DSL.join(TABLE_B).on(...)
 bCondition = TABLE_B.field.eq(val_b)
}
else {
 bJoin = DSL.noJoin()
 bCond = DSL.noCondition()
}
... similar for c ...

DSL.select(TABLE_A)
.join(bJoin)
.join(cJoin)
.where(bCond)
.and(cCond)

Possible workarounds:

Right now we have something like

final Function<SelectOnConditionStep<Record1<>>, SelectOnConditionStep<Record1<>>> bJoin
final Condition bCond;
if (b) {
  bJoin = q -> q.join(TABLE_B).on(...)
  bCond = TABLE_B.field.eq('val_b')
}
else {
 bJoin = Function.identity()
 bCond = DSL.noCondition()
}

q1 = DSL.select(TABLE_A);
qWithJoins = bJoin.andThen(cJoin).apply(q1)
finishedQuery = qWithJoins.where(bCond).and(cCond)
finishedQuery.fetch()

Please describe the possible workarounds you've implemented to work around the lacking functionality.

Versions:

None of this should really matter for this feature I think.

lukaseder commented 4 years ago

Thank you very much for your feature request. How about this pattern?

Optional.of(TABLE_A)
    .map(t -> b ? t.join(TABLE_B).on("b condition") : t)
    .map(t -> c ? t.join(TABLE_C).on("c condition") : t)
    .get()

Notice that when you're using inner join, there's no difference between putting a predicate in ON or in WHERE.

Notice, one important step here is to realise that SelectJoinStep.join() is just syntax sugar / convenience for using Table.join() methods (producing fewer parentheses), which are probably preferrable if you're doing dynamic SQL like this. I don't think we really need new API for your particular use case.

I'll leave this issue open as we might be able to collect more such use-cases. Perhaps there is a low hanging fruit for improving this experience.

ayush-finix commented 4 years ago

Wrapping a fixed value just to call map on it seems a bit of an abuse =P.

Just for a little more context, the reason I brought this up was to essentially be able to see/build the entire query in one go and separate out the dynamic choice steps elsewhere, like how using noCondition allows you to do something like .where(bCond) regardless of code path, so you don't end up reading DSL + java code mixed together [if that makes sense, sorry if I'm not explaining it well].

I'll definitely try using the table.join [didn't know this existed, thanks] directly and see if I can come up with something nice.

lukaseder commented 4 years ago

Wrapping a fixed value just to call map on it seems a bit of an abuse =P.

I challenge you to show why that's the case. I find that quite elegant.

Just for a little more context, the reason I brought this up was to essentially be able to see/build the entire query in one go and separate out the dynamic choice steps elsewhere

Sure, you can do this with a method that returns a table conditionally.

The fact that some logic might need to be distributed between join() clauses and the where() clause will make a truly dynamic approach a bit more tricky anyway. But if you can get away with putting your predicates all in the on() clause, then it's really very simple using Table.join() only.

ayush-finix commented 4 years ago

Wrapping a fixed value just to call map on it seems a bit of an abuse =P.

I challenge you to show why that's the case. I find that quite elegant.

It's equivalent to just binding the anonymous functions you want to a variable and chaining them since you'll never have an empty optional.

final Function b = t -> b ? t.join(TABLE_B).on("b condition") : t;
final Function c = t -> c ? t.join(TABLE_C).on("c condition") : t;
final result = b.andThen(c).apply(TABLE_A)

Side note: we're using the awesome cyclops-react functional library, and their Option class (in addition to having way more functionality than the java 8 optional), doesn't actually have a get method to unpack the value (it's somewhat opinionated on not throwing exceptions), so you would have to handle the impossible case in a really awkward way.

and if you really, really want to not leak any variables to a different scope, then

joined = (t - > {
res_b = b ? t.join(TABLE_B).on("b condition") : t);
res_c = c ? res_b.join(TABLE_C).on("c condition") : res_b;
}).apply(TABLE_A);

Just for a little more context, the reason I brought this up was to essentially be able to see/build the entire query in one go and separate out the dynamic choice steps elsewhere

Sure, you can do this with a method that returns a table conditionally.

The fact that some logic might need to be distributed between join() clauses and the where() clause will make a truly dynamic approach a bit more tricky anyway. But if you can get away with putting your predicates all in the on() clause, then it's really very simple using Table.join() only.

This is the approach I'm looking into.

As an aside, I do want to thank you for

  1. building an amazing library (we're removing hibernate and going pure jooq for our main code base) and 2. Being on top of everything for support. There's already a guide or a stack overflow answer for almost every question our team has had.
lukaseder commented 4 years ago

It's equivalent to just binding the anonymous functions you want to a variable and chaining them since you'll never have an empty optional.

You omitted the generics, which would make this approach much more verbose (in Java). Also, the order of tables is confusing now, IMO. But indeed, technically, they're equivalent, I'd say.

so you would have to handle the impossible case in a really awkward way.

Well, you could make an exception and use Optional here (or is that verboten?). Or you use getOrElse(DSL.dual()). Or you use a Stream and reduce the joins.

Anyway, it's good feedback. I do wonder if there's something we could improve in the jOOQ API to make dynamic joins more "FP idiomatic". Especially when left joins are used as anti joins (A LEFT JOIN B ON .. WHERE B.ID IS NULL), my suggestions won't work, as two conditional things need to be done separately: Adding the join and adding the WHERE clause. A workaround would be to use the synthetic leftAntiJoin() operation, but that produces NOT EXISTS, when perhaps an actual LEFT JOIN was desireable for RDBMS specific performance reasons.

We'll think about this.

As an aside, I do want to thank you for

Thanks for your nice words! Greatly appreciated :)

lukaseder commented 4 years ago

Related blog post: https://blog.jooq.org/2020/03/06/create-empty-optional-sql-clauses-with-jooq/

lukaseder commented 4 years ago

I still don't see any compelling API in this area that is better than the status quo, perhaps with the help of Optional or Stream... Keeping the issue open though, as a solution is definitely desireable.

liias commented 4 years ago

Hey there, Lukas.

As a user of the jooq api and not looking into any of the implementation, I would like to consume it like this:

selectFrom(WALLPAPER)
.join(colorName == null? noJoin() : COLOR).on(WALLPAPER.COLOR_ID.eq(COLOR.ID))
.where(colorName == null? noCondition() : COLOR.NAME.eq(colorName))

This means noJoin() effectively also ignores the .on() filter. I guess this doesn't follow existing implementation, in which case dump this idea :)

Also, thanks for the library! Managed to start using it finally after creating a maven plugin for postgres support (jooq-codegen-postgres-maven-plugin, but that's offtopic).

lukaseder commented 4 years ago

Thanks for the suggestion, @liias. Interesting idea, so in this case, we'd maybe call it noTable(), and indeed, the ensuing ON or USING clauses would then just be ignored (or users could pass an additional noCondition() in a conditional expression).

It's ugly, but it could work. It would also work in a table list:

select(...)
.from(A, noTable(), B, noTable(), C);

The semantics of a noTable() needs to adapt to an identity value of each respective join operation. The dee table (from the dum dee table example) in addition to ignoring user join predicates seems to fulfil this requirement at first sight. It is a table with no columns and one row. If we join that table ON TRUE, then the join operation has no effect, (which is the purpose of the dum/dee thought experiment) e.g.

t CROSS JOIN dee -- t
t INNER JOIN dee ON TRUE -- t
t LEFT JOIN dee ON TRUE -- t
t RIGHT JOIN dee ON TRUE -- t
t FULL JOIN dee ON TRUE -- t

Curiously, PostgreSQL also makes NATURAL JOIN work, since the two tables have no columns in common, the implied ON predicate is TRUE as well, effectively turning this into a CROSS JOIN.

t NATURAL INNER JOIN dee ON TRUE -- t
t NATURAL LEFT JOIN dee ON TRUE -- t
t NATURAL RIGHT JOIN dee ON TRUE -- t
t NATURAL FULL JOIN dee ON TRUE -- t

An empty USING () clause is not allowed in SQL, not in PostgreSQL either, although we should probably allow it: #10070. However, in the presence of a noTable(), we'd simply ignore the columns passed to USING (), so we can make this work as well.

What about the other JOIN types, we also have:

t CROSS APPLY dee -- t
t OUTER APPLY dee -- t
t CROSS JOIN LATERAL (SELECT * FROM dee) u -- t
t LEFT JOIN LATERAL (SELECT * FROM dee) u -- t
t LEFT SEMI JOIN dee -- t
t LEFT ANTI JOIN dum -- t (notice, here we need to implement dum semantics, not dee semantics)

When users do manual semi or anti joins using [ NOT ] IN or [ NOT ] EXISTS, they would probably already use noCondition() in a conditional expression, e.g.

select(...)
.from(...)
.where(
    something
  ? exists(...)
  : noCondition()
);

Conclusion:

Yep. We can make a noTable() happen. I'd also like to add some sort of explicit trueTable() (dee, corresponding to trueCondition()) and falseTable() (dum, corresponding to falseCondition()), but I don't think many database dialects support column-less tables. ~PostgreSQL supports them in CREATE TABLE but almost nowhere else.~ (that's wrong. SELECT is a valid statement in PostgreSQL)

There will be edge cases, but it can work.

liias commented 4 years ago

Thorough response, some new terminology and join types to learn for me. I do think, however, it probably makes sense to wait for more feedback, as currently it's just us 2 users in this ticket.

Tell me if I got you correctly. Using join(noTable()) would still generate SQL then? Will they be real tables (i.e the dee/dum you mention) or something like CTE/temp tables?

lukaseder commented 4 years ago

I do think, however, it probably makes sense to wait for more feedback, as currently it's just us 2 users in this ticket.

Not sure, there's a pretty clear vision now, so I think this might pass if it's doable after looking into an implementation draft. This is not the only issue on this topic, there have been others, and also user group and stack overflow questions, as well as private emails. This is a recurring topic.

Tell me if I got you correctly. Using join(noTable()) would still generate SQL then? Will they be real tables (i.e the dee/dum you mention) or something like CTE/temp tables?

No, if there will be trueTable() and falseTable(), those would generate SQL. A noTable() (just like the noCondition()) will try to avoid generating any SQL, if it is syntactically valid to omit it.

liias commented 4 years ago

Sounds good!

lukaseder commented 4 years ago

I just needed something like this myself, to add a "conditional union", i.e. union(condition ? select(A, B).from(T) : noSelect())...

A noSelect() method would be quite interesting, as it would need to be able to infer the <R extends Record> type from where it is used (and Java often isn't too good with this). But I just tried this, and it worked as well!

System.out.println(
    ctx.selectOne().union(null).union(null).union(selectOne())
);

Although, it doesn't work very well, e.g.

System.out.println(
    ctx.selectOne().unionAll(null).union(null).union(selectOne())
);

...

Produces:

(
  (
    select 1 one
  )
)
union (
  select 1 one
)

An undocumented feature! Which means some people are probably already relying on it. It's not unreasonable to expect the jOOQ API to accept null values for QueryPart instances, and have this noTable(), noCondition(), etc. semantics in cases where null cannot mean anything else. In case of Condition or Field, the meaning is a bit ambiguous, because a user could really mean to pass a SQL NULL bind value, especially for Field, a bit less obviously for Condition.

This doesn't work yet for tables:

System.out.println(
    ctx.select().from(table("t").crossJoin((Table<?>) null))
);

Yields:

Exception in thread "main" java.lang.NullPointerException
    at org.jooq.impl.JoinTable.<init>(JoinTable.java:188)
    at org.jooq.impl.JoinTable.<init>(JoinTable.java:180)
    at org.jooq.impl.AbstractTable.join(AbstractTable.java:1183)
    at org.jooq.impl.AbstractTable.crossJoin(AbstractTable.java:1444)
lukaseder commented 4 years ago

Of course, the implicit join feature could be a viable solution in some cases, when the join expression follows a to-one relationship:

ctx.select(TABLE_A.asterisk(), b ? TABLE_A.tableB().asterisk() : inline(""))
   .from(TABLE_A)
   .where(b ? bCondition : noCondition())
   .fetch();

The mere presence of TABLE_A.tableB() in the query will produce the optional join, so it can be safely ignored. See: https://www.jooq.org/doc/latest/manual/sql-building/sql-statements/select-statement/implicit-join/

sebastianhaberey commented 4 years ago

👍 for noXxx() - Functionality. I've been breaking my head over an elegant solution to have an optional dynamic join. Good to see I wasn't the only one.

lukaseder commented 4 years ago

Just in case: if this is about to-one joins, optional implicit joins can be an elegant solution https://www.jooq.org/doc/latest/manual/sql-building/sql-statements/select-statement/implicit-join/

sebastianhaberey commented 4 years ago

I believe the implicit join isn't applicable to our case (we are joining four tables but selecting from only one of them), but that is actually an awesome feature! TIL

lukaseder commented 4 years ago

I believe the implicit join isn't applicable to our case (we are joining four tables but selecting from only one of them)

Can you show an example? In many cases, it's not that obvious to know whether it's applicable. Some queries can be rewritten for implicit joins to work

sebastianhaberey commented 4 years ago

Sure! I had to modify the original code a bit (not sure I'm allowed to post it) but this is basically it. We have this method that finds all USERs who are COURSEPROVIDERs and match some additional condition that can be passed into the method, e.g. PROVIDER.PREMIUM.isTrue():

return dsl.select(USER.asterisk())
        .from(PROVIDER)
        .join(COURSEPROVIDER)
        .on(COURSEPROVIDER.PROVIDER_ID.eq(PROVIDER.ID))
        .join(USER)
        .on(USER.ID.eq(COURSEPROVIDER.USER_ID))
        .where(PROVIDER.DEACTIVATED.isFalse().and(condition))
        .fetchInto(User.class);

And then there is another method with almost the same query, extended by a join on and a group by clause:

return dsl.select(USER.asterisk())
        .from(PROVIDER)
        .join(COURSEPROVIDER)
        .on(COURSEPROVIDER.PROVIDER_ID.eq(PROVIDER.ID))
        .join(USER)
        .on(USER.ID.eq(COURSEPROVIDER.USER_ID))
        .join(CONCEPT)
        .on(CONCEPT.PROVIDER_ID.eq(PROVIDER.ID))
        .where(PROVIDER.DEACTIVATED.isFalse().and(condition))
        .groupBy(USER.ID)
        .fetchInto(User.class);

The intention behind the second statement is "find all USERs who are COURSEPROVIDERs and have at least one CONCEPT".

I wanted to put this into one single method that's clear for my fellow colleagues to read and reusable. I was trying to give the client a way to pass the last, optional join, and the group by into the method - or another one, if they wish to do so, or none at all.

lukaseder commented 4 years ago

Hmm, I don't think these queries are correct / optimal. Both seem to produce cartesian products. The second one removes them again with GROUP BY, but that might be far from optimal.

How about this instead?

return dsl.select(USER.asterisk()
          .from(USER)
          .where(USER.ID.in( // Use a semi-join to fix the cartesian products
            select(COURSEPROVIDER.USER_ID)
            .from(COURSEPROVIDER)
            .join(PROVIDER).on(COURSEPROVIDER.PROVIDER_ID.eq(PROVIDER.ID))
            .where(PROVIDER.DEACTIVATED.isFalse())
            .and(condition) // Put the USER part in this condition in the outer query
            .and(dynamicConceptCheck // Dynamic semi-join here
              ? PROVIDER.ID.in(select(CONCEPT.PROVIDER_ID).from(CONCEPT))
              : noCondition())
          ))
          .fetchInto(User.class);

Just in case, implicit joins can help, but are optional (as always):

return dsl.select(USER.asterisk()
          .from(USER)
          .where(USER.ID.in(
            select(COURSEPROVIDER.USER_ID)
            .from(COURSEPROVIDER)
            .where(COURSEPROVIDER.provider().DEACTIVATED.isFalse()) // Implicit join here
            .and(condition)
            .and(dynamicConceptCheck
              ? COURSEPROVIDER.PROVIDER_ID.in(select(CONCEPT.PROVIDER_ID).from(CONCEPT))
              : noCondition())
          ))
          .fetchInto(User.class);
sebastianhaberey commented 4 years ago

Awesome, thank you for having a look at this! So I take it I could also use three explicit semi-joins like so:

return dsl.select(USER.asterisk())
    .from(USER)
    .where(USER.ID.in(
        select(COURSEPROVIDER.USER_ID)
        .from(COURSEPROVIDER)
        .where(COURSEPROVIDER.PROVIDER_ID.in(
            select(PROVIDER.ID)
            .from(PROVIDER)
            .where(providerCondition)
            .and(dynamicConceptCheck
                    ? PROVIDER.ID.in(select(KONZEPT.PROVIDER_ID).from(KONZEPT))
                    : DSL.noCondition()))
        )))
    .and(userCondition)
    .fetchInto(Benutzerprofil.class);

?

lukaseder commented 4 years ago

You could. Keep an eye on execution plans, just in case your RDBMS doesn't think the queries are equivalent (they should be, but not all optimisers are equal)

lukaseder commented 2 years ago

Looking into this now, in the context of a variety of such constructors that may be desirable for better dynamic SQL (#12969).

This one here is tricky. The NoTable type will have to be able to join itself or be joined, and the resulting expression should no-op. E.g.:

assertEquals(T, noTable().join(T).on(condition));
assertEquals(T, T.join(noTable()).on(condition));

The tricky part is that the join expression may have optional on() (or using(), etc.) methods, so this must be possible, too:

assertEquals(T, noTable().crossJoin(T);
assertEquals(T, T.crossJoin(noTable());

And there are convenience and() and or()methods after on(), such as:

assertEquals(T, noTable().join(T).on(condition)).and(otherCondition);
assertEquals(T, T.join(noTable()).on(condition)).and(otherCondition);

Hence, we can't get rid of the JoinTable implementation in the expression tree, as users may append more clauses to it. This is different for NoCondition, which we never keep around in the expression tree, so we don't have to constantly check if a condition is instanceof NoCondition.

It's not impossible to implement this correctly, but again, it's much more tricky than NoCondition

lukaseder commented 1 year ago

I've had a look at this now for 3.19. It is indeed much more complicated than other noXYZ() API elements, but not impossible.

A refactor of the JoinTable hierarchy was necessary, and an auxiliary NoTableJoin type was needed: i.e. when joining noTable().join(T), we can't just return T. The resulting type must implement the JOIN DSL so users can call on(), using(), partitionBy(), etc. on the type, but once rendered, it must behave like the table itself. Also, if more joins are added to the join tree, NoTableJoin must remove itself again from the query.