Support `join` parenthesis.

igalklebanov commented 3 weeks ago

Hey 👋

Kysely currently produces joins without parenthesis, and engines process this from left to right. This can produce unwanted results in some cases. e.g. left join followed by an inner join, in which you want to keep t1's rows regardless of inner join between t2 and t3.

You can wrap joins in parenthesis to group joins and break out of the regular processing order:

select *
from "t1"
left join (
  "t2"
  inner join "t3" on "t2"."c" = "t3"."c"
) on "t1"."c" = "t2"."c"

SQLite might require aliasing the parenthesis before the on clause.

spearmootz commented 3 weeks ago

for the record if anyone has the issue. here is how i got around this


query
  .leftJoin(
    // @ts-expect-error this is needed because typescript complains but it does work
    ({ parens }) => parens(sql`("table1"
                  INNER JOIN "table2" ON "table2"."id" = "table1"."table2_id")`),
    "table1.id",
    "previousTableFoundInQuery.table1_id"
  )
  // @ts-expect-error this is required because the raw causes typescript not to infer the type
  .orderBy("table.id", `asc NULLS LAST`);

koskimas commented 3 weeks ago

Wouldn't this be equivalent to

select *
from "t1"
left join (
  select "t2".*
  from "t2"
  inner join "t3" on "t2"."c" = "t3"."c"
  as "t2"
) on "t1"."c" = "t2"."c"

?

The query optimizer will most likely turn that into exactly the same query plan as the original example. Only a really dumb optimizer would materialize the inner query in this case.

This query you can already build using kysely:

db.selectFrom("t1")
  .leftJoin(
    db.selectFrom("t2").innerJoin("t2", "t2.c", "t3.c").selectAll("t2").as("t2"),
    "t1.c",
    "t2.c",
  )
  .selectAll()

This is the very first time during my 18 year career that I see the syntax in the original message. Do we really want to rewrite our join builder and logic to support this case?

koskimas commented 3 weeks ago

I wrote this little test. The only difference in the syntaxes is that in the subquery case, t3 columns are not selected in the outer query. You need to explicitly select them from the inner one.

DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t3;

CREATE TABLE t1 (id SERIAL PRIMARY KEY, c int);
CREATE TABLE t2 (id SERIAL PRIMARY KEY, c int);
CREATE TABLE t3 (id SERIAL PRIMARY KEY, c int);

CREATE INDEX t1_c_idx ON t1(c);
CREATE INDEX t2_c_idx ON t2(c);
CREATE INDEX t3_c_idx ON t3(c);

INSERT INTO t1 (c)
SELECT floor(random() * 100000000) 
FROM generate_series(0, 1000000);

INSERT INTO t2 (c)
SELECT floor(random() * 100000000) 
FROM generate_series(0, 1000000);

INSERT INTO t3 (c)
SELECT floor(random() * 100000000) 
FROM generate_series(0, 1000000);

VACUUM ANALYZE;

EXPLAIN ANALYZE SELECT *
FROM t1
LEFT JOIN (t2 INNER JOIN t3 ON t2.c = t3.c) ON t1.c = t2.c
;

EXPLAIN ANALYZE SELECT *
FROM t1
LEFT JOIN (
  SELECT t2.* FROM t2 INNER JOIN t3 ON t2.c = t3.c
) AS t2 ON t1.c = t2.c
;

At least with this dummy dataset, the query plans are 100% equivalent.

igalklebanov commented 3 weeks ago

Agree with @koskimas, not much added value in adding this - it'll be complex and the DX is not guaranteed to be great.

Closing this for now.

kysely-org / kysely

Support `join` parenthesis. #1198