Support `JOIN ... USING ( col1, col2, ... )`

dwickern commented 1 year ago

Many databases support this shorthand for joins when the column names are the same in both tables.

From Postgres docs:

The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
(2 rows)

=> SELECT * FROM t1 INNER JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

PostgreSQL: https://www.postgresql.org/docs/current/queries-table-expressions.html MySQL https://dev.mysql.com/doc/refman/8.0/en/join.html Sqlite: https://www.sqlite.org/syntax/join-constraint.html Oracle: https://docs.oracle.com/javadb/10.6.2.1/ref/rrefsqljusing.html SQL Server: not supported

Usage could be something like

db.selectFrom('person as a')
  .innerJoinUsing('person as b', ['first_name', 'last_name'])
  .select(['a.id', 'b.id'])

koskimas commented 1 year ago

I don't know if this is worth the complication in the code. Especially if we did this in the most type-safe way possible, we should only accept columns both tables have. That'd be tricky.

But if we do add this, we could add it to the join builder

db.selectFrom('person as a')
  .innerJoin('person as b', (join) => join.using(['first_name', 'last_name']))
  .select(['a.id', 'b.id'])

igalklebanov commented 1 year ago

@koskimas Apparently, they behave differently:

A USING clause can be rewritten as an ON clause that compares corresponding columns. However, although USING and ON are similar, they are not quite the same. Consider the following two queries:
a LEFT JOIN b USING (c1, c2, c3)
a LEFT JOIN b ON a.c1 = b.c1 AND a.c2 = b.c2 AND a.c3 = b.c3
With respect to determining which rows satisfy the join condition, both joins are semantically identical.

With respect to determining which columns to display for SELECT * expansion, the two joins are not semantically identical. The USING join selects the coalesced value of corresponding columns, whereas the ON join selects all columns from all tables. For the USING join, SELECT * selects these values:
COALESCE(a.c1, b.c1), COALESCE(a.c2, b.c2), COALESCE(a.c3, b.c3)
For the ON join, SELECT * selects these values:
a.c1, a.c2, a.c3, b.c1, b.c2, b.c3
With an inner join, COALESCE(a.c1, b.c1) is the same as either a.c1 or b.c1 because both columns have the same value. With an outer join (such as LEFT JOIN), one of the two columns can be NULL. That column is omitted from the result.

https://dev.mysql.com/doc/refman/8.0/en/join.html

Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.

https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-JOIN

So it's not just syntactic sugar. But is it an important enough difference worth adding 3-4 methods for everywhere and the type complexity of query context (e.g. columns a.c1 & b.c1 no longer exist, but c1 does) and result type? 🤷

dwickern commented 1 year ago

Interesting. There is a difference when using an unqualified select * from joined tables with duplicate column names.

select * from a join b on a.c1 = b.c1 is equivalent to select a.*, b.*. Column c1 appears twice in the result set. In my tool it's prefixed like a.c1 and b.c1.

select * from a join b using (c1) only has a single copy of c1.

Kysely seems to overwrite columns with the same name, so you get the same results in either case. There is only a single c1. So there's no need to change Kysely's types.

igalklebanov commented 1 year ago

I'm not concerned with select *, but with query context - can b.c1 be referenced in clauses after from (e.g. where) when using join using? Seems to be, at least in https://sqliteonline.com/.

Kysely seems to overwrite columns with the same name, so you get the same results in either case. There is only a single c1. So there's no need to change Kysely's types.

So if query context is the same, and there's no difference in result (in Kysely), this is all just syntactic sugar. As such, the importance of having this in the API is quite low.

dwickern commented 1 year ago

Yes, the query context is the same

EitanKatsightfull commented 10 months ago

Is there any update on the implementation of this ? @igalklebanov @koskimas

alexpmay commented 9 months ago

We use kysely extensively and love it. I just wanted to upvote this request as it make joins more concise. We use the using syntax extensively in raw sql, it would be great to have it in kysely too.

dwickern commented 9 months ago

I have an open PR for this in #862

kysely-org / kysely

Support `JOIN ... USING ( col1, col2, ... )` #730