hyrise / sql-parser

SQL Parser for C++. Building C++ object structure from SQL statements.
MIT License
733 stars 243 forks source link

Correctly parse named columns for joins #240

Closed dey4ss closed 3 months ago

dey4ss commented 4 months ago

This PR allows a list of identifiers (a.k.a column names) when joins name columns with USING (...). This behavior is in line with the SQL standard and PostgreSQL. Furthermore, we add a flag if join conditions come from specifying them directly (foo JOIN bar ON (...)) or from named columns (foo JOIN bar USING (...)) so the database can adjust the emitted columns addording to the SQL standard (emit both columns for each predicate in ON, emit a single column for each column in USING).

For details, see #239.

Fixes #239.

Bouncner commented 4 months ago

What is your take on the “no column references” limitation?

dey4ss commented 4 months ago

First of all, it is in line withe the standard and pgSQL, which we try to mostly comply to. Second, it also makes sense IMO. foo JOIN bar USING (...) is comparable to foo NATURAL JOIN bar (the latter is equivalent to the first when the USING list contains all columns with the same name in foo and barUSING is a NATURAL join where you explicitly specify or limit the used columns). All columns in the list have to be present in both foo and bar, so a column reference explicitly referencing a column on only one side is not really useful.

dey4ss commented 4 months ago

Instead of parsing the list of columns as conjunctions of equality predicates and adding an extra boolean flag to JoinDefinition, we could also simply have the vector of column names as member, where either this vector or the join condition would be set:

struct JoinDefinition {
  JoinDefinition();
  virtual ~JoinDefinition();
  TableRef* left;
  TableRef* right;
  Expr* condition;                   // set for `foo JOIN bar ON foo.x = bar.x`
  std::vector<char*>* namedColumns;  // set for `foo JOIN bar USING (x)`
                                     // both unset for `foo NATURAL JOIN bar`
  JoinType type;
};

This would be a bit more explicit and leaves resolving the column names to a predicate to the DBMS. However, it differs from the way we handled USING before, and I am not sure what we want to do here in terms of backwards compatibility/divergence from forks.

Bouncner commented 3 months ago

Feel free to zünd.