partiql / partiql-lang

The PartiQL language specification
https://partiql.org/partiql-lang
Other
8 stars 1 forks source link

Define `SELECT *` behavior #19

Open alancai98 opened 1 year ago

alancai98 commented 1 year ago

This issue is meant to track SELECT * behaviors that aren't currently captured in the spec.

The partiql-lang-kotlin reference implementation defines a transform of SELECT * to a SELECT list with path wildcard expressions and column references. The SelectStarVisitorTransform performs the rewrite. Example rewrites:

-- Single `FROM` source
SELECT * 
FROM foo AS f AT idx
-- transformed to below
SELECT "f".*, "idx" AS idx 
FROM foo AS f AT idx
-- JOIN
SELECT * 
FROM foo AS f, bar AS b
-- transformed to below
SELECT "f".*, "b".* 
FROM foo AS f, bar AS b
-- GROUP BY (performs GROUP BY alias transformation beforehand)
SELECT * 
FROM foo AS f 
GROUP BY 
    a AS a, 
    b AS b, 
    c AS c 
GROUP AS g
-- transformed to below
SELECT 
    "${'$'}__partiql__group_by_1_item_0" AS a, 
    "${'$'}__partiql__group_by_1_item_1" AS b, 
    "${'$'}__partiql__group_by_1_item_2" AS c, 
    "g" AS g 
FROM foo AS f 
GROUP BY 
    a AS a, 
    b AS b, 
    c AS c 
GROUP AS g
  1. From the above, seems like the AS alias is rewritten to path wildcard expressions while AT aliases are rewritten as just a column reference? Is this as simple as a syntactic rewrite? Can this be defined in terms of binding tuples operations similar to how other clauses are defined (e.g. FROM, SELECT VALUE, GROUP BY, etc.)?

  2. In the GROUP BY example does it make sense to support SELECT * when we already specify the GROUP BY keys to form the groups? SQL92 doesn't specify whether this is permissible. PostgreSQL permits SELECT * with GROUP BY so long as all the columns are included in the GROUP BY key list:

CREATE TABLE TestT (
    nn int,
    ss varchar(255)
);

INSERT INTO TestT VALUES(1,'foo') ;
INSERT INTO TestT VALUES(2,'bar') ;
INSERT INTO TestT VALUES(3,'baz') ;

SELECT * FROM TestT GROUP BY nn, ss;
-- Above is permissible and outputs
nn | ss
-- | --
1 | foo
3 | baz
2 | bar

While omitting one of the columns gives an error

SELECT * FROM TestT GROUP BY nn;

Outputs

Query Error: error: column "testt.ss" must appear in the GROUP BY clause or be used in an aggregate function

MySQL has the same behavior and outputs a similar error

Query Error: Error: ER_WRONG_FIELD_WITH_GROUP: Expression partiql/partiql-spec#2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.TestT.ss' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Interestingly, SQLite and MariaDB permit not specifying all the columns in the GROUP BY

SELECT * FROM TestT GROUP BY nn

Outputs:

nn | ss
-- | --
1 | foo
3 | baz
2 | bar
  1. How should SELECT * define the tuple merging with respect to the ordering (and similarly when schema is and isn't defined)? E.g. for
SELECT * FROM a, b -- rewritten to `SELECT a.*, b.* FROM a AS a, b as b`

with a ordered and b unordered, will the output binding tuple be ordered or unordered. If ordered, in what order will the tuple fields be ordered.

  1. Should LET variables be included in the SELECT *? Presently, the reference Kotlin implementation does not include the LET variables in SELECT *:
...
Using version: 0.9.3-SNAPSHOT-c4f061b0
PartiQL> SELECT * FROM <<{'a': 0}>> AS x LET 2 AS y;
==='
<<
  {
    'a': 0
  }
>>
---
OK!

Viewing LET as a way to create new variable bindings (similar to joining on a singleton), it would make sense for such bindings to be included in the output of SELECT *.


References: