trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.28k stars 2.96k forks source link

Grouping sets fails on struct columns with dot notation #22604

Open cploonker opened 3 months ago

cploonker commented 3 months ago

Following query works

SELECT pr.name as name, MAX(pr.age) as age, MAX(city) as city
FROM (
  SELECT CAST(pr AS ROW(name VARCHAR, age int)) as pr, city
  FROM(
    VALUES 
      (ROW('John Doe', 30), 'New York'),
      (ROW('Jane Doe', 25), 'Los Angeles')
  ) AS t(pr, city)
)
GROUP BY pr.name

However following query fails

SELECT pr.name as name, MAX(pr.age) as age, MAX(city) as city
FROM (
  SELECT CAST(pr AS ROW(name VARCHAR, age int)) as pr, city
  FROM(
    VALUES 
      (ROW('John Doe', 30), 'New York'),
      (ROW('Jane Doe', 25), 'Los Angeles')
  ) AS t(pr, city)
)
GROUP BY GROUPING SETS (pr.name)

Error

trino error: TrinoUserError(type=USER_ERROR, name=INVALID_COLUMN_REFERENCE, message="line 11:26: GROUP BY expression must be a column reference: pr.name

Shouldn't both work?

cploonker commented 2 months ago

Presto has the same issue: https://github.com/prestodb/presto/issues/23151

hashhar commented 2 months ago

cc: @martint

martint commented 1 month ago

Per the SQL specification:

Each <grouping column reference> shall unambiguously reference a column of the table resulting from the <from clause>

The way the GROUPING SETS, ROLLUP and CUBE clauses are implemented follow the specification to the letter. Grouping by an expression (in the case of a vanilla GROUP BY) is an extension to the specification, which predates the implementation of those more advance grouping set qualifiers. We could consider adding the same extensions to those constructs, but there are some non-trivial implications for how grouping() and other operations such as non-deterministic functions would work, and that would need to be sorted out.

cploonker commented 1 month ago

Is a struct field not considered a column?

martint commented 1 month ago

No, that’s considered a “complex expression” (a field dereference expression)

cploonker commented 1 month ago

Got it. Thanks for the explaination. So this is a case of SQL standard coming in the way of ease of use.