substrait-io / substrait-java

Apache License 2.0
75 stars 71 forks source link

[ISTHMUS] [TPCDS] Support Grouping #71

Open hbutani opened 2 years ago

hbutani commented 2 years ago

For example:

Queries in tpcds having this issue: [27, 36, 86]

How to reproduce: Add a test to TpcdsQueryNoValidation like this:

@ParameterizedTest
  @ValueSource(ints = {27})
  public void tpcdsQ27(int query) throws Exception {
    testQuery(query);
  }

On fixing, please move the fixed queries from the tpcdsFailure list to the tpcdsSuccess list

jacques-n commented 2 years ago

In https://github.com/substrait-io/substrait/pull/240, @guykhazma suggested that this should be considered a function. I'm not convinced that is the case. @hbutani , thoughts?

hbutani commented 2 years ago

My two cents: nice to have separate expression types for these like Spark; but I cannot find a reason for them not to be functions.

julianhyde commented 2 years ago

The standard defines GROUPING and GROUPING_ID, which are pretty much synonyms. Oracle also defines GROUP_ID.

To a first approximation, you can treat them as aggregate functions (in that you can only use them in a GROUP BY query); their value is constant for the group (i.e. what will become a record emitted from the GROUP BY) but independent of the actual rows that go into the group.

I think it's overkill to treat them as a special expression type. Aggregate functions will suffice.

julianhyde commented 2 years ago

ROLLUP (along with CUBE, and GROUP BY DISTINCT) is not really a function but a macro that gives you a shorthand for writing grouping sets. I recommend just supporting GROUPING SETS.

mbasmanova commented 2 years ago

ROLLUP (along with CUBE, and GROUP BY DISTINCT) is not really a function but a macro that gives you a shorthand for writing grouping sets. I recommend just supporting GROUPING SETS.

+1. There is a nice explanation of how ROLLUP and CUBE are just shortcuts for GROUPING SETS in the Presto documentation: https://prestodb.io/docs/current/sql/select.html#group-by-clause