substrait-io / substrait-java

Apache License 2.0
75 stars 72 forks source link

Calcite Aggregates Built w/ Out Of Order Grouping Keys Have Incorrect Layout #155

Closed vbarua closed 2 months ago

vbarua commented 1 year ago

Code Example

public class SubstraitRelNodeConverterTest extends PlanTestBase {
      @Test
      void groupingKeyOrdering() {
          Plan.Root root = b.root(
          b.aggregate(input -> b.grouping(input, 2, 0), input -> List.of(),
                  b.namedScan(List.of("foo"), List.of("a", "b", "c"), List.of(R.I64, R.I64, R.STRING))));
          var relNode = converter.convert(root.getInput());
          // Fails because the output record has shape (a, c) because Calcite "forgets" the grouping key order
          assertRowMatch(relNode.getRowType(), R.STRING, R.I64);
      }
}

Explanation

When build an aggregate using the RelBuilder, Calcite converts the grouping keys into an immutable bit set and then uses the order of the keys in the bit set to generate the output record shape. That means that if you hand it a grouping key like in the example of (2, 0), Calcite will output the row as (0, 2) (which the rest of the Substrait plan is not necessarily expecting).

This behaviour in Calcite appears to be somewhat expected, as the Calcite Sql Parser appears to apply projections before aggregations to account for this behaviour.

vbarua commented 2 months ago

Fixed by https://github.com/substrait-io/substrait-java/pull/214