Closed vbarua closed 1 year ago
The Substrait functions MIN, MAX, AVG and SUM ALWAYS return a nullable version of the input type (per the spec). In most cases, the default Calcite definitions of these functions behave the same way, EXCEPT in some cases where they infer a NOT NULL
return type.
This results in errors like:
type mismatch:
aggCall type:
TINYINT
inferred type:
TINYINT NOT NULL
This PR addresses these issues by creating variants of Calcite function definitions with less clever return type inference.
This also fixes a similar issue with SUM0 which always returns a NOT NULL I64 per the spec, but the Calcite definition returns a NOT NULL version of the input numeric type.
This results in errors like:
type mismatch:
aggCall type:
BIGINT NOT NULL
inferred type:
TINYINT NOT NULL
The changes for the above comment, which fixed Substrait to Calcite conversions, broke Calcite to Substrait conversions.
Effectively, in a query like
SELECT MAX(column) FROM foo
Calcite would parse the MAX function into its standard SqlMinMaxAggFunction
representation instead of the newer SubstraitSqlMinMaxAggFunction
. This caused the Calcite to Substrait conversion to fail because AGGREGATE_SIGS
contained a mapping from AggregateFunctions.MAX
to "max"
, and not one from SqlStdOperatorTable.MAX
.
The solution I chose for this was to introduce a SubstraitOperatorTable
that preferentially selects Substrait function variants when they are available.
lgtm
fix(calcite): address type inference mismatch in arithmetic aggregate functions
feat: util methods on TypeCreator for setting type nullability feat: builder utils for arithmetic aggregate functions feat(calcite): support sum0 sql function test: static fields in Type interface caused NPEs in tests
BREAKING CHANGE: Isthmus no longer uses Calcite built-in MAX, MIN, SUM, SUM0 and AVG functions BREAKING CHANGE: removed REQUIRED and NULLABLE fields from Type interface