Open jacques-n opened 1 month ago
The argument descriptions feel similarly painful...
There are some arithmetic functions where the options do not apply to all implementations (fp addition does not specify overflow behavior for example).
I don't know if there are any implementations where the arguments differ based on the implementation.
What if argument / option descriptions were at the function level but each implementation still listed which options / arguments apply?
name: "round"
description: >
Rounding the value `x` to `s` decimal places.
args:
- name: "x"
description: >
Numerical expression to be rounded.
- name: "s"
description: >
Number of decimal places to be rounded to.
When `s` is a positive number, nothing will happen
since `x` is an integer value.
When `s` is a negative number, the rounding is
performed to the nearest multiple of `10^(-s)`.
options:
rounding:
description: >
When a boundary is computed to lie somewhere between two values,
and this value cannot be exactly represented, this specifies how
to round it.
- TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
to the even option.
- TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
halfway, tie away from zero.
- TRUNCATE: always round toward zero.
- CEILING: always round toward positive infinity.
- FLOOR: always round toward negative infinity.
- AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
- TIE_DOWN: round ties with FLOOR rule
- TIE_UP: round ties with CEILING rule
- TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
- TIE_TO_ODD: round to nearest value; if exactly halfway, tie
to the odd option.
impls:
- args:
- value: i8
name: "x"
- value: i32
name: "s"
options:
rounding:
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: i8?
- args:
- value: i16
name: "x"
- value: i32
name: "s"
options:
rounding:
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: i16?
- args:
- value: i32
name: "x"
- value: i32
name: "s"
options:
rounding:
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: i32?
- args:
- value: i64
name: "x"
- value: i32
name: "s"
options:
rounding:
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: i64?
- args:
- value: fp32
name: "x"
- value: i32
name: "s"
options:
rounding:
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: fp32?
- args:
- value: fp64
name: "x"
- value: i32
name: "s"
options:
rounding:
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: fp64?
We could also leave the option's "values" at the function level (so each function just has an array of option names)
The only programmatic tool I know of that would be impacted is the BFT and I think it already reports options / arguments at the function level so there would be some benefit to align the two models.
I'm a fan of argument/option descriptions at the function level. We have a fun of other functions where all this information is put into the descriptions that's already at the function level, and it seems like it could be organized better.
Some of the string functions for example: https://github.com/substrait-io/substrait/blob/main/extensions/functions_string.yaml#L61
For the sake of discussion, an alternative would be to allow multiple types in the args' value
field, or generic types. Though that might require allowing to specify output type in terms of an input type. I.e. something like:
name: "round"
description: >
Rounding the value `x` to `s` decimal places.
impls:
- args:
- value: integer # alternative 1: specify a "logical" type
name: "x"
description: >
Numerical expression to be rounded.
- value: i32
name: "s"
description: >
Number of decimal places to be rounded to.
When `s` is a positive number, nothing will happen
since `x` is an integer value.
When `s` is a negative number, the rounding is
performed to the nearest multiple of `10^(-s)`.
options:
rounding:
description: >
When a boundary is computed to lie somewhere between two values,
and this value cannot be exactly represented, this specifies how
to round it.
- TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
to the even option.
- TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
halfway, tie away from zero.
- TRUNCATE: always round toward zero.
- CEILING: always round toward positive infinity.
- FLOOR: always round toward negative infinity.
- AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
- TIE_DOWN: round ties with FLOOR rule
- TIE_UP: round ties with CEILING rule
- TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
- TIE_TO_ODD: round to nearest value; if exactly halfway, tie
to the odd option.
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: arg_typeof("x")?
- args:
- value: # alternative 2: specify a list of types
- fp32
- fp64
name: "x"
description: >
Numerical expression to be rounded.
- value: i32
name: "s"
description: >
Number of decimal places to be rounded to.
When `s` is a positive number, the rounding
is performed to a `s` number of decimal places.
When `s` is a negative number, the rounding is
performed to the left side of the decimal point
as specified by `s`.
options:
rounding:
description: >
When a boundary is computed to lie somewhere between two values,
and this value cannot be exactly represented, this specifies how
to round it.
- TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
to the even option.
- TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
halfway, tie away from zero.
- TRUNCATE: always round toward zero.
- CEILING: always round toward positive infinity.
- FLOOR: always round toward negative infinity.
- AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
- TIE_DOWN: round ties with FLOOR rule
- TIE_UP: round ties with CEILING rule
- TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
- TIE_TO_ODD: round to nearest value; if exactly halfway, tie
to the odd option.
values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: arg_typeof("x")?
If multiple args specify lists of types, then all combinations of those would be considered valid. This could be useful especially if the function takes two or more arguments and they don't necessarily need to be of the same type (though maybe then the output type gets complicated). The "logical" types could be e.g. integers (i8 .. i64), numerics (i8 .. i64, fp32, fp64, decimal?), strings (string, varchar, fixedchar).
The
rounding
blocks in this file: https://github.com/substrait-io/substrait/blob/main/extensions/functions_rounding.yamlMake my eyes hurt and destroy my DRY soul. How about we add support for options to be declared at the same level as function name so these patterns aren't this awful...
Thoughts @EpsilonPrime @vbarua @westonpace ?