substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0
1.14k stars 148 forks source link

fix: remove function definitions w/ invalid return types #599

Closed vbarua closed 6 months ago

vbarua commented 6 months ago

T? and T|? are not valid return types

T|? is equivalent to setting the nullability to MIRROR T&? is not currently expressible via nullability

BREAKING CHANGE: least_skip_null and greatest_skip_null functions have been removed

richtia commented 6 months ago

Here's the PR that introduced those for reference: https://github.com/substrait-io/substrait/pull/247

vbarua commented 6 months ago

@richtia thanks for linking that 🙇

Additional context from @westonpace in Slack

No. Let's back those out for now and create a todo. We had some discussion on this topic in the community meeting. The conclusion was that we do not want to add new syntax to the standard return types. Instead, those should use the same "return type is a small block of code that you have to evaluate" approach that is used for decimal arithemtic, e.g.

return: |-
init_scale = max(S1,S2)
init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
min_scale = min(init_scale, 6)
delta = init_prec - 38
prec = min(init_prec, 38)
scale_after_borrow = max(init_scale - delta, min_scale)
scale = init_prec > 38 ? scale_after_borrow : init_scale
DECIMAL<prec, scale>

Part of the discussion in Slack was also around adding a new nullability type like INTERSECTION for returns non-null if any of the input arguments is non-null as comes up in a number of places (i.e. coalesce).

vbarua commented 6 months ago

However, would it be better to keep the methods around and just change the return type to T? Is "slightly inaccurate but usable" better than "accurate but not usable"?

@westonpace I think it makes sense to keep them available so that they can be used. In this case, using nullability MIRROR is weaker than the actual nullability desired so it seems safe.

Made a change to bring them back in, update the return types and include a note about it.

I've created https://github.com/substrait-io/substrait/issues/601 as an issue to track this new desired nullability as well.

westonpace commented 6 months ago

If @EpsilonPrime is still good with the updated version I think we are good to go

vbarua commented 6 months ago

Will merge this as this has 3 SMC votes (including myself)