substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

specify output type derivation rules for SetRel #558

Open vbarua opened 11 months ago

vbarua commented 11 months ago

The spec documents the direct output order of a SetRel as

The field order of the inputs. All inputs must have identical fields.

Having producers align the input types for set relations makes sense. As written, however, this also requires them to align the nullability of the inputs, which is more painful.

For example, the following inputs are aligned with regard to types, but not nullability.

Input 1: (I64, I64, I64?, I64?)
Input 2: (I64, I64?, I64, I64?)

Proposal

We should loosen the input restriction to allow inputs where all corresponding columns have the same type but potentially differing nullabilities.

Along with this, we can provide the following derivation rules for output types:

UNION

If any of the input columns is nullable, the corresponding column of the output is nullable. A union of the inputs above would yield

Output: (I64, I64?, I64?, I64?)

INTERSECTION

If any of the input columns is non-nullable, the corresponding column of the output is non-nullable. An intersection of the inputs above would yield

Output: (I64, I64, I64, I64?)

MINUS

The output type is the type of the first input. A minus of the inputs above would yield

Output: (I64, I64, I64?, I64?)
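
As a rough illustration of the three rules above, here is a minimal Python sketch. It is not taken from the spec or from any Substrait library: the `parse_row`, `format_row`, and `derive_set_output` helpers, the `(kind, nullable)` tuple representation, and the operation labels `"union"`, `"intersection"`, and `"minus"` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a column is (kind, nullable), e.g. ("I64", True) for I64?.
Row = List[Tuple[str, bool]]


def parse_row(text: str) -> Row:
    """Parse a string such as "(I64, I64?)" into [("I64", False), ("I64", True)]."""
    cols = [c.strip() for c in text.strip("() ").split(",")]
    return [(c.rstrip("?"), c.endswith("?")) for c in cols]


def format_row(row: Row) -> str:
    """Render a row back into the "(I64, I64?)" notation used in this issue."""
    return "(" + ", ".join(kind + ("?" if nullable else "") for kind, nullable in row) + ")"


def derive_set_output(op: str, inputs: List[Row]) -> Row:
    """Derive the output row type for a set operation under the proposed rules."""
    if len(inputs) < 2:
        raise ValueError("a set operation needs at least two inputs")

    # Loosened restriction: column types must match, nullability may differ.
    kinds = [[kind for kind, _ in row] for row in inputs]
    if any(k != kinds[0] for k in kinds[1:]):
        raise ValueError("inputs must have the same column types, ignoring nullability")

    output: Row = []
    for column in zip(*inputs):
        flags = [nullable for _, nullable in column]
        if op == "union":
            nullable = any(flags)      # nullable if any input column is nullable
        elif op == "intersection":
            nullable = all(flags)      # non-nullable if any input column is non-nullable
        elif op == "minus":
            nullable = flags[0]        # follows the first input
        else:
            raise ValueError(f"unknown set operation: {op}")
        output.append((column[0][0], nullable))
    return output


if __name__ == "__main__":
    a = parse_row("(I64, I64, I64?, I64?)")
    b = parse_row("(I64, I64?, I64, I64?)")
    for op in ("union", "intersection", "minus"):
        print(op, format_row(derive_set_output(op, [a, b])))
```

Run against Input 1 and Input 2, this sketch reproduces the three Output rows listed above. It treats every union, intersection, and minus variant the same way, which is a simplification for the purposes of this example.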

jacques-n commented 8 months ago

I think your proposal makes sense.