substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

A "let" expression #415

Open gatesn opened 1 year ago

gatesn commented 1 year ago

I believe this has been brought up a couple of times before as part of #287 and somewhat in #320, but I thought it would be worth discussing explicitly.

The question is whether it should be possible to reference common subexpressions, either by using an explicit "let" expression operator, e.g. let x = y + z; and(gt(a, x), lte(b, x)), or by pulling them up to the plan level, as is done with the relation reference operator.
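To make the first option concrete, here is a minimal sketch of how a "let" node could behave inside an expression tree. The node classes (Field, Call, Let), the OPS table, and the evaluate function are hypothetical names invented for illustration; they are not Substrait messages or any existing API. The sketch assumes a row is a plain mapping from column name to value.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical expression nodes, for illustration only (not Substrait types).
@dataclass(frozen=True)
class Field:
    name: str              # reference to an input column or a let-bound name

@dataclass(frozen=True)
class Call:
    op: str                # key into OPS below
    args: tuple            # child expressions

@dataclass(frozen=True)
class Let:
    name: str              # name bound inside `body`
    value: "Expr"          # the common subexpression
    body: "Expr"           # expression that may refer to `name`

Expr = Union[Field, Call, Let]

OPS = {
    "add": lambda a, b: a + b,
    "gt": lambda a, b: a > b,
    "lte": lambda a, b: a <= b,
    "and": lambda a, b: a and b,
}

def evaluate(expr, row):
    """Evaluate `expr` against `row`, a dict of column name -> value."""
    if isinstance(expr, Field):
        return row[expr.name]
    if isinstance(expr, Let):
        # The bound value is computed once, then made visible to the body.
        bound = evaluate(expr.value, row)
        return evaluate(expr.body, {**row, expr.name: bound})
    args = [evaluate(arg, row) for arg in expr.args]
    return OPS[expr.op](*args)

# let x = y + z; and(gt(a, x), lte(b, x))
expr = Let(
    "x",
    Call("add", (Field("y"), Field("z"))),
    Call("and", (
        Call("gt", (Field("a"), Field("x"))),
        Call("lte", (Field("b"), Field("x"))),
    )),
)

result = evaluate(expr, {"a": 10, "b": 3, "y": 2, "z": 4})
```

In this sketch y + z is evaluated once per row and both comparisons read the bound value. The binding is scoped to a single expression, so it cannot be shared across several output expressions.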

julianhyde commented 1 year ago

Calcite's RexProgram class is a bundle of project (and optionally filter) expressions. The expressions are topologically sorted so that common expressions are computed first. It might be a useful concept for Substrait to borrow.
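As a rough illustration of that idea, the sketch below models a "program" as a topologically sorted list of expression slots, a set of projected outputs, and an optional filter condition. It is only loosely based on the description above and is not Calcite's actual API; Input, Call, Program, OPS, and run are hypothetical names, and arguments are assumed to be indices of earlier slots.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Input:
    name: str                  # reads a column from the input row

@dataclass(frozen=True)
class Call:
    op: str                    # key into OPS below
    args: tuple                # indices of earlier slots in the program

@dataclass
class Program:
    exprs: tuple               # topologically sorted expression slots
    projects: dict             # output name -> slot index
    condition: Optional[int] = None   # slot index of the filter, if any

OPS = {
    "add": lambda a, b: a + b,
    "gt": lambda a, b: a > b,
    "lte": lambda a, b: a <= b,
    "and": lambda a, b: a and b,
}

def run(program, row):
    """Return the projected outputs for `row`, or None if it is filtered out."""
    values = []
    for i, expr in enumerate(program.exprs):
        if isinstance(expr, Input):
            values.append(row[expr.name])
            continue
        if any(j >= i for j in expr.args):
            raise ValueError(f"slot {i} references a slot that is not earlier")
        # Each slot is computed exactly once and reused by later slots.
        values.append(OPS[expr.op](*(values[j] for j in expr.args)))
    if program.condition is not None and not values[program.condition]:
        return None
    return {name: values[i] for name, i in program.projects.items()}

program = Program(
    exprs=(
        Input("a"),            # 0
        Input("b"),            # 1
        Input("y"),            # 2
        Input("z"),            # 3
        Call("add", (2, 3)),   # 4: x = y + z
        Call("gt", (0, 4)),    # 5: a > x
        Call("lte", (1, 4)),   # 6: b <= x
        Call("and", (5, 6)),   # 7: filter condition
        Call("add", (0, 4)),   # 8: p = a + x
        Call("add", (8, 3)),   # 9: q = p + z
    ),
    projects={"p": 8, "q": 9},
    condition=7,
)

kept = run(program, {"a": 10, "b": 3, "y": 2, "z": 4})
dropped = run(program, {"a": 1, "b": 3, "y": 2, "z": 4})
```

Because every slot can be referenced by index, the shared value x feeds the filter and both outputs without being recomputed, and q can build on p.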

westonpace commented 1 year ago

@julianhyde is RexProgram itself an expression?

julianhyde commented 1 year ago

> is RexProgram itself an expression?

Not exactly. It is a collection of expressions and a filter. If you squint you could regard it as an expression that returns an optional tuple.

But I do think it solves the requirement for common subexpressions admirably.

An ordinary "let" expression, without the ability to return tuples, is going to struggle with cases like this (in Standard ML-like pseudocode), with multiple outputs based on the same common subexpressions:

   let
     val x = y + z
   in
     { b = a > x andalso b <= x, p = a + x, q = p + z }
   end

and this one, where output is conditional on an intermediate expression:

   let
     val x = y + z
   in
     if a > x andalso b <= x then
       emit { p = a + x,  q = p + z }
   end

westonpace commented 1 year ago

@julianhyde If it isn't an expression then how do we use it? For example, a project is currently defined as:

message ProjectRel {
  RelCommon common = 1;
  Rel input = 2;
  repeated Expression expressions = 3;
  substrait.extensions.AdvancedExtension advanced_extension = 10;
}

Should we change field 3 to a repeated Program message?