Pull up expressions not using lambda variables from nested lambdas

kaikalur commented 8 months ago

We see patterns like the following map_normalize in the code:

transform_values(input, (k, v) -> (v / array_sum(map_values(input))));

Note that array_sum (itself a SQL function implemented with reduce) is invoked once for every element of the map, because we currently do not optimize this. We should fix that for performance reasons. Here is a simple rewrite that can be applied in a general fashion.
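To make the cost concrete, here is a minimal Python model of the behaviour described above. It is not Presto code: `transform_values` and `array_sum` are hypothetical stand-ins written for this illustration, and the `calls` counter exists only to show how often the invariant sub-expression runs.

```python
# Hypothetical Python stand-ins for the Presto functions; illustration only.
calls = {"array_sum": 0}

def array_sum(values):
    """Sum a list of numbers and record that we were invoked."""
    calls["array_sum"] += 1
    return sum(values)

def transform_values(m, fn):
    """Apply fn(key, value) to every entry of the map m."""
    return {k: fn(k, v) for k, v in m.items()}

def map_normalize_naive(m):
    # The sum does not depend on k or v, yet it is recomputed for each entry.
    return transform_values(m, lambda k, v: v / array_sum(list(m.values())))

data = {"a": 1.0, "b": 3.0, "c": 4.0}
naive_result = map_normalize_naive(data)
naive_calls = calls["array_sum"]  # one invocation per map entry
```

In this model a map with n entries triggers n sums of n values each, so the work grows quadratically even though the sum never changes.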

Create a fake singleton array whose one element is a struct/row containing the original inputs plus any pulled-up expressions. Then run the original lambda on that element, substituting references to the row fields for the pulled-up expressions, and project out the result. The lambda above becomes:

transform(
  array[row(input, array_sum(map_values(input)))], -- single element
  x -> transform_values(x[1], (k, v) -> v / x[2])
)[1]
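The same rewrite can be modelled in Python as a sanity check that it preserves the result while evaluating the hoisted expression only once. This is a sketch under assumptions, not the planner's implementation: `transform`, `transform_values`, and `array_sum` are hypothetical stand-ins for the Presto functions, a tuple plays the role of the row, and Python indices are 0-based where the SQL above uses 1-based subscripts.

```python
# Hypothetical Python stand-ins for the Presto functions; illustration only.
calls = {"array_sum": 0}

def array_sum(values):
    """Sum a list of numbers and record that we were invoked."""
    calls["array_sum"] += 1
    return sum(values)

def transform(arr, fn):
    """Apply fn to every element of the list arr."""
    return [fn(x) for x in arr]

def transform_values(m, fn):
    """Apply fn(key, value) to every entry of the map m."""
    return {k: fn(k, v) for k, v in m.items()}

def map_normalize_hoisted(m):
    # Singleton array holding one "row": (original input, hoisted expression).
    singleton = [(m, array_sum(list(m.values())))]
    # Run the original lambda on that single element, reading the hoisted
    # value from the row instead of recomputing it per map entry.
    result = transform(
        singleton,
        lambda x: transform_values(x[0], lambda k, v: v / x[1]),
    )
    # Project out the only result.
    return result[0]

data = {"a": 1.0, "b": 3.0, "c": 4.0}
hoisted_result = map_normalize_hoisted(data)
hoisted_calls = calls["array_sum"]  # evaluated once, outside the inner lambda
```

The outer `transform` over a one-element list acts purely as a binding construct: it gives the hoisted value a name (`x[1]` here) that the inner lambda can capture.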

kaikalur commented 8 months ago

Of course, if we can improve our lambda analyzer to do this natively in the optimizer/planner/codegen, that's better, but this should work as well.

kaikalur commented 8 months ago

CC: @feilong-liu

prestodb / presto

Pull up expressions not using lambda variables from nested lambdas #22214