We see expressions like the following map_normalize in the code:
    transform_values(input, (k, v) -> v / array_sum(map_values(input)))
Note that array_sum (itself a SQL function implemented with reduce) is evaluated once for every entry of the map, because we currently do not hoist such loop-invariant subexpressions out of the lambda. We should fix that for performance reasons.
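For context, a minimal sketch of what that reduce-based array_sum amounts to (hypothetical; the actual SQL function definition in the codebase may differ in null handling and types):

    -- under the current plan, something like this is re-evaluated for every map entry
    reduce(map_values(input), CAST(0 AS DOUBLE), (s, x) -> s + x, s -> s)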
Here is a simple rewrite that can be done in a general fashion: create a fake singleton array holding a struct/row of the original inputs together with any pulled-out (loop-invariant) expressions, run the original lambda on that single element with the pulled-out values substituted in, and project out the result. The expression above becomes:
    transform(
        array[row(input, array_sum(map_values(input)))],  -- single element: (original map, precomputed sum)
        x -> transform_values(x[1], (k, v) -> v / x[2])
    )[1]
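For concreteness, a self-contained sketch showing the two forms side by side (hypothetical data; assumes Presto-style lambda syntax, the array_sum helper mentioned above, and ordinal subscript access on anonymous rows, as in the rewrite):

    WITH t(input) AS (
        SELECT map(array['a', 'b', 'c'], array[1.0, 2.0, 3.0])
    )
    SELECT
        -- original form: array_sum(...) is re-evaluated for every map entry
        transform_values(input, (k, v) -> v / array_sum(map_values(input))) AS original,
        -- rewritten form: the sum is computed once, then reused inside the lambda
        transform(
            array[row(input, array_sum(map_values(input)))],
            x -> transform_values(x[1], (k, v) -> v / x[2])
        )[1] AS rewritten
    FROM t
    -- both columns should yield {a=0.1666..., b=0.3333..., c=0.5}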