trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.46k stars 3.01k forks source link

Lambda aggregation with complex state #2089

Open findepi opened 4 years ago

findepi commented 4 years ago

Provide ability to define aggregation function composed of lambdas.

Like array reduce() but for aggregations.

I.e. like java.util.stream.Collector#of(java.util.function.Supplier<A>, java.util.function.BiConsumer<A,T>, java.util.function.BinaryOperator<A>, java.util.function.Function<A,R>, java.util.stream.Collector.Characteristics...).

electrum commented 4 years ago

We have https://prestosql.io/docs/current/functions/aggregate.html#reduce_agg

mrbrahman commented 1 year ago

It would be awesome to have this feature!

findepi commented 1 year ago

Indeed we have https://trino.io/docs/current/functions/aggregate.html#reduce_agg.

However it has a limitation:

The state type must be a boolean, integer, floating-point, or date/time/interval.

Sometimes, a reduce state needs to be composite of eg two numbers, so having support for row type would be good.

mrbrahman commented 1 year ago

I use the array reduce function a lot with row datatypes, and hence when I saw reduce_agg, my immediate thoughts were "great! now I can do lambda aggregate functions!". Without reading the full docs, I started writing reduce_agg with arrays and row datatypes.

... which obviously failed. It was only then I saw that last line

The state type must be a boolean, integer, floating-point, or date/time/interval.

... and I was like :-/

Supporting complex data types (i.e. rows, arrays) in reduce_agg similar to reduce would be a killer feature! Literally! As it would eliminate the need to write UDAF for not-so-complex-things (or may be even complex ones)