weld-project / weld

High-performance runtime for data analytics applications
https://www.weld.rs
BSD 3-Clause "New" or "Revised" License

Aggregation functions for Grizzly Series #509

Closed sppalkia closed 4 years ago

sppalkia commented 4 years ago

This PR adds aggregation function support for Grizzly Series. The API follows that of Pandas, with the exception of the index of the returned value: Pandas assigns labels based on the aggregation function, while Grizzly (which does not support custom indexes at the moment) assigns numerical labels.

Examples of aggregations with Grizzly

>>> s = GrizzlySeries([1,2,3,4])
>>> s.agg('sum')
10
>>> s.agg(['sum', 'mean']).evaluate()
0    10.0
1     2.5
dtype: float64

Multiple aggregations are co-optimized to prevent redundant computation: for example, requesting a variance and a mean computes the mean only once and reuses the result for the variance computation.
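The sharing idea can be illustrated with a minimal pure-Python sketch. This is not the Weld IR that Grizzly generates. The `DEPENDENCIES` table, the `aggregate` helper, and the call counter are hypothetical names introduced only for illustration, and the use of sample variance (dividing by n - 1, as Pandas does by default) is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical dependency table: the intermediate results each aggregation needs.
DEPENDENCIES: Dict[str, List[str]] = {
    "sum": [],
    "count": [],
    "mean": ["sum", "count"],
    "var": ["mean", "count"],
}


def _compute(name: str, data: Sequence[float],
             cache: Dict[str, float], calls: Dict[str, int]) -> float:
    """Compute one aggregation, reusing anything already in the cache."""
    if name not in DEPENDENCIES:
        raise ValueError(f"unsupported aggregation: {name}")
    if name in cache:
        return cache[name]
    # Resolve dependencies first so that their results land in the cache.
    for dep in DEPENDENCIES[name]:
        _compute(dep, data, cache, calls)
    calls[name] = calls.get(name, 0) + 1
    if name == "sum":
        result = float(sum(data))
    elif name == "count":
        result = float(len(data))
    elif name == "mean":
        result = cache["sum"] / cache["count"]
    else:
        # "var": sample variance (assumption: ddof=1), built on the cached mean.
        mean = cache["mean"]
        result = sum((x - mean) ** 2 for x in data) / (cache["count"] - 1)
    cache[name] = result
    return result


def aggregate(data: Sequence[float],
              names: Sequence[str]) -> Tuple[List[float], Dict[str, int]]:
    """Evaluate several aggregations over one shared cache.

    Returns the results in request order, plus a count of how many times
    each aggregation was actually computed.
    """
    cache: Dict[str, float] = {}
    calls: Dict[str, int] = {}
    results = [_compute(name, data, cache, calls) for name in names]
    return results, calls


results, calls = aggregate([1, 2, 3, 4], ["mean", "var"])
print(results[0])     # mean of the series
print(calls["mean"])  # number of times the mean was computed
```

Even though both `mean` and `var` depend on the mean, the shared cache ensures it is evaluated a single time per call to `aggregate`.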

Currently supports:

Some of the IR code is taken from https://www.github.com/weld-project/baloo.