Closed by otsaloma 2 years ago.
Having function factories for common operations could allow a speed-up by using Numba under the hood. If DataFrame.aggregate
recognizes these special functions, it could make a single call instead of the current [function(x) for x in slices],
thus moving the loop over the groups into the Numba code.
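A minimal sketch of the dispatch idea, assuming a hypothetical `mean` factory and a toy `aggregate` that works on a dict of columns rather than dataiter's actual DataFrame. NumPy's `bincount` stands in for a Numba kernel so the sketch runs without Numba installed. The point is the branching: a recognized factory object gets one call covering all groups, while any other function falls back to one Python-level call per slice.

```python
import numpy as np

class Mean:
    """Hypothetical marker object returned by the mean() factory."""

    def __init__(self, column):
        self.column = column

    def __call__(self, group):
        # Generic path: same result as lambda x: x[column].mean() on one group.
        return group[self.column].mean()

    def aggregate_all(self, values, group_ids, ngroups):
        # Fast path: a single call covering every group. NumPy's bincount is a
        # stand-in here for a Numba kernel that loops over the groups.
        sums = np.bincount(group_ids, weights=values, minlength=ngroups)
        counts = np.bincount(group_ids, minlength=ngroups)
        return sums / counts

def mean(column):
    return Mean(column)

def aggregate(data, by, **funcs):
    """Toy group-by aggregate over a dict of equal-length columns."""
    data = {k: np.asarray(v) for k, v in data.items()}
    keys, group_ids = np.unique(data[by], return_inverse=True)
    out = {by: keys}
    slices = None
    for name, func in funcs.items():
        if isinstance(func, Mean):
            values = data[func.column].astype(float)
            out[name] = func.aggregate_all(values, group_ids, len(keys))
        else:
            # Current behaviour: [function(x) for x in slices], one call per group.
            if slices is None:
                slices = [{k: v[group_ids == i] for k, v in data.items()}
                          for i in range(len(keys))]
            out[name] = np.array([func(x) for x in slices])
    return out

data = {"g": ["a", "b", "a", "b"], "x": [1.0, 2.0, 3.0, 6.0]}
res = aggregate(data, "g", fast=mean("x"), slow=lambda grp: grp["x"].mean())
```

Both columns hold the same group means; only the `fast` one avoids the per-group Python loop.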
Currently we do:
With a lot of calculated columns, that gets a bit verbose with all the lambdas.
Maybe we could add helpers to shorten the lambdas in common cases, e.g.:
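The original example code is missing here, so this is only an illustration under assumptions: `mean`, `total`, `nrow` and the toy `aggregate` are hypothetical names, and groups are plain dicts of lists. Each helper simply returns an ordinary one-argument function, so it can be passed anywhere a lambda is accepted today.

```python
import statistics

# Hypothetical helper factories: each returns a plain function of one group.
def mean(column):
    return lambda group: statistics.mean(group[column])

def total(column):
    return lambda group: sum(group[column])

def nrow(group):
    # Number of rows in a group: length of any one of its columns.
    return len(next(iter(group.values())))

def aggregate(groups, **funcs):
    """Toy aggregate: groups maps a key to a dict of column lists."""
    return {key: {name: f(group) for name, f in funcs.items()}
            for key, group in groups.items()}

groups = {
    "a": {"x": [1, 3], "y": [10, 20]},
    "b": {"x": [2, 6], "y": [5, 5]},
}

# Lambda style versus helper style; both produce identical results.
verbose = aggregate(groups,
                    n=lambda g: len(g["x"]),
                    x_mean=lambda g: statistics.mean(g["x"]),
                    y_sum=lambda g: sum(g["y"]))
short = aggregate(groups, n=nrow, x_mean=mean("x"), y_sum=total("y"))
```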
Or, use a single lambda with a complex return value, similar to Pandas' apply? That looks nice with a lot of columns, but really bad if you only need one column, such as, in the current notation, .aggregate(n=di.nrow).
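To make the trade-off concrete, here is a rough sketch comparing the keyword style with a single function that returns all output columns as a dict. Both `aggregate_kw` and `aggregate_one` are hypothetical toy functions over dicts of lists, not dataiter or Pandas APIs.

```python
import statistics

groups = {
    "a": {"x": [1, 3], "y": [10, 20]},
    "b": {"x": [2, 6], "y": [5, 5]},
}

def aggregate_kw(groups, **funcs):
    # Current keyword style: one function per output column.
    return {key: {name: f(g) for name, f in funcs.items()}
            for key, g in groups.items()}

def aggregate_one(groups, func):
    # Alternative style: a single function returns all output columns as a dict.
    return {key: dict(func(g)) for key, g in groups.items()}

# Many columns: one lambda keeps the related computations together.
many = aggregate_one(groups, lambda g: {
    "n": len(g["x"]),
    "x_mean": statistics.mean(g["x"]),
    "y_sum": sum(g["y"]),
})

# One column: the dict wrapper is pure overhead compared to the keyword form.
one_dict = aggregate_one(groups, lambda g: {"n": len(g["x"])})
one_kw = aggregate_kw(groups, n=lambda g: len(g["x"]))
```

With a single output column, the keyword form stays a one-liner, while the dict form adds a wrapper without any benefit.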