Open erinov1 opened 1 year ago
I often miss the way that Pandas allows iteration over groups. This request would seem to be a step in the right direction.
A note for people running into the same issue: besides calling `.collect()` on the LazyFrame to turn it into an eager DataFrame and then grouping that (which is not quite as fast), you might be able to use `partition_by()` together with the `over()` window function.
Problem description
Is there a fundamental obstruction to being able to consume a lazy groupby as an iterator? I often wish I could do something like
where `complicated_function` depends on the specific keys and some other LazyFrames, but cannot be expressed in an agg context. (This works fine for eager DataFrames.)

This is of course similar to a groupby-apply, but my understanding is that groupby-apply kills parallelism even when `complicated_function` involves only native polars functionality.