xarray-contrib / flox

Fast & furious GroupBy operations for dask.array
https://flox.readthedocs.io
Apache License 2.0
123 stars 16 forks source link

Support cumsum, cumprod #91

Open dcherian opened 2 years ago

dcherian commented 2 years ago

Supporting just numpy should be relatively easy. This will also work for method="blockwise" by default.

We may want to rename groupby_reduce to groupby_agg?

For dask proper, we'll need to use dask.array.cumreduction instead of dask.array.blockwise + dask.array.reductions._tree_reduce

Illviljan commented 1 year ago

I tried looking into this a while ago but I got stuck, because I found no examples of an aggregation where the shape stays the same. If you have more guidelines/ideas where to look it would be appreciated.

dcherian commented 1 year ago

Great to hear. Warning: This is going to be quite complicated :)

Here's how dask implements cumsum: https://docs.dask.org/en/stable/_modules/dask/array/reductions.html#cumsum

We'll need something like that with custom binop and merge.

I would try to get method="sequential" working first.

I would also try really hard to just reuse the cumreduction building block if we can. The annoyance is that we will need to propagate array and group_idx so something like https://github.com/xarray-contrib/flox/blob/0d353ec14c79c4c5123623f00555843324041b37/flox/aggregations.py#L359 should be helpful.

dcherian commented 1 year ago

Ooooh I forgot to mention, just getting the pure numpy version to work would be a great step forward :) We can always start there.

dcherian commented 1 month ago

Done in #370 but still needs the following

  1. [ ] ~Xarray interface~ better just use Xarray instead?
  2. [ ] handling binning / expected_groups -- perhaps we punt till we make a better API with Xarray Grouper objects?
  3. [ ] testing with multiple by
  4. [ ] setting method
  5. [ ] setting engine (perhaps not)