xarray-contrib / flox

Fast & furious GroupBy operations for dask.array
https://flox.readthedocs.io
Apache License 2.0

Consider `preferred_method="blockwise"` if `by` is sorted #359

Open dcherian opened 6 months ago

dcherian commented 6 months ago

I'm thinking of `groupby("time.year")`, which looks like a resample, and also of resampling-type aggregations.

In general, if `by` is sorted, then `method="blockwise"` is potentially the best option.
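The reason sortedness matters is that a sorted `by` makes every group one contiguous run along the axis. The sketch below shows this in plain Python. `is_sorted` and `group_runs` are hypothetical helpers written for illustration, not flox functions.

```python
import itertools


def is_sorted(by):
    """Hypothetical helper: True if labels are monotonically non-decreasing."""
    return all(a <= b for a, b in zip(by, by[1:]))


def group_runs(by):
    """Hypothetical helper: return (label, start, stop) for each contiguous
    run of equal labels. If `by` is sorted, each label yields exactly one run."""
    runs = []
    start = 0
    for label, members in itertools.groupby(by):
        n = sum(1 for _ in members)
        runs.append((label, start, start + n))
        start += n
    return runs


# Labels like those produced by groupby("time.year") on a sorted time axis.
years = [2000, 2000, 2000, 2001, 2001, 2002]
print(is_sorted(years))    # True
print(group_runs(years))   # [(2000, 0, 3), (2001, 3, 5), (2002, 5, 6)]
```

With unsorted labels such as `[0, 1, 0]`, label `0` produces two separate runs, so no single cut of the axis keeps that group together.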

We could gate this on:

  1. 1-D `by`
  2. ???
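A minimal sketch of such a gate, assuming it only needs condition 1 plus sortedness. `suggest_method` is a hypothetical name, and returning `None` stands for "fall back to the existing method selection". Condition 2 is still an open question above, so it is left as a comment rather than guessed.

```python
def suggest_method(by, ndim):
    """Hypothetical heuristic for preferring method="blockwise".

    `by` is the flattened sequence of group labels and `ndim` is the number
    of dimensions of the original `by` array.
    """
    # Condition 1: only consider 1-D `by`.
    if ndim != 1:
        return None
    # Condition 2 is still undecided ("???"); not implemented here.
    # Sorted labels mean each group is contiguous, so blockwise is viable.
    if all(a <= b for a, b in zip(by, by[1:])):
        return "blockwise"
    return None


print(suggest_method([2000, 2000, 2001], ndim=1))  # blockwise
print(suggest_method([1, 0], ndim=1))              # None
```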

I still prefer the idea of using `preferred_chunks` on xarray's new Grouper objects to rechunk intentionally, but a heuristic might be effective in the interim.

`method="blockwise"` automatically rechunks, so this would be a step towards the above.
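To illustrate what "rechunk so that groups don't straddle blocks" can look like for a sorted `by`, here is a sketch that packs whole groups greedily into chunks of roughly `target` elements. This is an assumed strategy for illustration, not necessarily the one flox uses; `group_aligned_chunks` and `target` are hypothetical.

```python
import itertools


def group_aligned_chunks(by, target):
    """Hypothetical sketch: chunk sizes along the grouped axis such that no
    group straddles a chunk boundary, packing whole groups up to ~target.

    Assumes `by` is sorted, so each group is a single contiguous run.
    """
    run_lengths = [sum(1 for _ in run) for _, run in itertools.groupby(by)]
    chunks = []
    current = 0
    for n in run_lengths:
        # Close the current chunk if this group would overflow it. A group
        # larger than `target` still ends up in a chunk of its own.
        if current and current + n > target:
            chunks.append(current)
            current = 0
        current += n
    if current:
        chunks.append(current)
    return tuple(chunks)


by = [0, 0, 0, 1, 1, 1, 1, 2, 2]
print(group_aligned_chunks(by, target=4))  # (3, 4, 2)
print(group_aligned_chunks(by, target=7))  # (7, 2)
```

A tuple like this could then be passed to dask as explicit chunks along the grouped axis, e.g. `array.rechunk({axis: chunks})`.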

dcherian commented 6 months ago

With size-1 chunks, e.g. `|0|0|0|1|1|1|1`, I'm not sure there's any benefit. It would be a regression when the number of blocks per group exceeds `split_every`.
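The quantity compared against `split_every` above can be counted directly. The sketch below computes how many blocks each group touches for the size-1 chunk example. `blocks_per_group` is a hypothetical helper, and `split_every = 2` is a value chosen only for illustration.

```python
import itertools


def blocks_per_group(by, chunks):
    """Hypothetical helper: count how many blocks each group's members touch,
    given the chunk sizes along the grouped axis."""
    # Block index of every element along the axis.
    block_of = list(itertools.chain.from_iterable(
        [i] * size for i, size in enumerate(chunks)))
    touched = {}
    for label, block in zip(by, block_of):
        touched.setdefault(label, set()).add(block)
    return {label: len(blocks) for label, blocks in touched.items()}


by = [0, 0, 0, 1, 1, 1, 1]
chunks = (1,) * len(by)  # size-1 chunks: |0|0|0|1|1|1|1
counts = blocks_per_group(by, chunks)
print(counts)  # {0: 3, 1: 4}

split_every = 2  # illustrative value only
print(max(counts.values()) > split_every)  # True
```

With group-aligned chunks `(3, 4)` each group touches exactly one block.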

Note that using blockwise is still correct, since we rechunk so that all members of a group are in a single block.
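Why group-aligned blocks make blockwise correct can be checked with a toy sum reduction. Each block is reduced independently and the partial results are simply merged. That merge is only valid when no label appears in two blocks. `groupby_sum` and `blockwise_sum` are hypothetical helpers, not flox internals.

```python
from collections import defaultdict


def groupby_sum(values, by):
    """Reference: sum values per label over the whole array."""
    out = defaultdict(float)
    for v, label in zip(values, by):
        out[label] += v
    return dict(out)


def blockwise_sum(values, by, chunks):
    """Reduce each block independently, then merge per-block results.

    Plain merging is only correct when every group lives in a single block,
    so a group straddling a block boundary is reported as an error.
    """
    result = {}
    start = 0
    for size in chunks:
        stop = start + size
        block = groupby_sum(values[start:stop], by[start:stop])
        if not result.keys().isdisjoint(block):
            raise ValueError("a group straddles a block boundary")
        result.update(block)
        start = stop
    return result


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
by = [0, 0, 0, 1, 1, 1, 1]
print(blockwise_sum(values, by, (3, 4)))  # {0: 6.0, 1: 22.0}
print(groupby_sum(values, by))            # {0: 6.0, 1: 22.0}
```

With misaligned chunks such as `(2, 5)`, group `0` spans both blocks and the sketch raises instead of returning a partial sum.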