pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Add `shuffle` kwarg to `GroupBy.map` #9706

Open dcherian opened 2 weeks ago

dcherian commented 2 weeks ago

When `shuffle=True`, we call `.shuffle()` and then apply the UDF (user-defined function) using `map_blocks`. This turns out to be a bit involved:

  1. Constructing the `template` for `map_blocks` is not trivial.
  2. `map_blocks` requires any newly added dimension to have the same size in every block. This does not work for e.g. `groupby('label').mean()`, where the result has a new dimension `label` that may be chunked in the output.
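
To make the second point concrete, here is a minimal sketch in plain Python (deliberately not using xarray or dask). It assumes a shuffle places every member of a group in a single block, and that blocks may hold different numbers of groups. The data, the block layout (`groups_per_block`), and the helper names `shuffle_into_blocks` and `block_mean` are all hypothetical and only illustrate the shape mismatch; they are not part of xarray's API.

```python
from collections import defaultdict

# Hypothetical data: one value per element, grouped by a 'label' coordinate.
labels = ["a", "b", "a", "c", "b", "a", "d", "c"]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]


def shuffle_into_blocks(labels, values, groups_per_block):
    """Gather all members of each group into one block (a stand-in for a shuffle)."""
    by_group = defaultdict(list)
    for lab, val in zip(labels, values):
        by_group[lab].append(val)
    names = sorted(by_group)
    blocks = []
    start = 0
    for n in groups_per_block:
        # Each block holds whole groups; the number of groups per block is assumed.
        blocks.append({lab: by_group[lab] for lab in names[start:start + n]})
        start += n
    return blocks


def block_mean(block):
    """Per-block UDF: reduce every group in the block to its mean."""
    return {lab: sum(vals) / len(vals) for lab, vals in block.items()}


# Assumed layout: the first block holds three groups, the second holds one.
blocks = shuffle_into_blocks(labels, values, groups_per_block=[3, 1])
results = [block_mean(b) for b in blocks]

# Size of the new 'label' dimension produced by each block.
label_sizes = [len(r) for r in results]
print(label_sizes)
```

In this sketch the reduced output of each block has a `label` length equal to the number of groups that block happens to contain, so the new dimension is not the same size in all blocks.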

TODO: