tomwhite closed this 5 months ago
For context, the blocker here is some sort of internal support for variable chunking in Cubed? And presumably some rechunking to regular chunking will be needed at the end?
Yes, that's what I had been thinking. However, now I think it should be possible to choose the rechunk boundaries when resampling so that each output chunk has the same number of groups. For the example shown in https://flox.readthedocs.io/en/latest/implementation.html#method-blockwise, the output would have two groups per chunk, rather than (2, 2, 3, 1) groups per chunk. (It's OK if the last chunk has fewer groups.) There is slightly more data transferred this way, but it avoids a final rechunk, which avoids a whole-dataset copy, so I think it's worth a try.
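As a rough sketch of the idea (not actual flox or Cubed code), the output chunk boundaries could be derived from the sorted group labels so that every output chunk covers the same number of groups; the helper name and the example label sizes below are made up for illustration.

```python
import numpy as np

def equal_group_chunk_boundaries(labels, groups_per_chunk):
    """Hypothetical helper: pick input-axis chunk sizes so that each
    output chunk of a blockwise groupby covers `groups_per_chunk`
    contiguous groups (the last chunk may cover fewer)."""
    labels = np.asarray(labels)
    # positions where a new group starts (labels are assumed sorted and
    # contiguous, as in the resampling case)
    group_starts = np.flatnonzero(np.diff(labels)) + 1
    group_starts = np.concatenate([[0], group_starts])
    # take every `groups_per_chunk`-th group start as a chunk boundary
    boundaries = group_starts[::groups_per_chunk]
    # convert boundaries to chunk sizes along the input axis
    return tuple(int(n) for n in
                 np.diff(np.concatenate([boundaries, [labels.size]])))

# Toy example with 8 groups of uneven length along the input axis;
# with two groups per output chunk, every chunk holds exactly two groups.
labels = np.repeat(np.arange(8), [3, 2, 4, 1, 2, 3, 1, 4])
print(equal_group_chunk_boundaries(labels, groups_per_chunk=2))
# (5, 5, 5, 5)
```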
The other way to think of this, then, is that you want `cohorts`, with equal-sized cohorts (except for the last one).
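For illustration only (this is not flox's actual cohorts machinery), equal-sized cohorts could be built by slicing the sorted unique labels into fixed-size runs; the helper name is hypothetical.

```python
import numpy as np

def equal_sized_cohorts(labels, cohort_size):
    """Hypothetical helper: split the sorted unique group labels into
    cohorts of `cohort_size` groups each (the last cohort may be smaller)."""
    groups = np.unique(labels)
    return [groups[i:i + cohort_size].tolist()
            for i in range(0, groups.size, cohort_size)]

# With 8 resampling groups and two groups per cohort:
print(equal_sized_cohorts(np.repeat(np.arange(8), 3), cohort_size=2))
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```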