xgcm / xhistogram

Fast, flexible, label-aware histograms for numpy and xarray
https://xhistogram.readthedocs.io
MIT License

Refactor to use dask.array.blockwise #56

Closed: TomNicholas closed this 3 years ago

TomNicholas commented 3 years ago

I've started refactoring to use dask.array.blockwise, beginning with what @rabernat did in #49.
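For readers new to the blockwise idea, here is a minimal sketch. It assumes a 2D dask array reduced over both axes with a single set of bin edges. The helper names _block_histogram and blockwise_histogram are hypothetical, they are not xhistogram's API, and this is not necessarily how #49 structures the code. Each block is histogrammed independently with a length-1 axis kept for every reduced dimension, and the partial counts are then summed over those per-block axes.

```python
import numpy as np
import dask.array as dsa


def _block_histogram(block, bins):
    # Hypothetical per-block kernel: histogram every value in this block and
    # keep one length-1 axis per input axis, so the block becomes a single
    # "cell" along each dimension being reduced.
    counts, _ = np.histogram(block, bins=bins)
    return counts.reshape((1,) * block.ndim + counts.shape)


def blockwise_histogram(data, bins):
    # Hypothetical helper: histogram a 2D dask array over both of its axes.
    nbins = len(bins) - 1
    per_block = dsa.blockwise(
        _block_histogram,
        "ijb",                           # output indices: block i, block j, bin
        data,
        "ij",                            # input indices
        bins=bins,                       # forwarded to _block_histogram
        new_axes={"b": nbins},           # the bin axis does not exist on the input
        adjust_chunks={"i": 1, "j": 1},  # each block shrinks to length 1 on i and j
        dtype=np.intp,
    )
    # per_block has shape (n_blocks_i, n_blocks_j, nbins); summing the first
    # two axes combines the partial histograms from every block.
    return per_block.sum(axis=(0, 1))
```

For an array split into 4 by 3 blocks, per_block has shape (4, 3, nbins) before the final sum, which is the dask analogue of the bincounts.sum(axis) step.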

So far all I've done is:

I have not done:

I'm not totally sure I'm understanding the proposed algorithm correctly: in the numpy code path, bincounts.sum(axis) will only ever sum over length-1 axes. Is that correct?
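To make that reading concrete, here is a toy sketch assuming a 2D input reduced along its last axis only. The kernel name bincount_last_axis is hypothetical (it is not xhistogram's _bincount_kernel); it uses a searchsorted-then-bincount pattern with per-row offsets and keeps the reduced axis with length 1. With plain numpy there is only one "block", so that axis always has length 1 and summing over it merely drops it.

```python
import numpy as np


def bincount_last_axis(a, bins):
    """Hypothetical kernel: histogram each row of a 2D array along its last axis.

    Returns shape (nrows, 1, nbins): the reduced axis is kept with length 1,
    mirroring a per-block result that a dask reduction would later sum over.
    """
    nrows = a.shape[0]
    nbins = len(bins) - 1
    # Bin index of every value; the last edge is inclusive, as in np.histogram.
    idx = np.searchsorted(bins, a, side="right") - 1
    idx[a == bins[-1]] = nbins - 1
    # Out-of-range (and NaN) values fall outside [0, nbins) and are dropped.
    valid = (idx >= 0) & (idx < nbins)
    # Offset each row's bin indices so a single bincount call handles all rows.
    row = np.broadcast_to(np.arange(nrows)[:, None], a.shape)
    flat = row[valid] * nbins + idx[valid]
    counts = np.bincount(flat, minlength=nrows * nbins)
    return counts.reshape(nrows, 1, nbins)


a = np.array([[0.0, 1.0, 5.0, 12.0],
              [4.0, 4.0, 9.0, -1.0]])
edges = np.array([0.0, 4.0, 8.0, 12.0])
bincounts = bincount_last_axis(a, edges)  # shape (2, 1, 3)
hist = bincounts.sum(axis=1)              # sums over a length-1 axis only
# hist is [[2, 1, 1], [0, 2, 1]], identical to bincounts.squeeze(axis=1)
```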

I'm also wondering how much of the _bincount_kernel code can or should just be copied directly from numpy's or dask's histogramdd (with attribution, of course). Could we instead loop over the "unused_inds" of the array, applying the histogramdd algorithm to each slice, to preserve the dimensions we don't wish to reduce over? Perhaps using a generalized ufunc? Or would that be inefficient?
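One way to explore the "loop over unused_inds" idea is sketched below, assuming two input variables reduced along their last axis only. np.vectorize with a gufunc-style signature broadcasts over all leading axes and calls np.histogramdd on each pair of 1D slices. The function name histogram2d_over_last_axis is hypothetical. Note that the numpy documentation describes np.vectorize as provided for convenience rather than performance (it is essentially a Python for loop), so with many small slices the per-call overhead may well dominate; that is one concrete form of the inefficiency raised above.

```python
import numpy as np


def histogram2d_over_last_axis(x, y, bins_x, bins_y):
    """Hypothetical helper: joint histogram of x and y along their last axis.

    np.vectorize with the signature "(n),(n)->(i,j)" loops over every leading
    ("unused") axis and calls np.histogramdd on each pair of 1D slices.
    """
    def _hist2d(xs, ys):
        # A tuple of D arrays is read by histogramdd as D coordinates of N points.
        counts, _ = np.histogramdd((xs, ys), bins=(bins_x, bins_y))
        return counts.astype(np.intp)

    looped = np.vectorize(_hist2d, signature="(n),(n)->(i,j)", otypes=[np.intp])
    return looped(x, y)


x = np.arange(24.0).reshape(2, 3, 4)
y = np.ones_like(x)
h = histogram2d_over_last_axis(x, y, [0.0, 12.0, 24.0], [0.0, 2.0])
# h has shape (2, 3, 2, 1): the two leading axes of the inputs are preserved.
```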

cc @dougiesquire @gjoseph92