DOCS: Add example how to use xhistogram to split a dataset into regions with a mask?

jbusecke commented 2 years ago

This is based on a brilliant suggestion by @dcherian, which I have used in multiple projects.

I think it would be very nice to document this little 'trick' to efficiently compute a histogram in 'regions' of an array which are encoded in a numbered mask (this is often how e.g. ocean basins are masked in climate models). Instead of masking out one region at a time with something like this:

for num_region in range(4):
    da_masked = da.where(da.mask==num_region)
    histogram(da_masked, ...)

One can simply add the mask as a second input variable and with appropriate bins, the results will be segmented into the appropriate regions (saving n-1 computations steps; where n is the number of regions).

I wrote up a quick example:

import numpy as np
import xarray as xr
from xhistogram.xarray import histogram

# make a quadrant mask
mask = np.concatenate([
    np.concatenate([np.full((2,2), 1), np.full((2,2), 2)]),
    np.concatenate([np.full((2,2), 3), np.full((2,2), 4)]),
],axis=1)
# create data that has similar values but with some added noise
data = (np.random.rand(*mask.shape)-0.5)*0.5+mask

da = xr.DataArray(data, dims=['x','y'], name='data', coords={'mask':xr.DataArray(mask, dims=['x','y'], name='mask')})
da

data_bins = np.linspace(0,5, 50)
mask_bins = np.arange(0.5, 5.5)

hist = histogram(da, da.mask, bins=[data_bins, mask_bins])
hist.plot(col='mask_bin')

You can see that this successfully segemented the regions and computed the histogram for each (the data is slightly spread around the actual mask number).

I think this really is a brilliant application case for so many science applications, but it is not obvious untils one tries it (at least it wasnt for me until @dcherian suggested it). So I think it would make a great example for the docs?

Id be happy to add this if ppl think it provides value. Maybe @shanicetbailey is interested in collaborating on this, since this might be very helpful in her current project?

dcherian commented 2 years ago

jbusecke commented 2 years ago

THIS will obviously be used!

xgcm / xhistogram

DOCS: Add example how to use xhistogram to split a dataset into regions with a mask? #76