xgcm / xhistogram

Fast, flexible, label-aware histograms for numpy and xarray
https://xhistogram.readthedocs.io
MIT License

overlapping bins #24

Open miniufo opened 3 years ago

miniufo commented 3 years ago

xhistogram is a great tool for one of my projects, which bins tropical cyclone (TC) records onto a grid as frequency of occurrence. However, the binning uses non-overlapping bins, so the result can look noisy where TCs are infrequent. I wonder whether there could be an option for overlapping bins, so that the resulting PDF is smoother. This would give a relatively high-resolution result while keeping more data in each bin to ensure statistical significance.

rabernat commented 3 years ago

I understand why you want to do this, in order to make your PDF smoother.

However, I'm not sure we want to allow overlapping bins, as this would introduce significant complexity and deviate from numpy histogram behavior.

You could try smoothing your histogram post calculation. Xarray's rolling window operations would be perfect for this. However, I'm not sure that smoothing addresses the fundamental problem. If your histogram is noisy, your bins are probably too fine. You should probably just use coarser bins.
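A minimal sketch of the post-calculation smoothing suggested above, under stated assumptions: the positions are synthetic (not real TC data), the 5-degree bins and the 3-bin window are arbitrary choices, and `np.histogram2d` stands in for xhistogram so the example depends only on numpy and xarray. The dimension names `lon_bin` and `lat_bin` are illustrative.

```python
import numpy as np
import xarray as xr

# Synthetic positions standing in for TC records (assumption, not real data).
rng = np.random.default_rng(42)
lon = rng.uniform(100.0, 180.0, size=500)
lat = rng.uniform(0.0, 40.0, size=500)

# Non-overlapping 5-degree bins.
lon_edges = np.arange(100.0, 181.0, 5.0)
lat_edges = np.arange(0.0, 41.0, 5.0)
counts_np, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])

# Wrap the counts in a DataArray labelled by bin centers.
counts = xr.DataArray(
    counts_np,
    dims=("lon_bin", "lat_bin"),
    coords={
        "lon_bin": 0.5 * (lon_edges[:-1] + lon_edges[1:]),
        "lat_bin": 0.5 * (lat_edges[:-1] + lat_edges[1:]),
    },
)

# Centered 3x3 rolling sum, applied one dimension at a time.
# min_periods=1 keeps the edge cells instead of turning them into NaN.
smoothed = (
    counts.rolling(lon_bin=3, center=True, min_periods=1).sum()
    .rolling(lat_bin=3, center=True, min_periods=1).sum()
)
```

Each interior cell of `smoothed` then holds the count of its own bin plus its eight neighbors, that is, the count in a 15-degree box centered on a 5-degree grid point. Replacing `.sum()` with `.mean()` would give a running average instead.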

miniufo commented 3 years ago

Hi Ryan, I've tried your suggestion of smoothing the binned field with a rolling operation. This is efficient, but if the bin size is large the result spreads too much and the contours become zigzag. Alternatively, I've tried kernel density estimates to get a smoothed estimate on fine bins. The results are quite nice (see here), but it is extremely slow when there are many TC records.

So I just wonder if there is a way to use overlapping bins, which is a common request in binning Lagrangian observations like TCs, extratropical cyclones, drifter data and Argo floats (even tons of synthetic float data in models).

Since xhistogram is built on top of numpy, I understand that this is not straightforward unless numpy provides appropriate APIs. I'm just leaving a remark here to see if someone can suggest a better solution.

rabernat commented 3 years ago

I'm pretty sure that convolving your histogram with an appropriate kernel can be mathematically identical to using overlapping bins.
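A small numerical check of this claim, as a sketch under assumptions: the data are synthetic, the overlapping bins are taken to be exactly `k` fine bins wide and shifted by one fine bin at a time, and the kernel is a boxcar of ones. Other overlap patterns would need a different kernel or stride.

```python
import numpy as np

# Synthetic 1-D samples in [0, 10) (assumption, for illustration only).
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 10.0, size=1000)

# Fine, non-overlapping bins of width 1.
fine_edges = np.arange(0, 11)
fine_counts, _ = np.histogram(data, bins=fine_edges)

k = 3  # each overlapping bin spans k fine bins

# Direct count in overlapping bins [i, i + k), shifted by one fine bin.
overlap_counts = np.array([
    np.count_nonzero((data >= fine_edges[i]) & (data < fine_edges[i + k]))
    for i in range(len(fine_edges) - k)
])

# Same result from convolving the fine histogram with a boxcar kernel.
conv_counts = np.convolve(fine_counts, np.ones(k, dtype=int), mode="valid")

assert np.array_equal(overlap_counts, conv_counts)
```

With `mode="valid"` only the overlapping bins that lie fully inside the fine grid are returned, so the output has `len(fine_counts) - k + 1` entries.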