xgcm / xhistogram

Fast, flexible, label-aware histograms for numpy and xarray
https://xhistogram.readthedocs.io
MIT License

time-varying bins #67

Open miniufo opened 3 years ago

miniufo commented 3 years ago

Thanks to you guys for this handy package.

I had to use time-varying bins when I tried to calculate the area enclosed by tracer contours, as the tracer changes with time. However, the bins kwarg supports neither an xarray object nor a multidimensional numpy array for computing the histogram along contour space.

I also need to convert PDF of xhist to CDF for area. It seems that xhist does not directly support this. A way to do this is to use numpy.cumsum. Is it possible to integrate this functionality into xhist?

jbusecke commented 3 years ago

This would only be necessary if you want to change the tracer bins with time, I think. To just count the area within e.g. tracer<=phi_0 you just want to pass fixed tracer bins and compute the histogram over x/y (not time).
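As a rough illustration of the fixed-bin approach described above, here is a minimal NumPy-only sketch. The function name, array shapes, and example data are assumptions made for illustration, not part of xhistogram. It histograms over the spatial dimensions separately for each time step, weights by cell area, and accumulates to get the area below each bin edge.

```python
import numpy as np

def area_below_thresholds(tracer, cell_area, bin_edges):
    """Cumulative area per time step using bins that are fixed in time.

    tracer:    array of shape (time, y, x)  (assumed layout)
    cell_area: array of shape (y, x)
    bin_edges: 1-D increasing array, identical for every time step

    out[t, k] is the area where bin_edges[0] <= tracer < bin_edges[k + 1]
    (the last bin also includes its right edge, as in np.histogram).
    """
    n_time = tracer.shape[0]
    out = np.empty((n_time, len(bin_edges) - 1))
    weights = cell_area.ravel()
    for t in range(n_time):
        # Histogram over x/y only; time is kept as a separate axis.
        hist, _ = np.histogram(tracer[t].ravel(), bins=bin_edges, weights=weights)
        out[t] = np.cumsum(hist)
    return out

# Hypothetical example data: 2 time steps on a 2 x 3 grid of unit-area cells.
tracer = np.arange(12, dtype=float).reshape(2, 2, 3)
cell_area = np.ones((2, 3))
bin_edges = np.array([0.0, 3.0, 6.0, 12.0])
area = area_below_thresholds(tracer, cell_area, bin_edges)
```

Values outside the range spanned by `bin_edges` are not counted, so the first and last edges should bracket the tracer range over all times if the final column is meant to equal the total domain area.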

miniufo commented 3 years ago

I see there is a PR #59 by @TomNicholas. Is this able to solve this?

TomNicholas commented 3 years ago

Hi there @miniufo, glad you are finding this package useful.

I had to use time-varying bins

@jbusecke is right that changes of the data with time don't necessarily mean you need time-varying bins, but if you do need bins that actually vary with time, then yes, #59 was intended to solve that. However, as per this comment I will probably wait until after integration into xarray to implement it, so it might be a little while.

I also need to convert PDF of xhist to CDF for area. It seems that xhist does not directly support this. A way to do this is to use numpy.cumsum. Is it possible to integrate this functionality into xhist?

xarray has a .cumsum() method for DataArrays. Can you not just call that directly on the output of xhistogram? For non-uniform bins there is also .integrate().
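To make the point about non-uniform bins concrete, here is a small sketch in plain NumPy (the helper name `pdf_to_cdf` and the sample data are assumptions for illustration). When the histogram is a density, each bin must be multiplied by its width before accumulating. When the histogram already holds counts or area per bin, a cumulative sum alone is enough. On a DataArray, `.cumsum()` along the bin dimension plays the role of `np.cumsum` here.

```python
import numpy as np

def pdf_to_cdf(pdf, bin_edges):
    """Turn a density histogram into a CDF evaluated at the upper bin edges."""
    widths = np.diff(bin_edges)      # non-uniform widths are handled here
    return np.cumsum(pdf * widths)   # probability mass per bin, accumulated

# Hypothetical sample data with deliberately non-uniform bins.
rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)
edges = np.array([-5.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 5.0])

pdf, _ = np.histogram(samples, bins=edges, density=True)
cdf = pdf_to_cdf(pdf, edges)
```

Skipping the multiplication by `widths` would overweight the narrow bins and the result would no longer end at 1.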

miniufo commented 3 years ago

Thanks, guys @jbusecke @TomNicholas.

My tracer extrema vary with time. I would like to ensure the max (min) value corresponds to the total (zero) area of the domain. If the bins are fixed, I may get a non-monotonic tracer-area relation (which I need to reverse later). Normalization between max and min is a workaround. But I chose to loop over time with time-varying bins and then concat the results along time. It is not complicated for my special case, but not easy to generalize to cover many cases.
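The loop-and-concatenate workaround described above could look roughly like the following NumPy sketch. The function name, the choice of evenly spaced bins between each time step's min and max, and the example data are all assumptions; the actual code used in this thread is not shown.

```python
import numpy as np

def area_cdf_time_varying_bins(tracer, cell_area, n_bins=10):
    """Cumulative area per time step, with bins spanning each step's own range.

    tracer:    array of shape (time, y, x)  (assumed layout)
    cell_area: array of shape (y, x)
    Returns (edges, area) with shapes (time, n_bins + 1) and (time, n_bins).
    """
    n_time = tracer.shape[0]
    edges = np.empty((n_time, n_bins + 1))
    area = np.empty((n_time, n_bins))
    weights = cell_area.ravel()
    for t in range(n_time):
        values = tracer[t].ravel()
        # Bins run from this time step's minimum to its maximum, so the
        # last cumulative value equals the total area of the domain.
        edges[t] = np.linspace(values.min(), values.max(), n_bins + 1)
        hist, _ = np.histogram(values, bins=edges[t], weights=weights)
        area[t] = np.cumsum(hist)
    return edges, area

# Hypothetical example: 3 time steps on a 4 x 5 grid of unit-area cells,
# with tracer extrema that differ from one time step to the next.
rng = np.random.default_rng(0)
tracer = rng.random((3, 4, 5)) * np.array([1.0, 2.0, 3.0])[:, None, None]
cell_area = np.ones((4, 5))
edges, area = area_cdf_time_varying_bins(tracer, cell_area, n_bins=8)
```

Because the bin edges differ per time step, they have to be carried alongside the result as a 2-D array rather than as a single shared bin coordinate.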

Oops, I didn't know about xarray's cumsum(). I'll fix this, paying attention to the non-uniform bins.