scikit-hep / aghast

Aghast: aggregated, histogram-like statistics, sharable as Flatbuffers.
BSD 3-Clause "New" or "Revised" License

Stability of histogram types #42

Closed: jonas-eschle closed this issue 4 years ago

jonas-eschle commented 4 years ago

Hi all, thanks a lot for this repo!

How stable are the binning types? We are interested in using them as the binning definitions in zfit. Or is there a subset that you would consider "stable", especially among the simpler ones?

jpivarski commented 4 years ago

Aghast is not in active development, so in that sense, it's absolutely stable. If it were in active development, I'd want to reshape the interface, so in that sense, it is not stable.

The key thing came in March 2019 (!), @nsmith-'s observation that all binning being rectilinear is a problem for ever supporting sparse data, which are essentially jagged arrays (#10). It got me thinking that the bins shouldn't be a NumPy array, but an Awkward Array. At the time, that was a non-starter because Awkward Array had to be rebuilt as Awkward 1 (August 2019 through April 2020).

This year, @henryiii and @LovelyBuggies developed quite a lot of hist; if the official 2.0 release isn't out yet, it will be very soon. They did some Aghast integration (e.g. scikit-hep/aghast#39) and we had many conversations about it, in which I made it clear that I'm unable to maintain Aghast, let alone give it the essential upgrade it needs to future-proof it for sparse histograms.

A great idea that came out of those conversations was to backpedal this somewhat and introduce a histogram protocol, rather than a universal format, using Python typing. A protocol is an API that histogram libraries can adhere to (not necessarily exclusively) and histogram-using libraries can expect (as a minimum). If that protocol is expressed in Python types, then it is an interface that only histogram libraries in Python can share. Aghast was more ambitious; it was intended to be an ABI, a block of bytes that can be interpreted as a histogram between processes and across languages, but that may be more than we need. If at least one histogram library that shares the API has a good serialization, they all effectively inherit it.
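To make the API-versus-ABI distinction concrete, here is a minimal sketch of what a typing-based histogram protocol could look like. It is not the protocol that came out of those conversations: the names `HistogramLike`, `SimpleHist`, `edges`, `counts`, and `total` are all hypothetical and chosen only for illustration.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class HistogramLike(Protocol):
    """Hypothetical minimal interface that a histogram-using library expects."""

    def edges(self) -> Sequence[float]:
        """Bin edges; one more entry than there are bins."""
        ...

    def counts(self) -> Sequence[float]:
        """Bin contents; one entry per bin."""
        ...


class SimpleHist:
    """Toy histogram class. It does not inherit from HistogramLike;
    it satisfies the protocol structurally by providing the same methods."""

    def __init__(self, edges: Sequence[float], counts: Sequence[float]) -> None:
        if len(edges) != len(counts) + 1:
            raise ValueError("need exactly one more edge than counts")
        self._edges = list(edges)
        self._counts = list(counts)

    def edges(self) -> Sequence[float]:
        return self._edges

    def counts(self) -> Sequence[float]:
        return self._counts


def total(hist: HistogramLike) -> float:
    """Consumer code: relies only on the protocol, not on a concrete library."""
    return float(sum(hist.counts()))


h = SimpleHist(edges=[0.0, 1.0, 2.0], counts=[3.0, 4.0])
print(isinstance(h, HistogramLike))
print(total(h))
```

A library can offer a much richer interface and still satisfy such a protocol, which is the "not necessarily exclusively" part. Note that the `isinstance` check enabled by `runtime_checkable` only verifies that the methods exist, not their signatures; a static type checker is what verifies the rest.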

I haven't been able to find a link to those conversations; I've been searching GitHub commits and issues, but I don't know where we talked about it.

henryiii commented 4 years ago

Discussions of the API (https://github.com/scikit-hep/boost-histogram/issues/423, https://github.com/scikit-hep/uproot/issues/511) are not progressing very well; see https://github.com/scikit-hep/boost-histogram/issues/459.

jonas-eschle commented 4 years ago

Many thanks for the answer, it's quite insightful!

So overall, we will stick with development and axis types closer to hist (knowing that it is still under development, but we don't need much of it either).