scikit-hep / hist

Histogramming for analysis powered by boost-histogram
https://hist.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[DOCS] Syntactic shortcut for dumping histogram centers/counts #364

Open kratsg opened 2 years ago

kratsg commented 2 years ago

Describe the problem, if any, that your feature request is related to

It would be nice to generate a "table" of the values shown in some particular histogram. Consider for example, a 1-dimensional histogram:

import hist
import uproot

h = hist.Hist(hist.axis.IntCategory([], growth=True))
for array in uproot.iterate(...):
    h.fill(array["value"])

It would be nice to dump a dictionary directly, where the keys are the category labels and the values are the counts. Since list(h.axes[0]) already gives you the categories (which is not necessarily obvious), one would expect

dict(h)

to "JustWork"™. The closest equivalent I've found is to do

>>> dict(zip(h.axes[0], h.values()))
{3: 248352.0, 2: 208653.0, 1000024: 1994.0, 1000014: 38226.0, 1000012: 59.0, 1000011: 3.0}

but that's not necessarily as nice.

Describe the feature you'd like

I would expect dict(h) to work. For multi-dimensional histograms, the keys could be hashable tuples instead.
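A minimal sketch of what tuple keys could look like for more than one axis. The helper name `histogram_table` and the stand-in labels and counts are hypothetical, not part of hist. It assumes only that each axis is iterable over its bin labels (as `list(h.axes[0])` above suggests) and that the counts are supplied flattened in row-major order.

```python
from itertools import product


def histogram_table(axes, flat_counts):
    """Map each bin label (a tuple, one entry per axis) to its count.

    axes: one iterable of bin labels per axis.
    flat_counts: counts flattened in row-major order (last axis varies fastest).
    """
    # product() also varies its last argument fastest, so keys and counts line up.
    keys = product(*axes)
    return dict(zip(keys, flat_counts))


# Stand-in data for a 2D histogram with an integer and a string category axis.
pdg_ids = [3, 2]
regions = ["SR", "CR"]
counts = [10.0, 4.0, 7.0, 1.0]  # first two entries belong to pdg id 3

table = histogram_table([pdg_ids, regions], counts)
print(table)
```

With a real histogram, the call would presumably be `histogram_table(h.axes, h.values().ravel())`, assuming `h.values()` returns a NumPy array. For a single axis this produces one-element tuple keys such as `(3,)`, rather than the bare labels given by the `zip` recipe above.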

Describe alternatives, if any, you've considered

See above.

henryiii commented 2 years ago

We generally do not change the API between 1D and ND histograms. This would introduce such a change: dict(h) would need to produce {(3,): 248352.0, ...} and so forth. Also, this would usually produce tuples of floats (for all axis types except Integer, IntCategory, and StrCategory), which are really bad for hashing.

dict(zip(h.axes[0], h.values())) is explicit, and not that bad.
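To illustrate the hashing concern with plain Python floats, here is a small sketch that does not use hist at all. It assumes that bins on a continuous axis would be keyed by their (lower, upper) edge pair, and the edge values are made up for the example.

```python
# Hypothetical bin edges for a regular axis of width 0.1, computed by
# multiplying the bin index by the width.
width = 0.1
edges = [i * width for i in range(4)]

# Key each bin by its (lower, upper) edge pair.
table = {(lo, hi): 0.0 for lo, hi in zip(edges, edges[1:])}

# This lookup succeeds: 1 * 0.1 and 2 * 0.1 equal the literals 0.1 and 0.2.
print((0.1, 0.2) in table)

# This lookup fails: the stored upper edge is 3 * 0.1, which is not equal
# to the literal 0.3 in binary floating point.
print((0.2, 0.3) in table)

# Show the key that was actually stored for the last bin.
print(list(table)[-1])
```

A user who types the edges by hand gets a missing key for the last bin, even though the bin looks identical when printed to a few digits. Integer and string labels do not have this problem.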

eduardo-rodrigues commented 2 years ago

This is the kind of example that I would typically recommend providing somewhere in the docs/notebooks ... My 2 cents.