scikit-hep / boost-histogram

Python bindings for the C++14 Boost::Histogram library
https://boost-histogram.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Inconsistent filling of values on bin edges [BUG] #752

Open Dominic-Stafford opened 2 years ago

Dominic-Stafford commented 2 years ago

When creating a Regular axis with an integer step size, integers that fall on the bin edges aren't consistently assigned to the same side of the bin edges:

>>> import boost_histogram as bh
>>> hist = bh.Histogram(bh.axis.Regular(100, 0, 200))
>>> hist.fill([56, 58, 60])
Histogram(Regular(100, 0, 200), storage=Double()) # Sum: 3.0
>>> hist.values()
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 0., 1., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

I guess this is due to some sort of numerical precision issue. The edges show similar behaviour, though clearly not exactly the same: going by the edges, one would expect neither 56 nor 58 to fall in bin 28 (56 lies below edges[28] and 58 lies above edges[29]), when in fact both do:

>>> bh.axis.Regular(100, 0, 200).edges[28]
56.00000000000001
>>> bh.axis.Regular(100, 0, 200).edges[29]
57.99999999999999

If this is difficult to fix for Regular axes, would it be possible to add a step option to Integer axes? I observed the same problem with bh.axis.Regular(100, 0, 100), but there one can simply use bh.axis.Integer(0, 100) instead, which doesn't have this problem.
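In the meantime, one possible workaround (a sketch only, not an existing boost-histogram feature) would be to compute the bin index with integer arithmetic and fill an Integer axis with the result. The helper name `step_index` and its parameters are hypothetical:

```python
def step_index(x, start=0, step=2):
    """Hypothetical helper: bin index for an integer x on an integer-step grid.

    Integer floor division is exact, so a value sitting on a bin edge
    always lands in the bin that starts at that edge.
    """
    return (x - start) // step


# The three values from the report now land in three different bins.
indices = [step_index(x) for x in (56, 58, 60)]
print(indices)
```

These indices could then be filled into bh.axis.Integer(0, 100), at the cost of the axis being labelled by bin index rather than by the original value.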

henryiii commented 2 years ago

This is discussed in boostorg/histogram, for example in https://github.com/boostorg/histogram/issues/336. It's really an issue there, rather than here. The algorithm for computing the edges has different numerical precision issues than the one for computing the fill index, which is why the two can disagree. The Regular axis is optimized for performance.
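To illustrate how two formulas can disagree, here is a minimal pure-Python sketch. The two formulas are assumptions chosen for illustration (a normalize-and-truncate index and a linearly interpolated edge), not code taken from the library, and the names `fill_index` and `edge` are hypothetical:

```python
# Illustrative only: assumed formulas, not the Boost::Histogram source.
N_BINS, LO, HI = 100, 0.0, 200.0


def fill_index(x):
    """Bin index: normalize x to [0, 1), scale by the bin count, truncate."""
    z = (x - LO) / (HI - LO)
    return int(z * N_BINS)


def edge(i):
    """Lower edge of bin i: linear interpolation between LO and HI."""
    z = i / N_BINS
    return (1.0 - z) * LO + z * HI


for x in (56, 58, 60):
    print(x, "-> bin", fill_index(x))

print("edge(28) =", repr(edge(28)))
print("edge(29) =", repr(edge(29)))
```

With IEEE 754 doubles, 58 / 200 * 100 rounds to just below 29 and is truncated to 28, so 56 and 58 share a bin. The interpolated edges round the other way, with edge(28) slightly above 56 and edge(29) slightly below 58, which matches the mismatch reported above.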