Open NJManganelli opened 2 months ago
I don't have the time to build a reproducer upstream in boost::histogram, but I'd expect the same behavior there. I'm posting this bug report here as much for user visibility as for the lack of a strict upstream reproducer.
@NJManganelli Just for developer time savings, can you provide the minimal install list required to reproduce this? (So the high-level `requirements.txt` that you would `pip install` from, not the output of `pip list`.)
The core installs are as follows:
hist
uproot5
coffea
mplhep
awkward
dask-awkward
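To attach exact versions to this list, a small standard-library sketch such as the following could be used. The distribution names are copied from the list above as an assumption; the name registered on the package index may differ for some of them (for example `uproot5`), in which case the entry is simply reported as not installed. The helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: names copied verbatim from the install list in this thread;
# the actual distribution names on the package index may differ.
PACKAGES = ["hist", "uproot5", "coffea", "mplhep", "awkward", "dask-awkward"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one requirements-style line per package.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}=={ver}" if ver else f"# {name}: not installed")
```

The output can be pasted into the issue next to the high-level list, so the two are not confused with a full `pip list` dump.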
Describe the bug
When an IntCategory axis with growth=False and overflow=False is filled with values that fall outside the defined bins (which should be a valid combination, and is useful in certain contexts to 'ignore' invalid values without explicitly masking them), the entire histogram can come out as 0, at least when a StrCategory axis and Weight storage are also used. The same behavior appears directly when using the named versions of these axis types in scikit-hep/hist, and it produces a stranger error in dask_histogram.
Steps to reproduce
This also includes reproducers for hist and dask_histogram, to demonstrate some of the discrepancies.
This produces the following output, where the unmasked fill into the histogram without an overflow bin on the IntCategory axis produces a sum of 0 for (boost-)hist and a junk-like value in DaskHistogram (presumably the partitioned fill plays an additional role here).
This bug was reproduced with boost-histogram 1.4.1 on an M1 Max chip running macOS 14.4; it was originally discovered on AlmaLinux 8 (on Fermilab's LPC cluster) using a coffea container. The other principal libraries used are a mix of development versions, mostly corresponding to the latest git version from approximately a month ago. These can be provided if necessary, but given the reproduction across different environments, it seems to be a relatively 'stable' bug.