scikit-hep / hist

Histogramming for analysis powered by boost-histogram
https://hist.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[BUG] Error message when category slicing info isn't present in the axis #260

Closed: gordonwatts closed this issue 2 years ago

gordonwatts commented 3 years ago

Describe the bug

If you create a category axis, fill it, and then slice it with a list index that asks for a category that doesn't exist, the resulting error message is hard to interpret. It should be improved.

Steps to reproduce

This notebook shows how this works with hist version 2.4.0. A better error message might be something like "x, y, and z are valid in this category axis".

LovelyBuggies commented 3 years ago

FYI, hist 2.4.0 at least emits a UserWarning when you run this. Here's what I got in my Python console; the same thing happens in a Jupyter notebook.

>>> import hist
>>> hist.__version__
'2.4.0'
>>> from hist import Hist
>>> 
>>> mass_hist = (Hist.new
...              .Reg(60, 60, 180, name='mass', label=r'$m_{4\ell}$ [GeV]')
...              .StrCat([], name='dataset', label='Cut Type', growth=True)
...              .StrCat([], name='channel', label='Channel', growth=True)
...              .Int64()
...             )
>>> mass_hist.fill(
...     mass=140.0,
...     dataset='data1',
...     channel='eemumu'
... )
Hist(
  Regular(60, 60, 180, name='mass', label='$m_{4\\ell}$ [GeV]'),
  StrCategory(['data1'], growth=True, name='dataset', label='Cut Type'),
  StrCategory(['eemumu'], growth=True, name='channel', label='Channel'),
  storage=Int64()) # Sum: 1.0
>>> mass_hist[:,['data1'],:]
/Users/ninolau/anaconda3/envs/hist/lib/python3.9/site-packages/boost_histogram/_internal/hist.py:806: UserWarning: List indexing selection is experimental. Removed bins are not placed in overflow.
  warnings.warn(
Hist(
  Regular(60, 60, 180, name='mass', label='$m_{4\\ell}$ [GeV]'),
  StrCategory(['data1'], growth=True, name='dataset', label='Cut Type'),
  StrCategory(['eemumu'], growth=True, name='channel', label='Channel'),
  storage=Int64()) # Sum: 1.0

Btw, @henryiii, this looks like a boost-histogram issue, since hist passes the indices directly to boost-histogram (BH). That said, we could add a wrapper in hist to check them.
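A wrapper like the one suggested above could validate the requested categories before handing them to boost-histogram, producing the kind of message proposed in the issue. This is a hypothetical sketch: `check_categories` is not part of hist or boost-histogram, and raising `ValueError` is an arbitrary choice.

```python
from typing import Iterable, Sequence


def check_categories(axis_name: str, valid: Sequence[str], requested: Iterable[str]) -> None:
    """Raise a readable error if any requested category is not on the axis.

    Hypothetical helper: not part of hist or boost-histogram.
    """
    missing = [cat for cat in requested if cat not in valid]
    if missing:
        # List the valid categories so the user can see which names exist.
        raise ValueError(
            f"{missing!r} not found in category axis {axis_name!r}; "
            f"valid categories are: {', '.join(map(repr, valid))}"
        )
```

Before indexing, one could call something like `check_categories("dataset", list(mass_hist.axes["dataset"]), ["I-don't-exist"])`, assuming that iterating a StrCategory axis yields its category labels.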

henryiii commented 3 years ago

The warning comes from boost-histogram and is correct: when you select bins with a list, the unselected bins are not placed in the overflow bin, which I think is the correct behavior. I think the real bug is that calling mass_hist[:,["I-don't-exist"],:] gives an unclear error message. That fix probably belongs in boost-histogram, though, since it's bh.loc that computes the bin index here.
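To make the selection semantics concrete, here is a toy model in plain Python, not hist's implementation: bins are a dict from category to count, and list selection keeps only the listed bins. The counts in the dropped bins are discarded rather than moved into an overflow bin, so the total shrinks. In this toy, a missing name raises a bare `KeyError` carrying only the missing key, which illustrates why a message listing the valid categories is more helpful.

```python
from typing import Dict, List


def select(counts: Dict[str, int], keep: List[str]) -> Dict[str, int]:
    """Toy model of list selection on a category axis (not hist's code).

    Only the listed bins survive; the contents of every other bin are
    dropped, not added to an overflow bin.
    """
    return {cat: counts[cat] for cat in keep}


counts = {"data1": 3, "data2": 5}
selected = select(counts, ["data1"])
print(selected)                # {'data1': 3}
print(sum(counts.values()))    # 8
print(sum(selected.values()))  # 3: the 5 entries in "data2" are gone
```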

henryiii commented 3 years ago

I think this is https://github.com/scikit-hep/boost-histogram/issues/387

gordonwatts commented 3 years ago

Yes, I always meant this as a comment on the error message. The behavior (a crash) is just fine; I just wanted an error message that would point me in the right direction about the mistake I'd made. It's a small thing in that sense. Thanks so much for following it up!

henryiii commented 2 years ago

I believe this was fixed in boost-histogram a while ago. Reopen if not!