scikit-hep / uproot3-methods

Pythonic behaviors for non-I/O related ROOT classes.
BSD 3-Clause "New" or "Revised" License
21 stars 28 forks source link

Bug -> uproot_methods 0.3.2 does not read in two dimensional histograms correctly #31

Closed lgray closed 5 years ago

lgray commented 5 years ago

Using uproot 0.3.2 causes the binning of the pt axis for the histogram here to be incorrect: https://github.com/CoffeaTeam/fnal-column-analysis-tools/blob/master/tests/samples/testSF2d.histo.root

output in uproot 0.3.1 (correct by verifying in ROOT):

mac-129479:fnal-column-analysis-tools lagray$ pip install uproot-methods==0.3.1
Collecting uproot-methods==0.3.1
  Using cached https://files.pythonhosted.org/packages/f4/30/de9f8a9d7b380c5a9bcd9177836deccf18ddb9cb88533c8683e02af4bb66/uproot_methods-0.3.1-py2.py3-none-any.whl
Requirement already satisfied: awkward>=0.7.0 in /anaconda2/lib/python2.7/site-packages (from uproot-methods==0.3.1) (0.7.1)
Requirement already satisfied: numpy>=1.13.1 in /anaconda2/lib/python2.7/site-packages (from uproot-methods==0.3.1) (1.15.4)
Installing collected packages: uproot-methods
  Found existing installation: uproot-methods 0.3.2
    Uninstalling uproot-methods-0.3.2:
      Successfully uninstalled uproot-methods-0.3.2
Successfully installed uproot-methods-0.3.1
mac-129479:fnal-column-analysis-tools lagray$ python
Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> hist = uproot.open('tests/samples/testSF2d.histo.root')['scalefactors_Tight_Electron;1']
>>> hist.numpy()
(array([[0.80655736, 0.82857144, 1.0328639 , 1.007752  , 0.94072163,
        0.9458763 , 0.9897698 , 1.0339806 , 0.8274854 , 0.79701495],
       [0.88245934, 0.9274874 , 1.007595  , 0.97203946, 0.9529984 ,
        0.9819967 , 0.9752066 , 0.9748744 , 0.90893763, 0.8633218 ],
       [0.9188406 , 0.9669876 , 0.9881956 , 0.97466666, 0.9534575 ,
        0.97981155, 0.97463286, 0.96638656, 0.95782316, 0.9078014 ],
       [0.9403973 , 0.9809645 , 0.9953917 , 0.97167486, 0.95308644,
        0.9775281 , 0.97893435, 0.9799073 , 0.9685535 , 0.93766236],
       [1.0510856 , 1.0059242 , 1.1041056 , 0.989486  , 0.97549593,
        1.0118906 , 1.010727  , 1.0071633 , 0.9882629 , 1.0213568 ],
       [1.0510856 , 1.0059242 , 1.1041056 , 0.989486  , 0.97549593,
        1.0118906 , 1.010727  , 1.0071633 , 0.9882629 , 1.0213568 ]],
      dtype=float32), (array([-2.5  , -2.   , -1.566, -1.444, -0.8  ,  0.   ,  0.8  ,  1.444,
        1.566,  2.   ,  2.5  ]), array([ 10.,  20.,  35.,  50.,  90., 150., 500.])))
>>> 

output in uproot 0.3.2 (second axis is totally wrong):

mac-129479:fnal-column-analysis-tools lagray$ pip install uproot-methods==0.3.2
Collecting uproot-methods==0.3.2
  Using cached https://files.pythonhosted.org/packages/d2/e4/8294ce0ead0ecc2b1d42bcb35409c8b9361c25139c1fc2e887494dd2f0f9/uproot_methods-0.3.2-py2.py3-none-any.whl
Requirement already satisfied: awkward>=0.7.0 in /anaconda2/lib/python2.7/site-packages (from uproot-methods==0.3.2) (0.7.1)
Requirement already satisfied: numpy>=1.13.1 in /anaconda2/lib/python2.7/site-packages (from uproot-methods==0.3.2) (1.15.4)
Installing collected packages: uproot-methods
  Found existing installation: uproot-methods 0.3.1
    Uninstalling uproot-methods-0.3.1:
      Successfully uninstalled uproot-methods-0.3.1
Successfully installed uproot-methods-0.3.2
mac-129479:fnal-column-analysis-tools lagray$ python
Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> hist = uproot.open('tests/samples/testSF2d.histo.root')['scalefactors_Tight_Electron;1']
>>> hist.numpy()
(array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.        , 0.80655736, 0.82857144, 1.0328639 , 1.007752  ,
        0.94072163, 0.9458763 , 0.9897698 , 1.0339806 , 0.8274854 ,
        0.79701495, 0.        ],
       [0.        , 0.88245934, 0.9274874 , 1.007595  , 0.97203946,
        0.9529984 , 0.9819967 , 0.9752066 , 0.9748744 , 0.90893763,
        0.8633218 , 0.        ],
       [0.        , 0.9188406 , 0.9669876 , 0.9881956 , 0.97466666,
        0.9534575 , 0.97981155, 0.97463286, 0.96638656, 0.95782316,
        0.9078014 , 0.        ],
       [0.        , 0.9403973 , 0.9809645 , 0.9953917 , 0.97167486,
        0.95308644, 0.9775281 , 0.97893435, 0.9799073 , 0.9685535 ,
        0.93766236, 0.        ],
       [0.        , 1.0510856 , 1.0059242 , 1.1041056 , 0.989486  ,
        0.97549593, 1.0118906 , 1.010727  , 1.0071633 , 0.9882629 ,
        1.0213568 , 0.        ],
       [0.        , 1.0510856 , 1.0059242 , 1.1041056 , 0.989486  ,
        0.97549593, 1.0118906 , 1.010727  , 1.0071633 , 0.9882629 ,
        1.0213568 , 0.        ],
       [0.        , 1.0510856 , 1.0059242 , 1.1041056 , 0.989486  ,
        0.97549593, 1.0118906 , 1.010727  , 1.0071633 , 0.9882629 ,
        1.0213568 , 0.        ]], dtype=float32), array([  -inf, -2.5  , -2.   , -1.566, -1.444, -0.8  ,  0.   ,  0.8  ,
        1.444,  1.566,  2.   ,  2.5  ,    inf]), array([        -inf,  10.        ,  91.66666667, 173.33333333,
       255.        , 336.66666667, 418.33333333, 500.        ,
                inf]))
>>> 
nsmith- commented 5 years ago

Looks like it just includes the overflow bins now. Indeed there are h.values and h.allvalues to differentiate between including overflow, and likewise h.edges and h.alledges. Rather strange is the h.numpy now has two definitions (the latter taking precedence) In particular it seems this scale factor has meaningful content in the pt overflow, so we should accomodate that.

lgray commented 5 years ago

@nsmith- look at the pt axes, the bin low edges are rather different. The over flow bins in the histogram values are also missing.

lgray commented 5 years ago

Ah, ok, the histogram values make sense. The overflow is repeated. The pt-axis bin values are still messed up.

jpivarski commented 5 years ago

Is this still broken?

nsmith- commented 5 years ago

It appears to work as expected per the discussion in #32

jpivarski commented 5 years ago

Okay, good. Thanks.