scikit-hep / uproot3-methods

Pythonic behaviors for non-I/O related ROOT classes.
BSD 3-Clause "New" or "Revised" License
21 stars 28 forks source link

Introduction of TH3 #42

Closed marinang closed 5 years ago

marinang commented 5 years ago

Creation of TH3.py as discussed in #8. Modification of TH2.py, values was shaped as (ynumbins + 1 , xnumbins + 1) whereas it should be (xnumbins + 1 , ynumbins + 1) (right ?).

jpivarski commented 5 years ago

Some users have been using TH2s; have they been looking at diagonally flipped plots? We have to somehow warn them before changing their interface.

jpivarski commented 5 years ago

@nsmith- made some edits and @alasfar-lina raised some issues about TH2 in the past.

This pull request, in addition to adding TH3 (thanks!) swaps the x and y axes for TH2, presumably because they were wrong before. Unit tests don't go all the way to plotting, so that's entirely possible. Do you two find this to be the case, and would you be okay with a breaking API that fixes it?

nsmith- commented 5 years ago

Interesting, we did notice some oddities in the ordering, but I think ROOT is to blame for it being backwards. I will confirm today

nsmith- commented 5 years ago

Creating a 2D hist:

import ROOT

f = ROOT.TFile("test.root", "recreate")
h = ROOT.TH2D("hist", "test;x axis;y axis", 3, 0, 3, 4, 0, 4)
h.Fill(1,2)
h.Write()

and reading with uproot:

import uproot

f = uproot.open("test.root")
h = f['hist']

print("Edges:", h.edges)
print("Values:", h.values)
print("Values at (y,x) = (2,1):", h.values[2,1])
print("X title:", h._fXaxis._fTitle)

prints:

Edges: (array([0., 1., 2., 3.]), array([0., 1., 2., 3., 4.]))
Values: [[0. 0. 0.]
 [0. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]]
Values at (y,x) = (2,1): 1.0
X title: b'x axis'

So the h.values array is the correct shape for (y,x) ordering. The array h.values.reshape(3,4) is invalid. Hence there is an inconsistency between the tuple h.edges and the 2D array h.values. Personally I'm in favor of h.values.T rather than yedges, xedges = h.edges, but either way a breaking change should be made.

marinang commented 5 years ago

That's odd. What about TH3 then is it (z, y, x) ? I will have a look.

nsmith- commented 5 years ago

I think so, recalling how tree->Draw("z:y:x >> hist") works... ROOT does strange things sometimes :)

marinang commented 5 years ago

I have just checked the result of you example, and the bin with 1.0 is now an y overflow bin ...

>>> np.array(h[:]).reshape(5, 6)
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
>>> h.yoverflows
array([0., 0., 1., 0., 0.])

I will make some adjustments to get the correct ordering from ROOT and provide a (x, y, (z)) ordering in values and edges (ect...). Actually your example could be added as a test.

jpivarski commented 5 years ago

I put [WIP] in the name to prevent this from being accidentally merged. When you're certain that you've put things in the right order, remove the [WIP] and I'll accept whatever conclusions you've come to.

About (contents, xbins, ybins) vs (contents, [xbins, ybins]), I implemented one because it matches what numpy.histogram2d returns. However, the other is what numpy.histogramdd returns (with a nested list, not type, I believe). If Numpy is deprecating or at least downplaying numpy.histogram2d—I get that impression from some docs, but I'm not sure—then perhaps we should follow the d-dimensional signature.

I just hate switching again it on unsuspecting users. It was already the cause of one big report, and who knows how many people it affected without writing bug reports? If we're referring the order of the axes, that's just as bad and we should take this opportunity to put it in its final state.

marinang commented 5 years ago

Ok as for now the order of the axes is (x, y, (z)) for TH2 and TH3. You are talking about the numpymethod right ? I think the output formats should be identical for both TH2 and TH3 and I personally prefer this (contents, [xbins, ybins, zbins]) one.

jpivarski commented 5 years ago

Consistency between TH2 and TH3 is a good argument; it could be why Numpy is quietly hiding numpy.histogram2d (not listed in "see also" from numpy.histogram; only the d-dimensional case is listed).

jpivarski commented 5 years ago

@alasfar-lina, are you okay with this change? It would essentially revert what you saw in #36. If I could reach everybody I thought was using this, I would try, but the issues are all I have to go with.

(An announcement on the front page probably wouldn't do any good—it would affect people's workflows who have already downloaded uproot and set up their scripts. They're not going to be looking there.)

jpivarski commented 5 years ago

Is this done? Ready to be merged?

marinang commented 5 years ago

No I was waiting for an answer from @alasfar-lina. I can apply the changes now and tell when it's ready.

marinang commented 5 years ago

@jpivarski done!

lgray commented 5 years ago

I'm glad I see these PRs now. Need to update https://github.com/CoffeaTeam/fnal-column-analysis-tools.

jpivarski commented 5 years ago

It just went into PyPI as uproot-methods 0.4.3 a minute ago.