scikit-hep / uproot3-methods

Pythonic behaviors for non-I/O related ROOT classes.
BSD 3-Clause "New" or "Revised" License
21 stars 28 forks source link

TLorentzVector ValueError in __repr__ #10

Closed shane-breeze closed 5 years ago

shane-breeze commented 5 years ago

Using from_ptetaphim to create a TLorentzVector if you pass a large value for eta then pz and e will be np.inf because of the exponential behaviour of sinh with eta. This isn't too much of an issue but if you try to print the TLorentzVector the __repr__ function gives a ValueError because for format code 'g' is unknown for np.inf.

TVector3 should suffer with a similar issue.

Error:

File "uproot_methods/classes/TLorentzVector.py", line 336, in __repr__
    return "TLorentzVector({0:.5g}, {1:.5g}, {2:.5g}, {3:.5g})".format(self._fP._fX, self._fP._fY, self._fP._fZ, self._fE)
ValueError: Unknown format code 'g' for object of type 'str'
jpivarski commented 5 years ago

No, there's something weirder going on: it's not complaining that numpy.inf can't be formatted by "g", it's complaining that a string (str) can't be formatted by "g".

I'm pretty sure that numpy.inf can be formatted with "g", though it's worth testing. (numpy.nan, too.) However, we've got to figure out why the infinity turns into a string.

Are you persisting it (pickle or something) and then reconstituting it? JSON doesn't accept numpy.inf and numpy.nan, so I turn these into strings for JSON— and hopefully turn them back upon deserialization. I don't see how that applies here because only the metadata gets JSON'ed when picking, etc., not the values in the arrays.

Hmmm.

shane-breeze commented 5 years ago

hmm...true. Just checked and np.inf and np.nan work fine with the 'g' formatter.

I think I understand the cause a bit better now. I can reproduce the error with the following code:

import uproot_methods as um
import awkward as awk

pt = awk.JaggedArray.fromiter([[1.0]])
eta = awk.JaggedArray.fromiter([[1.0]])
phi = awk.JaggedArray.fromiter([[1.0]])
mass = awk.JaggedArray.fromiter([[1.0]])
p4 = um.TLorentzVectorArray.from_ptetaphim(pt, eta, phi, mass)
print(p4)

The issue is passing the JaggedArrays (or even nested lists). First of all, am I using this as intended? The goal is to take the 4 jagged arrays (pt, eta, phi, mass - this is how they're stored in the TTree) and create a jagged array of TLorentzVectors.

jpivarski commented 5 years ago

I didn't foresee passing JaggedArrays into that constructor, but that means it needs either a better error message or maybe this case should be handled outright.

What's probably happening is that the JaggedArray is being converted into a Numpy object array (because some intermediate step requires a Numpy array) and then there's some error displaying that object array. We should be avoiding the conversation into Numpy object arrays as much as possible— it's a performance bottleneck that makes indexing features fail.

The way this was intended to work, you'd pass flat content arrays into the constructor and then make those the content of a JaggedArray by using the starts and stops of the old array. You probably want that new JaggedArray to have Lorentz vector methods like .mass, etc., so you'd have to construct a JaggedArray with Lorentz methods type

JaggedArrayWithTLorentzVector = awkward.Methods.mixin(uproot_methods.classes.TLorentzVector.ArrayMethods, JaggedArray)
a = JaggedArrayWithTLorentzVector(starts, stops, newcontent)

(to the best of my memory— the arguments could be backward).

This interface wasn't meant for end-user physicists, but a few other people have hit it, too. We'll need some high-level interface to make this sort of thing easy.

jpivarski commented 5 years ago

Fixed by PR #11.