It's not surprising that some mass calculations return `nan`, especially if they're rare. The mass is the square root of a subtraction of two potentially large numbers, so a small fractional error in the energy and momentum can make the argument of the square root slightly negative, and then you get `nan`.
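To illustrate the kind of cancellation involved (made-up numbers, not the ones from this issue): a highly boosted muon's energy and momentum agree to more digits than a 32-bit float can hold, so the difference of their squares can lose the mass entirely, and with a bit more round-off the argument can dip below zero, at which point `np.sqrt` returns `nan`.

```python
import numpy as np

# illustrative numbers, not from this issue: a very boosted muon whose energy
# and momentum agree to more significant digits than float32 carries
p = np.float64(500.0)                      # |p| in GeV
E = np.sqrt(p**2 + 0.105658**2)            # true energy: 500.0000111...
E32, p32 = np.float32(E), np.float32(p)    # both round to exactly 500.0 in float32

print(E**2 - p**2)        # ~0.01116, the muon mass squared, still intact in 64-bit
print(E32**2 - p32**2)    # 0.0 in float32: the mass information is already gone,
                          # and a slightly negative result here would give nan
```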
Is the odd thing that whether you get `nan` or not depends on the order in which you apply the `[0]`, `[msk]`, and `.mass`? That does sound wrong: numerical round-off is one thing, but the same numbers should be combined with each other in every case, so different ways of expressing the same quantity should either all give `nan` or none of them should. If there's some sort of off-by-one error in which energy elements are applied to which momentum elements, that's a big issue and needs to be addressed.
Is it possible to make a reproducible example that doesn't require the original data files? Like, can you create Lorentz values that show this issue using `fromiter` or by explicitly building JaggedArrays? It looks like you've isolated the particular vectors that show the issue.
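For what it's worth, a self-contained reproducer could follow the pattern below. The pt/eta/phi values are placeholders I made up (so they may well not trigger the `nan`); the real test would substitute the exact float32 values isolated from the file:

```python
import numpy as np
import uproot_methods

# placeholder kinematics for three muon pairs, stored as float32 like on disk
pt1  = np.array([ 52.3,  61.8,  35.1], dtype=np.float32)
eta1 = np.array([ 1.21,  2.35, -0.42], dtype=np.float32)
phi1 = np.array([ 0.45, -2.95,  1.87], dtype=np.float32)

pt2  = np.array([ 49.9,  60.2,  33.8], dtype=np.float32)
eta2 = np.array([-1.05,  2.37,  0.51], dtype=np.float32)
phi2 = np.array([-2.71, -2.91, -1.32], dtype=np.float32)

mmu = np.full(3, 0.105658, dtype=np.float32)   # muon mass

v1 = uproot_methods.TLorentzVectorArray.from_ptetaphim(pt1, eta1, phi1, mmu)
v2 = uproot_methods.TLorentzVectorArray.from_ptetaphim(pt2, eta2, phi2, mmu)

print((v1 + v2).mass)       # whole-array calculation
print((v1 + v2)[1].mass)    # single-object calculation, for comparison
```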
Even when the mass isn't `nan`, it's suspiciously close to 1× the muon mass. That's weird. Even if the muons are nearly collinear, you'd get 2× the muon mass...
Hi @jpivarski, thanks for looking into it. I tried to export back to ROOT (both float32 and float64), but that washes away the issue. Indeed it seems to be numerical rounding plus something more.

I isolated the problematic vectors here. There are two vectors, `v1` and `v2`, each with three elements. Summing them yields the issue in the second element. I left it like this in case the issue actually starts in the sum, and kept the other two values just for reference.

Thanks again!
```python
import awkward

ff = awkward.load('test.awkd')

(ff['v1'] + ff['v2']).mass
# array([87.1220124 , nan, 65.57995346])

(ff['v1'] + ff['v2'])[1].mass
# 0.1089482379549958
```
Okay, first off: it's not a scary off-by-one error. Awkward/Uproot-Methods is not adding the wrong numbers together or anything like that.

It is a round-off error, and the reason you see a difference is that `v12[1].mass` creates a Python object with Python `float` types as its x/y/z/t attributes, while `v12.mass[1]` performs the operation on the NumPy arrays before extracting element `[1]`. Python `float` types are 64-bit; your NumPy arrays happen to be 32-bit. NumPy preserves the 32-bit precision through the calculation, and this makes the difference between a small mass, resulting from the subtraction of large energy and momentum, and an imaginary mass, which `np.sqrt` returns as `nan`.
See below (I'm removing the `np.sqrt` part of the calculation for clarity):
```python
>>> v12 = (ff['v1'] + ff['v2'])
>>> v12.t[1]**2 - v12.x[1]**2 - v12.y[1]**2 - v12.z[1]**2
0.011869718553498387
>>> (v12.t**2 - v12.x**2 - v12.y**2 - v12.z**2)[1]
-0.008600915200076997
```
When the `np.sqrt` is applied, the first case returns a small value and the second returns `nan`.
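For reference, taking the square root of those two arguments reproduces the two results quoted earlier (NumPy also emits an "invalid value encountered in sqrt" warning for the negative one, not shown here):

```python
>>> import numpy as np
>>> np.sqrt(0.011869718553498387)    # the 64-bit, single-object argument
0.1089482379549958
>>> np.sqrt(-0.008600915200076997)   # the 32-bit, array argument
nan
```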
@henryiii This is a lesson for the Vector library going forward: single Lorentz-vector objects should have the same storage as arrays of Lorentz vectors. Their x/y/z/t values should probably be NumPy scalar views of the arrays they came from, and hopefully that will ensure that the same precision of operations is performed on single vectors as on arrays of vectors.
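The precision point, sketched with plain NumPy (made-up numbers): a NumPy scalar taken from an array keeps the array's dtype through later arithmetic, whereas converting it to a Python `float` silently promotes everything to 64-bit.

```python
import numpy as np

t = np.array([87.1, 90.2, 65.6], dtype=np.float32)   # made-up float32 values

elem = t[1]                       # NumPy scalar: keeps the array's 32-bit dtype
print(type(elem), elem.dtype)     # <class 'numpy.float32'> float32
print((elem**2).dtype)            # float32, same precision as the array calculation

as_python = float(elem)           # Python float: from here on everything is 64-bit
print(type(as_python))            # <class 'float'>
```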
Maybe Awkward1 needs to change: currently, it returns Python values if you select one element of an array. I kind of hate the NumPy alternative of returning NumPy scalars, though, because those scalars are not JSON-encodable, and you end up staring at the screen wondering why it can't turn the number `12` into JSON (because NumPy scalars get printed to the screen the same way as Python numbers).
Alternatively, we can have a policy of doing all our math in 64-bit precision. It's 2020 and that's not slower, even on most GPUs. 32-bit floats are important for data storage/packing on disk, but errors accumulate in a calculation. If you want to move this issue over to vector as a question of policy, that'd be good.
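In the meantime, one can promote the components to 64-bit by hand before the cancellation-prone subtraction. A minimal sketch, assuming the summed vector's components are plain NumPy float32 arrays as they appear to be in the snippets above:

```python
import numpy as np
import awkward

ff = awkward.load('test.awkd')
v12 = ff['v1'] + ff['v2']

# promote each float32 component to float64 before squaring and subtracting
t, x, y, z = (np.asarray(c, dtype=np.float64) for c in (v12.t, v12.x, v12.y, v12.z))
mass64 = np.sqrt(t**2 - x**2 - y**2 - z**2)
```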
@jpivarski thanks for digging into that! Indeed I was mostly scared by a possible off-by-one or hidden function-state failure; that's reassuring. My two cents would indeed be to move to 64 bits for the calculation.

Thanks again!
Hi again! Apologies for bothering you way too often. This time I am quite positive that something strange is going on, but bear with me if it is indeed the intended behaviour.

Both `prompt_mu` and `second_mu` are coffea's `JaggedCandidateArrays` with only one element in each event, therefore flattening makes sense. The issue appears rarely (< 0.1%) and does not seem linked to coffea's classes; I will try to make a simple set-up to reproduce the issue.

Set-up: