scikit-hep / vector

Vector classes and utilities
https://vector.readthedocs.io
BSD 3-Clause "New" or "Revised" License
81 stars 27 forks source link

Apparently rounded values in Awkward Arrays, but not in individual objects #337

Closed kreczko closed 1 year ago

kreczko commented 1 year ago

Vector Version

1.0.0

Python Version

Python 3.10.4

OS / Environment

Ubuntu 22.04.2 LTS

Describe the bug

There is a discrepancy (or I am doing something very wrong) when calculating the mass of a 4-vector on arrays vs individual objects.

Given the example below, I get the results:

The type for the vector approach looks correct: type='2 * var * float64'>

Example

import awkward as ak
import vector

data = dict(
    px=ak.Array([[10., 20., 30.], [50., 60.]]),
    py=ak.Array([[20., 30., 40.], [60., 70.]]),
    pz=ak.Array([[30., 40., 50.], [70., 80.]]),
)

mass = 90.0
data["energy"] = np.sqrt(data["px"] ** 2 + data["py"] ** 2 + data["pz"] ** 2 + mass ** 2)

p4 = vector.zip(
    {
        "px": data["px"],
        "py": data["py"],
        "pz": data["pz"],
        "energy": data["energy"],
    }
)

combinations = ak.combinations(p4, 2, fields=["p1", "p2"])
masses = (combinations["p1"] + combinations["p2"]).mass # same as .tau
print(f"{masses=}")

def invariant_mass(combinations: ak.Array):
    for event in combinations:
        for combination in event:
            total = np.sum(ak.unzip(combination))
            yield total.mass

print(list(invariant_mass(combinations)))

EDIT: Additional example

This provides same output as the invariant_mass function (in case there are questions about np use):

import math
v1 = vector.obj(px=10.0, py=20.0, pz=30.0, E=math.sqrt(10.0 ** 2 + 20.0 ** 2 + 30.0 ** 2 + 90.0 ** 2))
v2 = vector.obj(px=20.0, py=30.0, pz=40.0, E=math.sqrt(20.0 ** 2 + 30.0 ** 2 + 40.0 ** 2 + 90.0 ** 2))
v3 = vector.obj(px=30.0, py=40.0, pz=50.0, E=math.sqrt(30.0 ** 2 + 40.0 ** 2 + 50.0 ** 2 + 90.0 ** 2))
v4 = vector.obj(px=50.0, py=60.0, pz=70.0, E=math.sqrt(50.0 ** 2 + 60.0 ** 2 + 70.0 ** 2 + 90.0 ** 2))
v5 = vector.obj(px=60.0, py=70.0, pz=80.0, E=math.sqrt(60.0 ** 2 + 70.0 ** 2 + 80.0 ** 2 + 90.0 ** 2))

print([(v1 + v2).mass, (v1 + v3).mass, (v2 + v3).mass, (v4 + v5).mass])

Any additional but relevant log output

Example code should output

masses=<Array [[181, 183, 181], [180]] type='2 * var * float64'>
[180.67940751580096, 182.51419683759175, 180.57777240589775, 180.330167894841]
jpivarski commented 1 year ago

The data are there, with full precision, but ak.Array.__str__ minimizes the printed resolution to save screen space for all of the brackets and record key names that may be in complex data types.

>>> import numpy as np
>>> import awkward as ak
>>> import vector
>>> data = dict(
...     px=ak.Array([[10., 20., 30.], [50., 60.]]),
...     py=ak.Array([[20., 30., 40.], [60., 70.]]),
...     pz=ak.Array([[30., 40., 50.], [70., 80.]]),
... )
>>> mass = 90.0
>>> data["energy"] = np.sqrt(data["px"] ** 2 + data["py"] ** 2 + data["pz"] ** 2 + mass ** 2)
>>> p4 = vector.zip(
...     {
...         "px": data["px"],
...         "py": data["py"],
...         "pz": data["pz"],
...         "energy": data["energy"],
...     }
... )
>>> combinations = ak.combinations(p4, 2, fields=["p1", "p2"])
>>> masses = (combinations["p1"] + combinations["p2"]).mass # same as .tau

So

>>> masses
<Array [[181, 183, 181], [180]] type='2 * var * float64'>

but

>>> masses.to_list()
[[180.67940751580096, 182.51419683759175, 180.57777240589775], [180.330167894841]]

or even

>>> masses - 180
<Array [[0.679, 2.51, 0.578], [0.33]] type='2 * var * float64'>

NumPy also truncates by default, but not as much.

>>> ak.to_numpy(ak.fill_none(ak.pad_none(masses, 3), 0))
array([[180.67940752, 182.51419684, 180.57777241],
       [180.33016789,   0.        ,   0.        ]])

Between Awkward Array, NumPy, and Python, only Python gives you the full precision so that if you copy the string representation, you can make a new object with the same precision (exactly the same). Actually, this is sometimes considered a problem because newcomers to programming can be confused by

>>> 0.1 + 0.1 + 0.1
0.30000000000000004

(which you don't see with np.array([[0.1, 0.1, 0.1]]).sum(axis=1) or ak.sum([[0.1, 0.1, 0.1]], axis=1)).