scikit-hep / vector

Vector classes and utilities
https://vector.readthedocs.io
BSD 3-Clause "New" or "Revised" License
77 stars 24 forks source link

Record type is not correct for downcasting awkward vectors #412

Closed nsmith- closed 7 months ago

nsmith- commented 7 months ago

Vector Version

1.1.1.post1

Python Version

3.10.12

OS / Environment

OS/x

Describe the bug

import awkward as ak
import vector.backends.awkward

a = ak.zip(
    {
        "x": [10.0, 20.0, 30.0],
        "y": [-10.0, 20.0, 30.0],
        "z": [5.0, 10.0, 15.0],
        "t": [16.0, 31.0, 46.0],
    },
    with_name="Momentum4D",
    behavior=vector.backends.awkward.behavior,
)
b = ak.zip(
    {
        "x": [-10.0, 20.0, -30.0],
        "y": [-10.0, -20.0, 30.0],
        "z": [5.0, -10.0, 15.0],
    },
    with_name="Momentum3D",
    behavior=vector.backends.awkward.behavior,
)
c = ak.zip(
    {"x": [-10.0, 13.0, 15.0], "y": [12.0, -4.0, 41.0]},
    with_name="Momentum2D",
    behavior=vector.backends.awkward.behavior,
)

print("a+b")
print(a+b)
print(ak.parameters(a+b))
print("b+a")
print(b+a)
print(ak.parameters(b+a))

returns

a+b
[{x: 0, y: -20, z: 10}, {x: 40, y: 0, z: 0}, {x: 0, y: 60, z: 30}]
{'__record__': 'Momentum4D'}
b+a
[{x: 0, y: -20, z: 10}, {x: 40, y: 0, z: 0}, {x: 0, y: 60, z: 30}]
{'__record__': 'Momentum3D'}

So, although the fields available are the ones expected, the record type is incorrect in the first case. Similar issues are present for a+c and b+c.

Any additional but relevant log output

No response

nsmith- commented 7 months ago

One significant consequence is that checking equality a+c == c+a raises a type error:

TypeError: <MomentumArray4D [{x: 0, y: 2}, {...}, {x: 45, ...}] type='3 * Momentum4D[x...'> and <MomentumArray2D [{x: 0, y: 2}, {...}, {x: 45, ...}] type='3 * Momentum2D[x...'> do not have the same dimension
Saransh-cpp commented 7 months ago

Thanks for reporting this! This definitely is a bug, but we should decide what behavior we consider a bug. For instance, the other 2 backends preserve the t field while performing the calculations -

In [1] import vector

In [2]: vector.obj(x=1, y=2, z=3, t=4) + vector.obj(x=1, y=2, z=3)
Out[2]: VectorObject4D(x=2, y=4, z=6, t=4)

In [3]: vector.obj(x=1, y=2, z=3, t=4) - vector.obj(x=1, y=2, z=3)
Out[3]: VectorObject4D(x=0, y=0, z=0, t=4)

In [4]: - vector.obj(x=1, y=2, z=3, t=4) + vector.obj(x=1, y=2, z=3)
Out[4]: VectorObject4D(x=0, y=0, z=0, t=-4)

In [5]: v = vector.VectorNumpy4D(
    ...:     {
    ...:         "x": [1.1, 1.2, 1.3, 1.4, 1.5],
    ...:         "y": [2.1, 2.2, 2.3, 2.4, 2.5],
    ...:         "z": [3.1, 3.2, 3.3, 3.4, 3.5],
    ...:         "t": [4.1, 4.2, 4.3, 4.4, 4.5],
    ...:     }
    ...: )

In [6]: v2 = vector.VectorNumpy3D(
    ...:     {
    ...:         "x": [1.1, 1.2, 1.3, 1.4, 1.5],
    ...:         "y": [2.1, 2.2, 2.3, 2.4, 2.5],
    ...:         "z": [3.1, 3.2, 3.3, 3.4, 3.5],
    ...:     }
    ...: )

In [7]: v - v2
Out[7]: 
VectorNumpy4D([(0., 0., 0., 4.1), (0., 0., 0., 4.2), (0., 0., 0., 4.3),
               (0., 0., 0., 4.4), (0., 0., 0., 4.5)],
              dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('t', '<f8')])

In [8]: v2 -v
Out[8]: 
VectorNumpy3D([(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.),
               (0., 0., 0.)], dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8')])

In [9]: v - v2
Out[9]: 
VectorNumpy4D([(0., 0., 0., 4.1), (0., 0., 0., 4.2), (0., 0., 0., 4.3),
               (0., 0., 0., 4.4), (0., 0., 0., 4.5)],
              dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('t', '<f8')])

In [10]: - v + v2
Out[10]: 
VectorNumpy4D([(0., 0., 0., -4.1), (0., 0., 0., -4.2), (0., 0., 0., -4.3),
               (0., 0., 0., -4.4), (0., 0., 0., -4.5)],
              dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('t', '<f8')])

I think the bug here is that the t field is missing from Momentum4D vectors in your example. Changing the nature of promotion / demotion of vectors should maybe be a new feature?

Tagging @jpivarski here for further discussion.

nsmith- commented 7 months ago

It's more than that, the other backends also have the same infix operator issue:

>>> vector.obj(x=1, y=2, z=3, t=4) + vector.obj(x=1, y=2, z=3)
VectorObject4D(x=2, y=4, z=6, t=4)
>>> vector.obj(x=1, y=2, z=3) + vector.obj(x=1, y=2, z=3, t=4)
VectorObject3D(x=2, y=4, z=6)

you get a different type depending on which is first.

nsmith- commented 7 months ago

My preference is to not have t in either output in the above example, i.e. both should return VectorObject3D(x=2, y=4, z=6)

jpivarski commented 7 months ago

I also think that it should drop t in all backends.

Namely, if an operation takes more than one vector as input and would return a vector of the same type as output (e.g. T + T → T), but the two inputs differ in type (e.g. T1 + T2), then the output should be the more restrictive of the two (upcasting).