scikit-hep / vector

Vector classes and utilities
https://vector.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Revisiting vector dimension projecting/embedding #422

Closed: jpivarski closed this issue 6 months ago

jpivarski commented 7 months ago

We had another conversation today that reopens issue #412 (implemented by PR #413). When adding/subtracting vectors of different spatial/momentum dimensions (i.e. 2D vs 3D vs 4D, not array shape), there are three possibilities:

  1. the low-D vector gets promoted to the high-D dimensionality, filling the missing slots with zeros (so that they don't contribute to the sum)
  2. the high-D vector gets demoted to the low-D dimensionality, ignoring the information in its extra dimensions
  3. it's a TypeError that explains how to promote or demote in the error message.
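To make the three options concrete, here is a minimal sketch on plain Python tuples, assuming Cartesian components in the fixed order (x, y, z, t). The helper `add_mixed` and its `policy` argument are hypothetical names for illustration only; they are not part of vector.

```python
from typing import Literal, Tuple

Components = Tuple[float, ...]


def add_mixed(a: Components, b: Components,
              policy: Literal["promote", "demote", "raise"]) -> Components:
    """Add two vectors given as component tuples in the order (x, y, z, t).

    Hypothetical helper illustrating the three options; not vector's API.
    """
    if len(a) == len(b):
        return tuple(p + q for p, q in zip(a, b))
    if policy == "promote":
        # Option 1: pad the shorter vector with zeros (neutral for addition).
        n = max(len(a), len(b))
        a = a + (0.0,) * (n - len(a))
        b = b + (0.0,) * (n - len(b))
    elif policy == "demote":
        # Option 2: drop the extra trailing components of the longer vector.
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
    else:
        # Option 3: refuse and tell the user to convert explicitly.
        raise TypeError(
            f"cannot add {len(a)}D and {len(b)}D vectors; "
            "project or embed one of them first"
        )
    return tuple(p + q for p, q in zip(a, b))
```

With `a = (1.0, 2.0)` and `b = (1.0, 1.0, 5.0)`, promoting gives a 3D result, demoting gives a 2D result, and the third policy raises.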

@Saransh-cpp, @nsmith-, @iasonkrom, @lgray, and @martindurant were involved in that discussion. Nick restated his opinion in #412 (option 2), but it seemed that the majority were in favor of option 3, as long as it's easy to promote/demote.

Since spatial/momentum dimensionality is expressed as record fields, high-D vectors are subtypes of low-D vectors (a high-D vector has a superset of the fields that a low-D vector has), so high-D → low-D is upcasting (removing the extra fields from the high-D vector). Upcasting requires no extra information, but downcasting does: the values of the new fields being added (we'd make them 0, which is neutral for addition and subtraction).
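A minimal sketch of this subtyping argument, assuming records that use Cartesian field names only (the real library also supports other coordinate systems). `FIELDS`, `project`, and `embed` are hypothetical names: projecting needs nothing but the record itself, while embedding needs a value for every new field.

```python
# Hypothetical field layout per dimensionality (Cartesian names only).
FIELDS = {2: ("x", "y"), 3: ("x", "y", "z"), 4: ("x", "y", "z", "t")}


def project(record: dict, ndim: int) -> dict:
    """Upcast high-D -> low-D: keep only the fields the lower type has."""
    if ndim > len(record):
        raise ValueError("project only lowers the dimensionality")
    return {name: record[name] for name in FIELDS[ndim]}


def embed(record: dict, ndim: int, fill: float = 0.0) -> dict:
    """Downcast low-D -> high-D: the new fields need values, here `fill`."""
    if ndim < len(record):
        raise ValueError("embed only raises the dimensionality")
    return {name: record.get(name, fill) for name in FIELDS[ndim]}
```

Each higher-D entry of `FIELDS` contains every field of the lower-D ones, which is exactly the superset relationship described above.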

There's no ambiguity about which dimensions are dropped or added when going between low-D and high-D, since our vectors have a special 3rd component (cylindrically symmetric, not spherically symmetric) and a special 4th component: 2D is always azimuthal, the 3rd dimension in 3D is always longitudinal, and the 4th dimension in 4D is always temporal.

If we're going with option 3, the methods that project high-D → low-D and the methods that embed low-D → high-D have to be short and must not clutter up a mathematical expression (like NumPy's .T: just two characters, just visible enough). It would be nice to have a single word for both, instead of "project" and "embed," although those are the correct mathematical terms.

We could have .to_2D(), .to_3D(), and .to_4D(), since there is no ambiguity about which dimensions are being added or removed. These would be shorter synonyms for the existing .to_Vector2D(), .to_Vector3D(), and .to_Vector4D() methods.
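Because the component order is fixed, each conversion is a single truncate-or-pad step, so the short names could simply alias the long ones. The sketch below uses a hypothetical toy class (`Vec`, `_resize`) with Cartesian components only; it illustrates the aliasing idea and is not vector's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    """Toy Cartesian vector with components in the fixed order (x, y, z, t).

    Hypothetical stand-in, not a class from the vector library.
    """
    comps: tuple

    def _resize(self, n: int) -> "Vec":
        # Fixed order: truncating drops trailing components, padding appends
        # zeros, so the target dimensionality alone determines the result.
        return Vec(self.comps[:n] + (0.0,) * max(0, n - len(self.comps)))

    def to_Vector2D(self) -> "Vec":
        return self._resize(2)

    def to_Vector3D(self) -> "Vec":
        return self._resize(3)

    def to_Vector4D(self) -> "Vec":
        return self._resize(4)

    # Short synonyms, as proposed: the very same functions under new names.
    to_2D = to_Vector2D
    to_3D = to_Vector3D
    to_4D = to_Vector4D
```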

Iason suggested .like(other_array), so that we don't need to specify which "D". I like that and would +1 it. (If you want to specify which "D", there are already functions for that; the new convenience is being able to match a given other_array.) The error message in option 3 could be

TypeError: cannot add or subtract vectors of different dimensionality; use

    a.like(b) + b

or

    a + b.like(a)

to project or embed one of the vectors to match the other's dimensionality
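A minimal sketch of how `.like(other)` could pair with the option-3 error, using a hypothetical toy `Vec` class with Cartesian components in the fixed order (x, y, z, t). It only illustrates the proposed behavior; the class and `MESSAGE` constant are assumptions, not vector's code.

```python
from dataclasses import dataclass

# The error text proposed above.
MESSAGE = (
    "cannot add or subtract vectors of different dimensionality; use\n\n"
    "    a.like(b) + b\n\nor\n\n    a + b.like(a)\n\n"
    "to project or embed one of the vectors to match the other's dimensionality"
)


@dataclass(frozen=True)
class Vec:
    comps: tuple  # hypothetical: Cartesian components in order (x, y, z, t)

    def like(self, other: "Vec") -> "Vec":
        # Project (truncate) or embed (zero-pad) to other's dimensionality.
        n = len(other.comps)
        return Vec(self.comps[:n] + (0.0,) * max(0, n - len(self.comps)))

    def _check(self, other: "Vec") -> None:
        if len(self.comps) != len(other.comps):
            raise TypeError(MESSAGE)

    def __add__(self, other: "Vec") -> "Vec":
        self._check(other)
        return Vec(tuple(p + q for p, q in zip(self.comps, other.comps)))

    def __sub__(self, other: "Vec") -> "Vec":
        self._check(other)
        return Vec(tuple(p - q for p, q in zip(self.comps, other.comps)))


a = Vec((1.0, 1.0))        # 2D
b = Vec((1.0, -1.0, 5.0))  # 3D
print(a.like(b) + b)       # Vec(comps=(2.0, 0.0, 5.0))
print(a + b.like(a))       # Vec(comps=(2.0, 0.0))
```

Writing `a + b` directly raises the TypeError, so the user must choose explicitly which vector to convert.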

This also ties into CoffeaTeam/coffea#991, since the same thing needs to be implemented in Coffea, to smooth the transition from Coffea vectors to Vector vectors.

ikrommyd commented 7 months ago

When doing analysis, I generally like to know what my code is doing, so I'm in favor of raising a TypeError when vectors of different dimensionality are added together. That way the user knows which dimensions an operation affects, and it avoids mistakenly adding vectors in the belief that they have the same dimensionality when in fact they do not. I assume this is going to be the general case for all element-wise operations between vectors. I also stand by the opinion that .like(other_array) is a nice method to have.
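If the TypeError applies to every binary operation between vectors, the check can live in one place. Below is a minimal sketch using a decorator on component tuples; `same_dimension`, `add`, `subtract`, and `dot` are hypothetical names, not vector functions.

```python
import functools
import operator


def same_dimension(op):
    """Wrap a binary operation on component tuples so mismatched lengths raise."""
    @functools.wraps(op)
    def wrapper(a, b):
        if len(a) != len(b):
            raise TypeError(
                f"{op.__name__}: vectors have different dimensionality "
                f"({len(a)}D vs {len(b)}D)"
            )
        return op(a, b)
    return wrapper


@same_dimension
def add(a, b):
    return tuple(map(operator.add, a, b))


@same_dimension
def subtract(a, b):
    return tuple(map(operator.sub, a, b))


@same_dimension
def dot(a, b):
    return sum(map(operator.mul, a, b))
```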

ikrommyd commented 7 months ago

I would also like to add that .like(other_array) is important for raising the dimensionality. Suppose you have a 3D force vector and a 2D force vector in the xy plane, and your question is "what is the total force on some object?" You should be able to convert the 2D vector to a 3D one, add the two together, and get the total force.
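The force example can be worked through with made-up numbers (the values below are hypothetical, in newtons), treating vectors as Cartesian tuples. Embedding the xy-plane force with z = 0 keeps the 3D force's z component in the total, whereas demoting would silently discard it. `embed_xy` is a hypothetical helper.

```python
def embed_xy(force_2d):
    """Embed a force lying in the xy plane into 3D with a zero z component."""
    fx, fy = force_2d
    return (fx, fy, 0.0)


force_3d = (1.0, 2.0, 3.0)  # hypothetical 3D force (N)
force_xy = (0.5, -2.0)      # hypothetical force in the xy plane (N)

# Promote the 2D force, then add: nothing is lost.
total = tuple(a + b for a, b in zip(force_3d, embed_xy(force_xy)))
print(total)    # (1.5, 0.0, 3.0)

# Demoting instead (zip stops at the shorter tuple) drops the z component.
demoted = tuple(a + b for a, b in zip(force_3d, force_xy))
print(demoted)  # (1.5, 0.0)
```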

Saransh-cpp commented 7 months ago

#168 and #382 tie up nicely with this issue. I'm planning to resolve them first and then pick up this issue.

Saransh-cpp commented 7 months ago

Regarding combining vectors of different dimensionality across backends, is the following the ideal behavior, or should this error out too?

In [1]: import vector

In [2]: from vector import VectorObject2D

In [3]: numpy_vec = vector.array(
   ...:     {
   ...:         "x": [1.1, 1.2, 1.3, 1.4, 1.5],
   ...:         "y": [2.1, 2.2, 2.3, 2.4, 2.5],
   ...:         "z": [3.1, 3.2, 3.3, 3.4, 3.5],
   ...:     }
   ...: )
   ...: awkward_vec = vector.zip(
   ...:     {
   ...:         "x": [1.0, 2.0, 3.0],
   ...:         "y": [-1.0, 2.0, 3.0],
   ...:         "z": [5.0, 10.0, 15.0],
   ...:         "t": [16.0, 31.0, 46.0],
   ...:     },
   ...: )
   ...: object_vec = VectorObject2D.from_xy(1.0, 1.0)

In [4]: numpy_vec + object_vec
Out[4]: 
VectorNumpy2D([(2.1, 3.1), (2.2, 3.2), (2.3, 3.3), (2.4, 3.4), (2.5, 3.5)],
              dtype=[('x', '<f8'), ('y', '<f8')])

In [5]: awkward_vec + object_vec
Out[5]: <VectorArray2D [{x: 2, y: 0}, {...}, {x: 4, y: 4}] type='3 * Vector2D[x: fl...'>

Saransh-cpp commented 7 months ago

Is this the ideal behavior?

In [1]: import vector

In [2]: numpy_vec = vector.array(
   ...:     {
   ...:         "x": [1.1, 1.2, 1.3, 1.4, 1.5],
   ...:         "y": [2.1, 2.2, 2.3, 2.4, 2.5],
   ...:         "z": [3.1, 3.2, 3.3, 3.4, 3.5],
   ...:     }
   ...: )
   ...: awkward_vec = vector.zip(
   ...:     {
   ...:         "x": [1.0, 2.0, 3.0],
   ...:         "y": [-1.0, 2.0, 3.0],
   ...:         "z": [5.0, 10.0, 15.0],
   ...: #        "t": [16.0, 31.0, 46.0],
   ...:     },
   ...: )
   ...: object_vec = vector.VectorObject2D.from_xy(1.0, 1.0)

In [3]: awkward_vec + object_vec
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-910a69f6ee22> in <cell line: 0>()
----> 1 awkward_vec + object_vec
...

In [4]: awkward_vec + object_vec.like(awkward_vec)
Out[4]: <VectorArray3D [{x: 2, y: 0, z: 5}, ..., {x: 4, ...}] type='3 * Vector3D[x:...'>

In [5]: object_vec + awkward_vec.like(object_vec)
Out[5]: <VectorArray2D [{x: 2, y: 0}, {...}, {x: 4, y: 4}] type='3 * Vector2D[x: fl...'>

In [6]: awkward_vec + awkward_vec.to_Vector2D()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-8908a0500dad> in <cell line: 0>()
----> 1 awkward_vec + awkward_vec.to_Vector2D()
...

In [7]: awkward_vec + awkward_vec.to_Vector2D().like(awkward_vec)
Out[7]: <VectorArray3D [{x: 2, y: -2, z: 5}, {...}, {...}] type='3 * Vector3D[x: fl...'>