`deltaangle` between identical vectors return nan

simonthor commented 4 months ago

Vector Version

1.3.1

Python Version

3.11.4

OS / Environment

Kubuntu Linux 22.04

Describe the bug

When taking two identical vectors and computing the angle between them, I expect the result to be 0. Instead, I get NaN:

> v = vector.obj(x=1, y=1, z=1)
> v.deltaangle(v)
nan

This is also true when using vector.array. I have not tried other backends.

Any additional but relevant log output

No response

jpivarski commented 4 months ago

Guidance from ROOT: this should return zero, rather than nan.

ROOT's TVector3 has an Angle method that returns zero for the same vector:

>>> import ROOT
>>> a = ROOT.TVector3(1.1, 2.2, 3.3)
>>> a.Angle(a)
0.0

I wasn't able to find a generic "angle" method on XYZVector.

The issue is that in

https://github.com/scikit-hep/vector/blob/a9f1c7431a82289bac325d28d6fa4ab5ba4a8836/src/vector/_compute/spatial/deltaangle.py#L34-L37

the argument of arccos is supposed to be 1, but it's 1.0000000000000002. This is just an unfortunate round-off error: vector.obj(x=3, y=4, z=0) has a v.dot(v) / v.mag**2 of precisely 1.0, and vector.obj(x=3, y=4, z=5) has 0.9999999999999999, for instance.

I think a good solution would be to clamp the argument of arccos to be within -1 and 1, since $(\vec{x} \cdot \vec{y}) / (|\vec{x}| |\vec{y}|)$ is provably within this range (barring round-off error). That is, instead of

lib.arccos(ARG)

do

lib.arccos(lib.maximum(-1, lib.minimum(1, ARG)))

(with minimum and maximum being identity functions for SymPy).

Saransh-cpp commented 4 months ago

Thanks for the explanation! Just to confirm - should minimum and maximum be the identity functions or sympy.Min and sympy.Max aliases for the SymPy? (lib.maximum points to sympy.Max at the moment)

jpivarski commented 4 months ago

For SymPy, these functions should be the identity (no-op).

Maybe we need a clearer distinction among the functions on lib: they're used in two ways.

To actually perform the main calculation that we want.
To clean up corner cases and numerical error.

lib.nan_to_num and this particular use of lib.minimum/lib.maximum are for reason number 2. Mathematically, it's always true that

$$-1 \le \frac{\vec{x} \cdot \vec{y}}{|\vec{x}| |\vec{y}|} \le +1$$

but because floating point numbers are not exact, (as well as functions on them, like the addition and multiplication in $\cdot$), sometimes the computed value is about $10^{-16}$ outside of this range. That's enough to make a function like lib.arccos return NaN, but we really wanted lib.arccos(-1) (π) or lib.arccos(1) (0). Passing lib.arccos's argument through lib.minimum and lib.maximum adjusts for the $10^{-16}$ error.

But SymPy has no error because it's not numerical. Passing the argument through minimum/maximum would be superfluous at best, but it's worse than that because it complicates the mathematical expression in a way that would prevent simplification. (That is, unless SymPy is smart enough to recognize that the above inequality holds, and therefore minimum/maximum with ‒1 and 1 can be dropped from the expression, but I doubt SymPy is that smart. That's asking a lot from a CAS.)

It might happen at some later time that we want to use lib.minimum and lib.maximum in a mathematically important way—reason number 1, above—and then we'd need some way to distinguish between that case, which should use sympy.Min and sympy.Max, and the numerical clean-up case, which would treat this function as an identity. At that point, we'd have to extend the way we use lib to indicate that distinction. But we don't need it yet...

Saransh-cpp commented 4 months ago

Thanks for the detailed explanation! I've updated the definitions for lib.maximum and lib.minimum in the same PR.

scikit-hep / vector