scikit-hep / vector

Vector classes and utilities
https://vector.readthedocs.io
BSD 3-Clause "New" or "Revised" License
78 stars 25 forks source link

Numpy errors are supressed by default #177

Open CaspianChaharom opened 2 years ago

CaspianChaharom commented 2 years ago

Some coordinate systems can not represent some vectors. For example, if adding two vectors in pt,phi,eta,E coordinates that have opposite azimuthal components that add to zero, the resulting longitudinal coordinate eta will be infinity:

import numpy as np
import vector
v1 = vector.obj(pt=1, phi=0,     eta=1, E=1)
v2 = vector.obj(pt=1, phi=np.pi, eta=1, E=1)
new_vector =  v1 + v2  # this vector can not be represented in the pt,phi,eta,E coordinate system

By default, such errors are suppressed in the library https://github.com/scikit-hep/vector/blob/69d0b1d04f268540b2e553a927a8b51a7edb7633/src/vector/_compute/lorentz/add.py#L201 Without that line, the default Numpy error would be shown

/usr/lib/python3.10/site-packages/vector/_compute/spatial/eta.py:43: RuntimeWarning: divide by zero encountered in arctanh
  return lib.nan_to_num(lib.arctanh(z / lib.sqrt(rho ** 2 + z ** 2)), nan=0.0)
vector.obj(pt=1.2246467991473532e-16, phi=1.5707963267948966, eta=1.7976931348623157e+308, E=2)

I think it would be better if the user is made aware of the error by default (rather than suppressing by default). Additionally, that leaves the choice of suppressing errors or not to the user, by adding numpy.errstate(all="ignore") in their code.

jpivarski commented 2 years ago

At least I can say that the reason for this setting is that many of the formulas would raise warnings all the time, due to intermediate states of the calculation, even though the final results are finite (not NaN or Inf). You can see in this one that nan_to_num is used to turn NaN into zero (to follow ROOT's behavior).

The default should probably be to suppress the warnings, but it would be nice to give users control over it. How should that be, a global setting or a context manager? The setting would have to get into every compute submodule.

CaspianChaharom commented 2 years ago

Ok, I see. I'm not sure how important it is, but the reason I suggested showing the warnings by default may have been specific to my use case. I was adding many vectors from a file with an eta azimuthal component, and getting the mass of the summed vector. It turned out a few of them had zero longitudinal components, leading to infinite masses that were used later on in the script. I ended up having to go through looking at many intermediate results to track down that the issue was an infinite eta, so it would have been nice to have the warning shown when that line was executed.

Both ideas seem ok:

[^1]: Though the RuntimeWarning: divide by zero encountered in arctanh is shown only the first time and not again if encountered again

jpivarski commented 2 years ago

A third possibility would be to put an isnan check after ignoring all the errors in the intermediate calculations. We'd have to make sure this "plays well" with all the intended backends, and it would be another level of warning that needs to be suppressable.

Also, we'd have to decide whether to call out infinity as being as bad as not a number. I think infinity is more benign than not a number: it appears in cases that—if there were any small errors—would be large numbers instead of infinity, and they do the right things in <, > operations.

Though the RuntimeWarning: divide by zero encountered in arctanh is shown only the first time and not again if encountered again

This is true for all the NumPy warnings, just because that's how warnings work in Python. (For that reason, I don't see them as being very helpful. They tell you that something bad happened, but by not stopping the program and but roasting themselves, they don't tell you where or how many times the bad thing happened. And then, if the bad thing is in an intermediate step and expected, they become useless noise.

On the other hand, these warnings can also be turned into errors, which are useful if they can be suppressed when they're expected.

CaspianChaharom commented 2 years ago

I think there should at least be an option to have infinity shown as a warning. But I do see the utility you mentioned of not marking infinity as a bad number. Are you thinking of a global verbosity config? (silent, show all except infinity, show all)

jpivarski commented 2 years ago

I guess it should have the same or a similar interface to np.errstate, some vector.errstate.