Open GiudGiud opened 2 years ago
This is not a libMesh/MetaPhysicL exception. This is a floating point exception which is not an exception in the C++ sense; it is a signal. You cannot catch it. Perhaps you want to add a libMesh exception to TypeVector::norm
? And then if so, we can catch those and emit a mooseError
?
It's a good point.
I think it's a good idea. We have more and more AD users, and these errors are quite sneaky. Earlier today, we got the same with @tanoret with std::pow(some_ad, 1.5)
@roystgnr do you think we should put in exception throwing for operations like TypeVector::norm
if all coordinates are 0? This kind of thing is so tricky to make decisions on IMO since we would be performing LIBMESH_DIM new checks for every call to norm
. But that's probably premature optimization thinking without any profiling to back up the concern
What's the behavior of taking the norm of a non-AD RealVectorValue? Does it raise an exception or simply returns NaN? I'd prefer NaNs in this case, both for RealVectorValue and ADRealVectorValue.
Norm of a 0-vector is 0. The derivative is the only thing that is problematic
Sorry, I meant the derivative, which is NaN.
Reasoning: in tensor_mechanics we decide whether to compute the derivatives based on is_ad
, and therefore I would hope we could get consistent behavior. Since for a vector v
the hand-coded derivative of its norm w.r.t. itself is v/v.norm()
, I would hope the dual numbers of v.norm()
to be NaNs as well.
We enable the following floating point "exceptions" by default in dbg
mode:
void enableFPE(bool on)
{
#if !defined(LIBMESH_HAVE_FEENABLEEXCEPT) && defined(LIBMESH_HAVE_XMMINTRIN_H)
static int flags = 0;
#endif
if (on)
{
#ifdef LIBMESH_HAVE_FEENABLEEXCEPT
feenableexcept(FE_DIVBYZERO | FE_INVALID);
So both the AD and non-AD code (if it is actually doing v/v.norm()
) should raise floating point "exceptions" in dbg
mode and according to IEEE indeed should result in NaNs for other compilation modes
do you think we should put in exception throwing for operations like TypeVector::norm if all coordinates are 0?
Thinking about it, I can't avoid the conclusion that at some point we ought to have two different behavioral options here:
In one category of use cases, it's entirely reasonable at the low level to return (0, NaN) for the std::norm(0)
(value,derivatives). Calculations which then proceed to throw away the derivatives and only use the norm will work fine (albeit slower than they should, but that's what turning SIGFPE on and off helps us debug), and calculations which attempt to use the derivatives too will correctly propagate NaN rather than returning something that might be garbage. If I'm trying to use these derivatives to calculate a sensitivity solve or an adjoint solve, then the fact that I don't actually possess a two-side derivative and I can't calculate a precise sensitivity or adjoint solution is extremely important.
However, that's not what we do for e.g. std::abs(DualNumber)
- where we return f'=0 for x=0 - and in the second category of use cases, which IIRC is pretty much all MOOSE code right now, that is clearly the right behavior. If we're just getting a Jacobian to use for a primal solve, we don't actually care if it's an exact Jacobian, we just care that it's close enough to a linearization for us to get as good a quasi-Newton step as we can get, and returning something that's "in the middle of" the set of one-sided derivatives is about as good an approximation as we can get for that purpose, and returning NaN would be something in between an annoying mess (if a kernel has to manually test for NaN around any operation that might not have well-defined derivatives) and an utter disaster (if user code is stuck running with SIGFPE enabled and so doesn't get the chance to test for NaN).
MOOSE in dbg mode is the "utter disaster" case, isn't it? Because enabling SIGFPE and treating it as always an error is a great way to hunt bugs? So we do need to do something about that.
I don't see exception-throwing as the solution, though. If we're not going to catch that exception in user code then we might as well just let the SIGFPE kill the program. And if we are going to catch the exception in user code, we might as well just replace the norm(v)
call there with something along the lines of (raw_value(v) == 0 ? ADReal(0) : norm(v))
(with the typing fixed up, and encapsulated as regularized_norm(ADReal)
or something. That'll be faster for everybody if we only do those tests where we expect to need them rather than in every single norm()
call everywhere, and it should even be faster for the regularized_norm
users if we just invoke a test-and-jump rather than of the whole C++ exception-throwing machinery.
The only way I could see an exception as being preferable is if we have cases where norm()
is getting called at a low level, and the code calling it doesn't know whether norm(0)
should be regularized or should scream-and-die, but code further up the stack does know. ... which doesn't sound entirely implausible, now that I think about it. Do we have cases like that?
Isn't the limit if the derivative inifinity? Why not just use that instead of NaN?
norm((x,y,z)) in 3D is a cone surface in 4D ... it's easy to picture in 2D, with a 3D cone. If you look at the directional derivative at 0, d(norm)/d(direction) will always be 1, not infinity at all, but because there's no gradient vector which has a dot product of 1 with every possible direction there's no well-defined gradient. The "average" gradient over all direction vectors is 0, same as with the two one-sided derivatives of abs() (which is the same as norm() in 1D).
D'oh, yeah, never mind me. Tip of the cone is definitely not a well defined derivative.
The only way I could see an exception as being preferable is if we have cases where
norm()
is getting called at a low level, and the code calling it doesn't know whethernorm(0)
should be regularized or should scream-and-die, but code further up the stack does know. ... which doesn't sound entirely implausible, now that I think about it. Do we have cases like that?
I feel like most calls to norm
are in user code, or at least at the MOOSE framework level and above.
we might as well just replace the
norm(v)
call there with something along the lines of(raw_value(v) == 0 ? ADReal(0) : norm(v))
(with the typing fixed up, and encapsulated asregularized_norm(ADReal)
or something.
I like that
Bug description
Taking the norm of an ADVariableValue near 0 returns the message in the Discussion below. This is not ideal for user-facing output
Steps to reproduce
ADRealVectorValue(0).norm() should suffice to reproduce this.
Steps to fix
More floating point exceptions in moose to catch this Metaphysicl/libmesh exception
Impact
Better errors for users
Discussed in https://github.com/idaholab/moose/discussions/22072