Unexpected behavior from std::fma when passing optimization flags

Rinzii commented 5 months ago

I've noticed a strange discrepancy with std::fma when handling the following edge case:

If x is zero and y is infinite (or x is infinite and y is zero), and z is not NaN, then NaN is returned and FE_INVALID is raised.

When compiling the code below with Clang trunk, the following behavior is observed (consistently with both libstdc++ and libc++):

#include <iostream>
#include <limits>
#include <cmath>

double foo() {
    return std::fma(0.0, std::numeric_limits<double>::infinity(), 0.0);
}

int main()
{
    std::cout << foo();
}

When Clang is passed -O0, this example returns -NaN, but when it is passed -O3, the same example returns +NaN. This behavior is surprising! Clang rarely changes a program's results based on the compiler flags passed, especially for a flag as common as -O{N}. The result even changes with -Og, which does not seem correct.

When compiling with GCC instead, we observe that GCC always returns -NaN no matter which optimization level is specified, as one would expect. Clang, by contrast, changes its return value based on the optimization flags, and that in itself is unexpected!
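Because the way iostream and printf format a NaN is library-specific, it can help to look at the raw bits when comparing the -O0 and -O3 builds. The sketch below assumes IEEE 754 binary64 doubles; `bits_of` and `dump` are hypothetical helpers, not part of the original report.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Copy the object representation of a double into a 64-bit integer.
// std::memcpy is the portable (pre-C++20) way to do this without
// violating strict aliasing; C++20 code could use std::bit_cast instead.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Print a value's raw bits, sign bit and NaN-ness, so a "-nan" vs "nan"
// difference can be confirmed independently of library formatting.
void dump(const char* label, double d) {
    std::printf("%s: bits=0x%016llx signbit=%d isnan=%d\n", label,
                static_cast<unsigned long long>(bits_of(d)),
                std::signbit(d) ? 1 : 0, std::isnan(d) ? 1 : 0);
}
```

For example, calling `dump("foo", foo());` from the program above in each build shows whether only the sign bit (bit 63) differs.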

The behavior can be observed here:

https://godbolt.org/z/K67j9K7Y4

Even though this behavior is unexpected, is it correct?

Here is what IEEE-754 has to say about NaN and its sign bit:

6.3 The sign bit

When either an input or result is a NaN, this standard does not interpret the sign of a NaN. However, operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicates totalOrder and isSignMinus are also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation.

(Excerpt from IEEE-754 (2019))
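The distinction the standard draws can be seen directly in C++: arithmetic such as fma leaves the NaN sign unspecified, while the bit-string operations map to unary minus, std::fabs and std::copysign, and isSignMinus maps to std::signbit. A minimal sketch, assuming IEEE 754 doubles; the helper names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Arithmetic (fma, +, *, ...) may produce a NaN of either sign, but the
// bit-string operations from IEEE 754 section 6.3 set the sign exactly,
// even when the operand is a NaN.

// Force the sign bit of x to 0 (copySign with a positive sign source).
double positive(double x) { return std::copysign(x, 1.0); }

// Force the sign bit of x to 1 (copySign with a negative sign source).
double negative(double x) { return std::copysign(x, -1.0); }

// isSignMinus: true when the sign bit is set. Unlike x < 0, this also
// observes the sign of a NaN and of -0.0.
bool sign_minus(double x) { return std::signbit(x); }
```

Because std::signbit is one of the operations that does observe a NaN's sign, a program that calls it on the result of fma can see the -O0 / -O3 difference, while ordinary comparisons cannot.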

So, as far as IEEE-754 is concerned, LLVM is technically correct, since the standard does not specify the sign of the NaN that FMA returns. This may be fine for IEEE-754, but what do the LLVM docs say about the matter?

Return the same value as a corresponding libm ‘fma’ function but without trapping or setting errno. When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

https://llvm.org/docs/LangRef.html#llvm-fma-intrinsic

Nowhere in the docs is this behavior specified, and the behavior does not necessarily appear to be in line with what the documentation says. With GCC and glibc, the returned value is consistent regardless of the optimization flags, but this is not the case with Clang. Thus, setting aside the fact that the result is unexpected, I am inclined to believe it is incorrect. I am asking for clarification on whether this is the intended result, and on whether LLVM considers this behavior correct.

If this is correct, then consider this a bug report against the LLVM documentation.

arsenm commented 5 months ago

There's no guarantee about the sign bit for a true floating-point operation. In the -O0 case, it's emitting the actual call to the libm function (or using the instruction with -mfma), which returns the negative NaN. In the -O<anything> cases, everything is constant folded and gives the positive NaN. I have no idea why the libcall or instruction returns with the sign bit set, but it shouldn't really matter. I don't know why we would want to change the constant folding to return negative NaNs.

It's an IEEE FMA and we should drop the verbiage that refers to whatever libm happens to do.
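The difference described above comes from which path evaluates the expression: a run-time call or instruction at -O0, compile-time constant folding otherwise. One way to make every optimization level take the run-time path is to hide the operands from the optimizer. This is a sketch under the assumption that reads through `volatile` are not constant folded (which is how compilers treat them in practice); `fma_at_runtime` and `foo_runtime` are hypothetical names.

```cpp
#include <cmath>
#include <limits>

// Reading the operands through volatile variables prevents the optimizer
// from treating them as constants, so the fma cannot be constant folded
// and is evaluated at run time (by libm or an FMA instruction). The sign
// of a resulting NaN is still whatever the target produces; this only
// makes the -O0 and optimized builds take the same path.
double fma_at_runtime(double x, double y, double z) {
    volatile double vx = x, vy = y, vz = z;
    return std::fma(vx, vy, vz);
}

// The expression from the report, evaluated without constant folding.
double foo_runtime() {
    return fma_at_runtime(0.0, std::numeric_limits<double>::infinity(), 0.0);
}
```

This is a debugging aid for comparing builds, not a way to obtain a particular NaN sign.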

Rinzii commented 5 months ago

Honestly, in my opinion, whether the sign bit is set does not matter so long as it's consistent across all optimization levels. What is confusing is that such a common flag changes the sign to begin with. I subscribe to the view that whatever sign is returned should be the same across all compiler flags, save a few special cases such as -ffast-math or -Ofast.

Still I agree that dropping the libm verbiage would help a bit with confusion, but I'd still argue that this behavior is not correct.

I guess what I am getting at is that we should just return a consistent value. If LLVM's constant folding wants to return +NaN, then we should always return +NaN, not a value that depends on a flag the user passes.

jcranmer-intel commented 5 months ago

I don't like that printf accentuates the sign bit of NaNs in the result (although it is required to do so by IEEE 754); the exact payload of a NaN--including the sign bit--is not generally considered preserved by optimizations. What's happening here is that LLVM is choosing a different preferred NaN payload from the platform's default preferred NaN payload.

arsenm commented 5 months ago

> Still I agree that dropping the libm verbiage would help a bit with confusion, but I'd still argue that this behavior is not correct.

There isn't really a principled way to provide this type of guarantee. Ultimately we're building on top of IEEE implementations, so there's a cost to providing stronger guarantees. We would have to restrict optimizations, while still making assumptions about the underlying target implementations, or emit additional instructions to fix up edge-case results. It breaks down further in the general case, where context-dependent optimizations might change the sign bits of incoming values that feed a NaN.

If you want exactly what libm does, you can always use -fno-builtin.
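To make the cost of "additional instructions to fix up edge-case results" concrete, here is a hypothetical wrapper (not an LLVM or libm API) that normalizes the sign of any NaN result. It relies on std::copysign being one of the bit-string operations whose effect on a NaN's sign is specified, unlike fma itself.

```cpp
#include <cmath>

// Hypothetical wrapper: after the FMA, any NaN result is forced to have a
// positive sign bit. The isnan test plus copysign is the extra work every
// call pays for the stronger guarantee. Only the sign is normalized; the
// rest of the NaN payload is left as produced.
double fma_positive_nan(double x, double y, double z) {
    double r = std::fma(x, y, z);
    return std::isnan(r) ? std::copysign(r, 1.0) : r;
}
```

A compiler providing such a guarantee would have to do something equivalent for every operation that can produce a NaN, which is the cost described above.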

Rinzii commented 5 months ago

This is true, though I'm finding that this behavior is also observable with std::signbit, which in my opinion is problematic in itself. Even if the sign bit were not preserved, I'd still expect the result to be the same across optimization levels. Still, it does appear this is intended behavior. So my issue is now that the documentation should be upfront that the sign bit of the returned NaN is indeterminate and can be either value, or should at least document this behavior in some way.
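Given that the NaN sign is indeterminate, a check that two builds (for example -O0 and -O3) agree on a result should not compare NaNs bit for bit. A minimal sketch of such a comparison; `same_result` is a hypothetical helper for tests, not a standard function.

```cpp
#include <cmath>

// Compare two floating-point results from different builds. Because
// neither IEEE 754 nor LLVM specifies the sign or payload of a NaN
// produced by arithmetic, any two NaNs are treated as equal. All other
// values, including the sign of zero, are compared exactly.
bool same_result(double a, double b) {
    if (std::isnan(a) || std::isnan(b))
        return std::isnan(a) && std::isnan(b);
    return a == b && std::signbit(a) == std::signbit(b);
}
```

Whether -0.0 and +0.0 should also count as equal is a separate policy choice; this sketch keeps them distinct because their signs are specified.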

Rinzii commented 5 months ago

I agree that there are ways to work around this issue, and I understand where you are coming from. Then I'd say my issue should evolve into a request that the LLVM documentation be upfront about this behavior and explain it, so that in the future a case like this won't require clarification.

jcranmer-intel commented 5 months ago

It's explicit in the LangRef (https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values):

For floating-point math operations, unless specified otherwise, the following rules apply when a NaN value is returned: the result has a non-deterministic sign [...]

IEEE-754 itself is pretty explicit even if it's not freely available:

The following value-changing transformations, among others, preserve the literal meaning of the source code:

Changing the payload or sign bit of a quiet NaN.

Rinzii commented 5 months ago

Then I think a reference link to this section from the specific function's documentation could help, as fma appears to be the only function where I've seen this specific case happen; all the other cmath functions appear not to have this issue. Still, thank you for these clarifications. I greatly appreciate it.

arsenm commented 5 months ago

> Then I think having a ref link to this on the specific function could help as this appears to be the only function I've seen this specific case happen.

This is probably because fma is one of the few functions for which we correctly implement constant folding directly in the compiler. The other cases call out to the host libm implementation, which is a source of host-dependent output and a longstanding bug.

arsenm commented 5 months ago

Created https://github.com/llvm/llvm-project/pull/92729 to avoid referring to libm

llvm / llvm-project

Unexpected behavior from std::fma when passing optimization flags #92592