known_identity has wrong values with -ffast-math

rafbiels commented 5 months ago

Describe the bug

The initial value for sycl::minimum<float> is set to inf which gets turned into 0 with -ffast-math. More generally, both min and max are affected for any type which has_infinity.

The issue is here: https://github.com/intel/llvm/blob/7271d613156f2268d538f20d92ecd52b1fbc488f/sycl/include/sycl/known_identity.hpp#L262-L271

std::numeric_limits<T>::infinity() should not be used here when -fno-honor-infinities is enabled (e.g. by -ffast-math). This causes the value to be evaluated as zero when used (at least in some cases) and subsequently leads to incorrect results of a program. Note that std::numeric_limits<float>::has_infinity evaluates to true regardless of the honor-infinities option.

This started failing in the nightly-2024-02-16 tag. The code in question hasn't changed in months, but it seems that the behaviour of -ffast-math has changed, possibly due to https://github.com/intel/llvm/commit/73159a994abcbf82881ee15b0df5cf13c9671f31

So it seems the code has been problematic earlier, but it was hidden by the -fno-honor-infinities not being set in this context.

I think there could be three solutions:

Just use std::numeric_limits<T>::max() in all cases. What's the benefit of using infinity here?
Add a second condition checking whether -fno-honor-infinities is used.
Make sure std::numeric_limits<T>::has_infinity evaluates to false when -fno-honor-infinities is enabled.

To reproduce

This is a minimal example of sycl::joint_reduce computing the minimum for a vector of 4 positive floats:

#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q{};
    std::vector<float> data{4.0f, 1.0f, 3.0f, 2.0f};
    float result{data[0]};
    {
        sycl::buffer<float> inBuf{data};
        sycl::buffer<float> outBuf{&result, 1};
        q.submit([&inBuf, &outBuf](sycl::handler& cgh){
            sycl::accessor inAcc{inBuf, cgh, sycl::read_only};
            sycl::accessor outAcc{outBuf, cgh, sycl::write_only};
            cgh.parallel_for(sycl::nd_range<1>{1,1}, [inAcc, outAcc](sycl::nd_item<1> item){
                const float* start{&inAcc[0]};
                const float* end{start + inAcc.size()};
                outAcc[0] = sycl::joint_reduce(item.get_group(), start, end, sycl::minimum<>{});
            });
        });
    }
    printf("Result: %f\n", result);
    return 0;
}

When compiled with -O3 -ffast-math this prints 0.0 instead of the expected 1.0:

$ clang++ -fsycl -o test test.cpp -O3 -ffast-math
./test
Result: 0.000000

It can be worked around with -fhonor-infinities:

$ clang++ -fsycl -o test test.cpp -O3 -ffast-math -fhonor-infinities
./test
Result: 1.000000

Note this example doesn't directly use sycl::known_identity. It presents that the issue breaks a high-level feature. I didn't manage to create a small enough reproducer directly using the value, as it tends to get inlined / evaluated at compile time, and therefore not showing the issue. I have seen the value directly evaluating to zero in a larger project though.

Environment

Linux, Ubuntu 22.04
any target (tested Level Zero and CUDA)
GNU libstdc++6 v12.3.0
DPC++ tag nightly-2024-02-16 (first failing) through 2024-05-15 (latest tested)

Additional context

No response

npmiller commented 5 months ago

cc @andykaylor any thoughts on this?

Your patch linked is NFC so I don't really see how it would have changed the behavior, but maybe you have ideas on the best way to fix this, or why it only started failing recently.

rafbiels commented 5 months ago

There was a discussion of this in the original PR: https://github.com/llvm/llvm-project/pull/81173#discussion_r1483561988

Regardless of whether it was this PR or another in the upstream pull-down which changed the behaviour, it looks to me like the change was correct, i.e. the current behaviour matches the documentation: -ffast-math enables -fno-honor-infinities. It looks like it wasn't the case before. However, the known_identity code has been wrong before and the pull-down only exposed this. It's the known_identity implementation that needs to be fixed.

The known_identity bug in older versions could be exposed by setting -fno-honor-infinities explicitly in the compilation command, but it was so niche that no one observed it. Now that it's seen with -ffast-math, it's much more severe.

aelovikov-intel commented 5 months ago

I'm afraid https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.identities mandates that infinity must be used by the implementation. @gmlueck , WDYT?

rafbiels commented 5 months ago

Does -ffast-math conform to the SYCL standard otherwise?

BTW, this is (rightly) triggering the -Wnan-infinity-disabled warning which is hidden by including SYCL headers as system headers. Including the headers as non-system shows this.

test.cpp:

#include <sycl/known_identity.hpp>
float foo() {return sycl::known_identity<sycl::minimum<float>,float>::value;}

command:

$ clang++ -c test.cpp -fno-honor-infinities -I$(dirname $(which clang++))/../include -I$(dirname $(which clang++))/../include/sycl

sycl/known_identity.hpp:269:13: warning: use of infinity is undefined behavior due to the currently enabled floating-point options [-Wnan-infinity-disabled]
  269 |           ? std::numeric_limits<AccumulatorT>::infinity()
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test.cpp:2:71: note: in instantiation of static data member 'sycl::detail::known_identity_impl<sycl::minimum<float>, float>::value' requested here
    2 | float foo() {return sycl::known_identity<sycl::minimum<float>,float>::value;}
      |                                                                       ^

gmlueck commented 5 months ago

Just use std::numeric_limits<T>::max() in all cases. What's the benefit of using infinity here?

I don't think we should do this. Imagine a "min" reduction where all the values are INF (and the application is not compiled with -fno-honor-infinities). If the reduction used max as the identity, then I presume the result of the reduction would be max. But that would be wrong. Since all the values are INF, you'd expect the result to be INF.

Therefore, I think we should do one of the following:

Add a second condition checking whether -fno-honor-infinities is used.

Make sure std::numeric_limits<T>::has_infinity evaluates to false when -fno-honor-infinities is enabled.

I presume the effect on SYCL reductions is the same for either of these options. Obviously, changing the definition of std::numeric_limits<T>::has_infinity is a much larger scope, and I'd defer to @andykaylor to decide if this is the right thing to do.

If we do not decide to change the definition of has_infinity (or we think it's a very long discussion), we could change the SYCL headers to have an additional check for -fno-honor-infinities. We can always remove that check later if we decide to change the definition of has_infinity.

I think it is OK if the definition of SYCL's sycl::known_identity changes when the application is compiled with -fno-honor-infinities. Yes, this makes us non-conformant, but I think this is OK. When the user compiles with this option, they are specifically asking to trade-off correctness for performance.

There is a similar case for "max" reductions, where the known identity is -std::numeric_limits<T>::infinity(). Here, the identify should be lowest when compiling with -fno-honor-infinities. I did not see any other places in the SYCL spec that depend on std::numeric_limits<T>::infinity().

aelovikov-intel commented 5 months ago

I don't think we can even detect it...

$  diff -Naur <(clang++ -x c++ /dev/null -o - -E -dD -fhonor-infinities -DMY1) <(clang++ -x c++ /dev/null -o - -E -dD -fno-honor-infinites -DMY2)
--- /dev/fd/63  2024-06-07 15:34:16.516477537 -0700
+++ /dev/fd/62  2024-06-07 15:34:16.516477537 -0700
@@ -457,7 +457,7 @@
 #define __STDC_UTF_16__ 1
 #define __STDC_UTF_32__ 1
 # 1 "<command line>" 1
-#define MY1 1
+#define MY2 1
 #define __GCC_HAVE_DWARF2_CFI_ASM 1
 # 1 "<built-in>" 2
 # 1 "/dev/null" 2

gmlueck commented 5 months ago

I don't think we can even detect it...

I assume we could change the compiler to predefine a macro when -fno-honor-infinities is passed ...

aelovikov-intel commented 5 months ago

What if we use SYCL_EXTERNAL and multiple TUs with different fast math options? Wouldn't it be ODR-violation?

gmlueck commented 5 months ago

I agree that it's a bit weird for the sycl::known_identity trait to change based on the compiler flag. However, this seems less bad than returning a wrong value from a reduction. I'm open to other options if you have one.

aelovikov-intel commented 5 months ago

We can try using pragmas to disable FP optimizations in SYCL library code using known_identity. I'm not sure how to provide a way for the user to optimize it. sycl::reduction says that we can ignore user-provided identity if we know it ourselves for the given binary operation.

gmlueck commented 5 months ago

I suppose the SYCL RT could just ignore known_identity when we know it is INF and the TU is compiled with -fno-honor-infinities. Instead, the RT would silently use max instead in this case. That way, we will still get any optimizations associated with -fno-honor-infinities, and the reduction will still return the right answer (assuming the user's input does not contain any INF's).

intel / llvm