illwieckz opened 1 day ago
@andykaylor, here you said:

> In clang, we do apply fast-math to the intrinsics. There are some problems with that. […]

Does clang also apply some implicit optimizations to generated intrinsics as well (like `-fno-trapping-math`), optimizations that would not be used when not generating intrinsics?

By generated intrinsics I mean clang deciding to use `divss` on some `xmm` register when doing a single division on standalone floats, like the user writing `a = b / c;` without the user using any explicit function from `immintrin.h`.
This is something I found when tracking down #118152 on my end by fiddling with compiler flags: I was hoping that using `feenableexcept()` would help me catch some float computation errors that Clang 19 produces but Clang 18 doesn't, in the hope of pinpointing some sample of code that reproduces #118152. But I found this instead.
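For reference, the kind of thing I was hoping to do looks roughly like this (a minimal sketch, not the actual code from #118152):

```c
/* Minimal sketch (not the #118152 reproducer) of how I intended to use
 * feenableexcept(): unmask FP exceptions so a bad computation turns into
 * SIGFPE and can be caught in a debugger. */
#define _GNU_SOURCE             /* feenableexcept() is a glibc extension */
#include <fenv.h>
#include <stdio.h>

int main(void)
{
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

    volatile float b = 1.0f, c = 0.0f;
    float a = b / c;            /* division by zero: should raise SIGFPE here */

    printf("a = %f\n", a);      /* only reached if the trap never fires */
    return 0;
}
```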
So @slipher pointed out to me that `-fno-trapping-math` is the default, and indeed I found this:
> `-f[no-]trapping-math`
>
> Control floating point exception behavior. `-fno-trapping-math` allows optimizations that assume that floating point operations cannot generate traps such as divide-by-zero, overflow and underflow.
>
> - The option `-ftrapping-math` behaves identically to `-ffp-exception-behavior=strict`.
> - The option `-fno-trapping-math` behaves identically to `-ffp-exception-behavior=ignore`. This is the default.
The quote is from: https://clang.llvm.org/docs/UsersManual.html
It means the default behavior makes `feenableexcept()` unusable on architectures featuring SSE.

But since it's a glibc function, maybe clang cannot do anything about it…
This code doesn't have FENV_ACCESS on for most of the flags you demonstrated, so there is no guarantee that `feenableexcept` works or that exceptions are observed. We REALLY need to have a warning if any of these functions are used without it enabled.
> @andykaylor, here you said:
>
> > In clang, we do apply fast-math to the intrinsics. There are some problems with that. […]
>
> Does clang also apply some implicit optimizations to generated intrinsics as well (like `-fno-trapping-math`), optimizations that would not be used when not generating intrinsics? By generated intrinsics I mean clang deciding to use `divss` on some `xmm` register when doing a single division on standalone floats, like the user writing `a = b / c;` without the user using any explicit function from `immintrin.h`.
In that comment I was referring to the target-specific intrinsics declared in `immintrin.h` and the related headers that it includes. The use of `divss` is something else. That's just an implementation detail determined by the backend.
The problem you're running into here is that by default clang does not allow access to the floating-point environment. Unless you have done something to enable access to the floating-point environment (such as `#pragma STDC FENV_ACCESS ON`, or compiling with `-ffp-model=strict` or `-ftrapping-math`), the optimizer will assume the default floating-point environment, meaning floating-point exceptions are masked and the default rounding mode is used.
This isn't really specific to SSE. The same assumptions are made for any target, but it may be exposed in different ways on different targets.
> It means the default behavior makes `feenableexcept()` unusable on architectures featuring SSE. But since it's a glibc function, maybe clang cannot do anything about it…
You need to use `#pragma STDC FENV_ACCESS ON` (or equivalent command-line flags) to use that function. Even in standard C, being in a non-default floating-point environment without setting that pragma is undefined behavior, and all compilers that I'm aware of retain the sense that you need to opt in to observing the floating-point environment.
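A minimal sketch of that opt-in (assuming glibc for `feenableexcept()`):

```c
/* Minimal sketch, assuming glibc for feenableexcept(). The pragma (or
 * -ffp-model=strict / -ftrapping-math on the command line) tells the compiler
 * the floating-point environment may be observed, so it may no longer assume
 * that floating-point operations cannot trap. */
#define _GNU_SOURCE
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

float divide(float b, float c)
{
    return b / c;   /* may legitimately raise an unmasked FP exception now */
}

int main(void)
{
    feenableexcept(FE_DIVBYZERO);    /* unmask divide-by-zero */
    return (int)divide(2.0f, 1.0f);  /* non-zero divisor: no trap expected */
}
```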
> This code doesn't have FENV_ACCESS on for most of the flags you demonstrated, so there is no guarantee that `feenableexcept` works or that exceptions are observed. We REALLY need to have a warning if any of these functions are used without it enabled.
Oh! That's good to know! I have seen this pragma used in some macOS example, but most Linux/BSD documentation seems not to mention it (example, example, example).
A warning would help a lot indeed.
I guess I got all the answers I needed.
Hi, using this sample code:

By compiling it with `-O2`, the compiled code raises some `division by zero` exception when executed:

My guess is that the SSE optimized code raises division by zero exceptions from unused fields of the xmm registers, even though we discard those results.
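To illustrate the kind of pattern I suspect (a hypothetical sketch, not the actual sample code), a division that is only wanted for some elements can end up executed for all SIMD lanes:

```c
/* Hypothetical sketch of the suspected mechanism (not the actual sample code).
 * The division is only wanted where den[i] != 0, but if the optimizer turns
 * the loop into packed SSE divisions (divps), the lanes where den[i] == 0 can
 * still go through the hardware divider and set FE_DIVBYZERO, even though
 * their results are discarded by the select. */
void safe_divide(float *out, const float *num, const float *den, int count)
{
    for (int i = 0; i < count; ++i)
        out[i] = (den[i] != 0.0f) ? num[i] / den[i] : 0.0f;
}
```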
The requirement that `-O2` needs either `-ftrapping-math` or `-ffast-math` is curious, since `-ffast-math` sets `-fno-trapping-math`.

Also, I thought `-ftrapping-math` was the default and that `-fno-trapping-math` is part of `-ffast-math`, but just using `-O2` behaves as if `-fno-trapping-math` is used, so it behaves as if part of `-ffast-math` was enabled anyway… But maybe `-ffast-math` just modifies other behaviors that make `-ftrapping-math` or `-fno-trapping-math` irrelevant.

Using the Godbolt compiler explorer (here Clang 19.1.0), it only works with `-O2 -ffast-math` (or deprecated `-Ofast`):

- `-O0`
- `-Os`
- `-Os -fno-trapping-math`
- `-Os -ftrapping-math`
- `-O1`
- `-O2`
- `-O2 -fno-trapping-math`
- `-O2 -ftrapping-math`
- `-ffast-math`
- `-O2 -ffast-math`
- `-O2 -ffast-math -ftrapping-math`
- `-O2 -ffast-math -fno-trapping-math`
- `-Ofast`
See: https://godbolt.org/z/395c8nMef
And (with `-ftrapping-math` added): https://godbolt.org/z/czbGTjs6E
And (with `-fno-trapping-math` added): https://godbolt.org/z/zYr5Pba3W

With 32-bit i686 I reproduce the bug when using SSE but get no bug when not using SSE:

- `-m32 -msse`
- `-m32 -mno-sse`
- `-Os`
- `-O1`
- `-O2`
- `-ffast-math`
- `-O2 -ffast-math`

With 32-bit i686 with SSE I get the same failure: https://godbolt.org/z/199eqhzGh
And with a 32-bit i686 build without SSE I get no failure: https://godbolt.org/z/sfd9TsTj4
On my end on Ubuntu 24.04 with amd64, I get the same results with clang 19.1.5.
On Ubuntu 24.04 with amd64 and different versions of the clang compiler, I only get it working with clang 13 and 14, every later version breaks it:

- `-O0`
- `-O1`
- `-O2`
- `-O2 -ftrapping-math`
- `-ffast-math`
- `-O2 -ffast-math`
On GCC 14.2 I get none of those issues (no false positive `division by zero` error is raised whatever compiler flags are used): https://godbolt.org/z/rxMWM68Mc

Note: Disabling SSE on amd64 just produces garbage computation (`1.0/1.0` gives `0.0`), but I don't know if it makes sense to disable SSE on amd64 at all. GCC produces the same garbage (`1.0/1.0` giving `0.0`). Though I'm surprised not to get a warning if that's not a legitimate thing to do, and I'm also surprised the generated code runs at all if it isn't. See: https://godbolt.org/z/77PYGTc5b (Clang) and https://godbolt.org/z/cvW4Er3Kd (GCC).