llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

clang does not set DAZ flag in -ffast-math mode #14396

Closed tobiasgrosser closed 2 years ago

tobiasgrosser commented 12 years ago
Bugzilla Link: 14024
Version: trunk
OS: Linux
Attachments: Test case
CC: @andykaylor, @d0k, @gnzlbg, @arsenm

Extended Description

$ gcc jacobi_1d.DenormalsAreZero.c -O3
$ time ./a.out
real 0m20.164s

$ gcc jacobi_1d.DenormalsAreZero.c -O3 -ffast-math
$ time ./a.out
real 0m0.357s

$ clang jacobi_1d.DenormalsAreZero.c -O3
$ time ./a.out
real 0m36.660s

$ clang jacobi_1d.DenormalsAreZero.c -O3 -ffast-math
$ time ./a.out
real 0m36.431s

As can be seen, the gcc-produced binary is a lot faster than the clang one in -ffast-math mode (besides being a little faster in general). This is not caused by better optimizations, but by gcc linking a small function into the resulting binary, which sets the DAZ flag (a bit in the MXCSR control register).

From [1]:

"DAZ tells the CPU to force all Denormals to zero. A Denormal is a number that is so small that FPU can't renormalize it due to limited exponent ranges. They're just like normal numbers, but they take considerably longer to process. Note that not all processors support DAZ."

The test case happens to compute a lot of these close-to-zero values. Hence, setting the flag has a big impact on run time.
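To make the term concrete, here is a minimal sketch of how one could check whether a value is a denormal (the C standard calls these subnormal). The helper name `is_denormal` is hypothetical and is not taken from the attached test case.

```c
#include <float.h>  /* FLT_MIN: the smallest positive normal float */
#include <math.h>   /* fpclassify, FP_SUBNORMAL */

/* Hypothetical helper: returns 1 if x is a denormal (subnormal) float,
 * i.e. nonzero but smaller in magnitude than FLT_MIN. */
static int is_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}
```

Halving `FLT_MIN` gives a value that can only be represented as a denormal, so `is_denormal(FLT_MIN / 2.0f)` is true, while `is_denormal(FLT_MIN)` and `is_denormal(0.0f)` are false.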

[1] http://softpixel.com/~cwright/programming/simd/sse.php

arsenm commented 4 years ago

I believe this should set the right ftz/daz flags since fa7cd549d604bfd8f9dce5d649a19720cbc39cca

llvmbot commented 4 years ago

It depends on the target, but crtfastmath.o should set both DAZ and FTZ, e.g. on x86-64.

NoSignedZeros is orthogonal to DAZ and FTZ, except maybe on AMDGPU (I think, not sure) and other targets that have special flushing modes.

Also, Andy Kaylor just suggested an LLVM-specific way to set these flags at runtime through compiler-rt.

54aefcd4-c07d-4252-8441-723563c8826f commented 4 years ago

Should FTZ also be set, e.g. when -fno-signed-zeros is enabled (e.g. via -ffast-math)?

llvmbot commented 9 years ago

The pmmintrin.h header for SSE3 (included in clang) has the macro _MM_SET_DENORMALS_ZERO_MODE, which sets DAZ. It doesn't require any SSE3 functionality to do so, only _mm_getcsr and _mm_setcsr, which are part of basic SSE. This might be an alternative when crtfastmath is not available (e.g. on Mac OS X).
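A minimal sketch of that approach, assuming an x86 target with SSE enabled. The helper names `enable_daz_ftz` and `scale_subnormal` are hypothetical; the macros themselves come from pmmintrin.h and xmmintrin.h. FTZ is set alongside DAZ here only as an illustration.

```c
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE, _MM_DENORMALS_ZERO_ON */
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON, _mm_getcsr */

/* Hypothetical helper: turn on DAZ and FTZ in MXCSR.
 * MXCSR is per-thread state, so this affects the calling thread only. */
static void enable_daz_ftz(void) {
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}

/* Hypothetical helper: multiply a denormal input by 1.0f at run time.
 * The volatile qualifiers keep the compiler from folding the product. */
static float scale_subnormal(void) {
    volatile float tiny = 1e-40f;  /* below FLT_MIN, so stored as a denormal */
    volatile float one = 1.0f;
    return tiny * one;
}
```

Calling `enable_daz_ftz()` once at the start of `main` (and in every additional thread that does floating-point work) should have a similar effect to linking crtfastmath.o. With DAZ on, the denormal input in `scale_subnormal()` is treated as zero, so the product is zero instead of a denormal.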

d0k commented 12 years ago

r165240 makes clang link crtfastmath.o if it's available (only on Linux for now).

d0k commented 12 years ago

This is a neat trick. GCC links in crtfastmath.o (part of libgcc) which sets the necessary bits. We should do the same in the clang driver and provide crtfastmath.o with compiler-rt.
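For illustration, the idea behind such a startup object can be sketched as a constructor that runs before `main` and sets the two MXCSR bits. This is a simplified sketch under assumptions, not the libgcc source: it assumes an x86-64 target, the function names are hypothetical, and it omits any check for CPUs that lack DAZ support.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

#define MXCSR_DAZ 0x0040u  /* bit 6: denormals-are-zero */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush-to-zero */

/* Hypothetical stand-in for the startup code: the constructor attribute
 * (a GCC/Clang extension) makes it run before main(). */
__attribute__((constructor))
static void set_fast_math_sketch(void) {
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ | MXCSR_FTZ);
}

/* Hypothetical helper: returns 1 if both bits are set in the calling thread. */
static int daz_ftz_enabled(void) {
    unsigned int want = MXCSR_DAZ | MXCSR_FTZ;
    return (_mm_getcsr() & want) == want;
}
```

Because the bits are set before `main` starts, user code needs no changes; the driver only has to add the object to the link line when -ffast-math is given.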

jyknight commented 2 years ago

This was resolved by r165240 (aka 058666a8d02f5cd348150862a3401c9c4bd0b4d0) back in 2012, not sure why it wasn't closed then.