llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[X86] Possible wrong compilation of scalar comparison into cmpnltpd #63561

Closed kronbichler closed 1 year ago

kronbichler commented 1 year ago

Hello,

I ran into what looks like a code generation bug for the x86-64 target. For the minimal test case test.txt, compiled with

$ clang++-16 -v
Ubuntu clang version 16.0.0 (1~exp5ubuntu3)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/13
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/13
Candidate multilib: .;@m64
Selected multilib: .;@m64

with

$ clang++-16 -Og -S test.cc

I get the assembly code

f(double, double, unsigned int, bool):                               # @f(double, double, unsigned int, bool)
        test    esi, esi
        jne     .LBB0_5
        mov     eax, edi
        cvtsi2sd        xmm2, rax
        mulsd   xmm2, xmm1
        addsd   xmm1, xmm2
        movapd  xmm3, xmm0
        cmpnltpd        xmm3, xmm2
        cmpnltpd        xmm1, xmm0
        ....

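See also https://godbolt.org/z/s3v3rdfsd

Stale content left in the upper lane of register xmm2 from before the function was entered can trigger a floating-point exception at the second-to-last instruction shown, cmpnltpd xmm3, xmm2. The preceding cvtsi2sd and mulsd only write the lower lane of xmm2, so the upper lane still holds whatever the caller left there, while the packed compare reads both lanes. Specifically, I see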

(gdb) p $xmm2
$1 = {v2_double = {0.40000000000000002, nan(0xc000000000000)}}

showing that the upper lane contains a NaN. The generated code does not raise the FPE with clang-15, nor at optimization level -O0. According to godbolt, -O2 and -O3 also produce the invalid code, for both clang-15 and clang-16.
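To illustrate the mechanism described above in isolation, here is a minimal sketch that is not taken from test.txt. It assumes an x86-64 host with SSE2, and the two helper names are hypothetical. Instead of trapping via feenableexcept, it only inspects the FE_INVALID flag with fetestexcept, and it places a quiet NaN in the upper lane of the second operand, on the understanding that the NLT predicate signals on quiet NaNs as well.

```cpp
#include <cfenv>
#include <emmintrin.h>  // SSE2 intrinsics (x86-64 only)
#include <limits>

// Hypothetical helpers, not part of the original test case.
// Both compare a = {lower 0.2, upper 0.0} against b = {lower 0.4, upper NaN}
// with the NLT predicate and report whether FE_INVALID ended up set.
// The volatile variables keep the compiler from folding the compare away
// or moving it across the fenv calls.

// Packed compare (cmpnltpd): reads both lanes, including the NaN.
bool packed_nlt_raises_invalid() {
  volatile double a_lo = 0.2, b_lo = 0.4;
  volatile double b_hi = std::numeric_limits<double>::quiet_NaN();
  volatile int sink = 0;
  std::feclearexcept(FE_ALL_EXCEPT);
  const __m128d a = _mm_set_pd(0.0, a_lo);   // _mm_set_pd(upper, lower)
  const __m128d b = _mm_set_pd(b_hi, b_lo);
  sink = _mm_movemask_pd(_mm_cmpnlt_pd(a, b));
  return std::fetestexcept(FE_INVALID) != 0;
}

// Scalar compare (cmpnltsd): reads only the lower lanes.
bool scalar_nlt_raises_invalid() {
  volatile double a_lo = 0.2, b_lo = 0.4;
  volatile double b_hi = std::numeric_limits<double>::quiet_NaN();
  volatile int sink = 0;
  std::feclearexcept(FE_ALL_EXCEPT);
  const __m128d a = _mm_set_pd(0.0, a_lo);
  const __m128d b = _mm_set_pd(b_hi, b_lo);
  sink = _mm_movemask_pd(_mm_cmpnlt_sd(a, b));
  return std::fetestexcept(FE_INVALID) != 0;
}
```

Under these assumptions, the packed variant should report FE_INVALID and the scalar variant should not. With feenableexcept(FE_INVALID) active, the same flag would be delivered as a SIGFPE instead of being silently recorded.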

Please let me know if I should provide a main function to invoke this. All one needs to do is set xmm2 to _mm_set1_pd(std::numeric_limits<float>::signaling_NaN()) and call feenableexcept(FE_DIVBYZERO | FE_INVALID) before calling f(0.2, 0.2, 2, false). I could be wrong, and my code might be doing something that is not allowed, but I believe the code is valid and the problem lies within LLVM.

Note that the code is extracted from a big project, https://github.com/dealii/dealii/issues/15496#issuecomment-1609945214

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-x86

efriedma-quic commented 1 year ago

If you're using feenableexcept, you need to pass -ftrapping-math or equivalent.

kronbichler commented 1 year ago

Thank you for the info; I had not considered that flag.