[X86] LLVM >= 17 generates call to `__extendhfsf2` for bitcast -> and -> bitcast

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28.03k stars 11.57k forks source link

[X86] LLVM >= 17 generates call to `__extendhfsf2` for bitcast -> and -> bitcast #104914

Closed overmighty closed 4 weeks ago

overmighty commented 4 weeks ago

https://godbolt.org/z/exb7PbjnK

C++ code:

_Float16 fabsf16(_Float16 x) {
    return __builtin_bit_cast(
        _Float16,
        static_cast<unsigned short>(
            __builtin_bit_cast(unsigned short, x) & 0x7fff
        )
    );
}

IR:

define dso_local noundef half @fabsf16(_Float16)(half noundef %x) local_unnamed_addr {
entry:
  %0 = bitcast half %x to i16
  %1 = and i16 %0, 32767
  %2 = bitcast i16 %1 to half
  ret half %2
}

Clang 16 output assembly with -O3:

fabsf16(_Float16):
        pextrw  eax, xmm0, 0
        and     eax, 32767
        pinsrw  xmm0, eax, 0
        ret

Clang 17 output assembly with -O3:

.LCPI0_0:
        .long   0x7fffffff
        .long   0x7fffffff
        .long   0x7fffffff
        .long   0x7fffffff
fabsf16(_Float16):
        push    rax
        call    __extendhfsf2@PLT
        andps   xmm0, xmmword ptr [rip + .LCPI0_0]
        call    __truncsfhf2@PLT
        pop     rax
        ret

cc @arsenm @lntue

llvmbot commented 4 weeks ago

@llvm/issue-subscribers-backend-x86

Author: OverMighty (overmighty)

https://godbolt.org/z/exb7PbjnK C++ code: ```cpp _Float16 fabsf16(_Float16 x) { return __builtin_bit_cast( _Float16, static_cast<unsigned short>( __builtin_bit_cast(unsigned short, x) & 0x7fff ) ); } ``` IR: ```llvm define dso_local noundef half @fabsf16(_Float16)(half noundef %x) local_unnamed_addr { entry: %0 = bitcast half %x to i16 %1 = and i16 %0, 32767 %2 = bitcast i16 %1 to half ret half %2 } ``` Clang 16 output assembly with `-O3`: ```asm fabsf16(_Float16): pextrw eax, xmm0, 0 and eax, 32767 pinsrw xmm0, eax, 0 ret ``` Clang 17 output assembly with `-O3`: ```asm .LCPI0_0: .long 0x7fffffff .long 0x7fffffff .long 0x7fffffff .long 0x7fffffff fabsf16(_Float16): push rax call __extendhfsf2@PLT andps xmm0, xmmword ptr [rip + .LCPI0_0] call __truncsfhf2@PLT pop rax ret ``` Related: https://github.com/llvm/llvm-project/pull/104869#issuecomment-2297563670. cc @arsenm @lntue

arsenm commented 4 weeks ago

Duplicate #104915, this version is just peeling out a layer of implementation detail

overmighty commented 4 weeks ago

There are cases where Clang/LLVM 17 does not do this but 18 does: https://godbolt.org/z/3as5jeG46. But maybe LLVM 17 just added a DAG pattern for x86 that's not as reliable as the InstCombine change in LLVM 18, and both issues are caused by FABS ending up being used and FABS for f16 promoting the operand to f32 when AVX-512 FP16 is unavailable.

llvm/lib/Target/X86/X86ISelLowering.cpp has:

  auto setF16Action = [&] (MVT VT, LegalizeAction Action) {
    setOperationAction(ISD::FABS, VT, Action);
    // ...
  };

  // ...

    // Half type will be promoted by default.
    setF16Action(MVT::f16, Promote);

arsenm commented 4 weeks ago

There are cases where Clang/LLVM 17 does not do this but 18 does:

The DAG always did this (only as an optimization), depending on the old target hook hasBitPreservingFPLogic, which was removed in 09dd4d870e192da73594b713bb201859e5a09efb. The underlying promotion for fabs was always broken, just hidden by the optimization