llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.02k stars 11.96k forks source link

[X86] LLVM >= 18 folding bitcast -> and -> bitcast to `@llvm.fabs.f16` generates call to `__extendhfsf2` #104915

Open overmighty opened 2 months ago

overmighty commented 2 months ago

https://godbolt.org/z/on6Pj6Tsc

Starting from https://github.com/llvm/llvm-project/commit/5c0da5839de1bdc08f411a04305a9bdadf538ad5, InstCombine transforms this IR:

define linkonce_odr hidden noundef half @_Float16 __llvm_libc_20_0_0_git::fputil::abs<_Float16, 0>(_Float16)(half noundef %x) local_unnamed_addr #1 comdat {
entry:
  %0 = bitcast half %x to i16
  %1 = and i16 %0, 32767
  %2 = bitcast i16 %1 to half
  ret half %2
}

into:

define linkonce_odr hidden noundef half @_Float16 __llvm_libc_20_0_0_git::fputil::abs<_Float16, 0>(_Float16)(half noundef %x) local_unnamed_addr #1 comdat {
entry:
  %0 = call half @llvm.fabs.f16(half %x)
  ret half %0
}

On x86, when AVX-512 FP16 is not available, this generates something like:

.LCPI0_0:
        .long   0x7fffffff
__llvm_libc_20_0_0_git::fabsf16(_Float16):
        push    rbp
        mov     rbp, rsp
        call    __extendhfsf2@PLT
        vbroadcastss    xmm1, dword ptr [rip + .LCPI0_0]
        vandps  xmm0, xmm0, xmm1
        call    __truncsfhf2@PLT
        pop     rbp
        ret

whereas with LLVM 16, bitcast -> and -> bitcast generates something like:

__llvm_libc_20_0_0_git::fabsf16(_Float16):
        push    rbp
        mov     rbp, rsp
        vpextrw eax, xmm0, 0
        and     eax, 32767
        vpinsrw xmm0, xmm0, eax, 0
        pop     rbp
        ret

Related:

cc @arsenm @lntue

arsenm commented 2 months ago

Forming fabs is correct, the codegen is not. Promotion should bitcast to integer

llvmbot commented 2 months ago

@llvm/issue-subscribers-backend-x86

Author: OverMighty (overmighty)

https://godbolt.org/z/on6Pj6Tsc Starting from https://github.com/llvm/llvm-project/commit/5c0da5839de1bdc08f411a04305a9bdadf538ad5, InstCombine transforms this IR: ```llvm define linkonce_odr hidden noundef half @_Float16 __llvm_libc_20_0_0_git::fputil::abs<_Float16, 0>(_Float16)(half noundef %x) local_unnamed_addr #1 comdat { entry: %0 = bitcast half %x to i16 %1 = and i16 %0, 32767 %2 = bitcast i16 %1 to half ret half %2 } ``` into: ```llvm define linkonce_odr hidden noundef half @_Float16 __llvm_libc_20_0_0_git::fputil::abs<_Float16, 0>(_Float16)(half noundef %x) local_unnamed_addr #1 comdat { entry: %0 = call half @llvm.fabs.f16(half %x) ret half %0 } ``` On x86, when AVX-512 FP16 is not available, this generates something like: ```asm .LCPI0_0: .long 0x7fffffff __llvm_libc_20_0_0_git::fabsf16(_Float16): push rbp mov rbp, rsp call __extendhfsf2@PLT vbroadcastss xmm1, dword ptr [rip + .LCPI0_0] vandps xmm0, xmm0, xmm1 call __truncsfhf2@PLT pop rbp ret ``` whereas with LLVM 16, `bitcast` -> `and` -> `bitcast` generates something like: ```asm __llvm_libc_20_0_0_git::fabsf16(_Float16): push rbp mov rbp, rsp vpextrw eax, xmm0, 0 and eax, 32767 vpinsrw xmm0, xmm0, eax, 0 pop rbp ret ``` Related: - https://github.com/llvm/llvm-project/issues/104914 - https://github.com/llvm/llvm-project/pull/104869#issuecomment-2297563670 cc @arsenm @lntue
arsenm commented 2 months ago

The same bug exists for fneg and copysign, and this is repeated in the type legalizer and promotion action

arsenm commented 2 months ago

Attaching partial patch. It's incomplete because it asserts on targets with legal f16, but no legal i16. I don't have the patience to fight with SelectionDAG's insistence that there can be no illegal types after type legalization

0001-DAG-Do-not-use-FP-conversions-to-promote-fabs.patch

beetrees commented 4 weeks ago

This isn't x86 specific; it affects FABS and FNEG for any target that uses TypeSoftPromoteHalf. The issue is that the lowering uses the generic SoftPromoteHalfRes_UnaryOp method. The resulting lowering is actually a miscompilation as __extendhfsf2 and __truncsfhf2 quieten signalling NaNs, whereas FABS and FNEG are defined as only affecting the sign bit (FCOPYSIGN doesn't have this issue as its lowering in SoftPromoteHalfRes_FCOPYSIGN doesn't convert to and from a f32).