llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.42k stars 11.74k forks source link

llvm.smul.with.overflow.i64 lowered to non-existent __multi3 on PowerPC 32-bit #54460

Closed aaronpuchert closed 2 years ago

aaronpuchert commented 2 years ago

LLVM 14 built with Clang 14 doesn't link on PowerPC (32-bit) because of

/usr/bin/ld: LoopStrengthReduce.cpp:(.text._ZN12_GLOBAL__N_111LSRInstance24GenerateAllReuseFormulaeEv+0xf1c): undefined reference to `__multi3'

The relevant lines are

    int64_t NewBaseOffset = (uint64_t)Base.BaseOffset * Factor;
    assert(Factor != 0 && "Zero factor not expected!");
    if (NewBaseOffset / Factor != Base.BaseOffset)
      continue;

We can build a small reproducer for that:

bool mul_overflow(unsigned long long a, long long b) {
    long long prod = a * b;
    return (prod / b != a);
}

Clang 13 compiles this to (with -target powerpc-unknown-linux-gnu -O2):

define dso_local zeroext i1 @_Z12mul_overflowyx(i64 %0, i64 %1) local_unnamed_addr #0 {
  %3 = mul i64 %1, %0
  %4 = sdiv i64 %3, %1
  %5 = icmp ne i64 %4, %0
  ret i1 %5
}

So basically what it says. But Clang 14 recognizes that this is basically an overflow check and that we can get rid of the superfluous division:

define dso_local noundef zeroext i1 @_Z12mul_overflowyx(i64 noundef %0, i64 noundef %1) local_unnamed_addr #1 {
  %3 = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %1, i64 %0)
  %4 = extractvalue { i64, i1 } %3, 1
  ret i1 %4
}

Then CodeGen produces this:

_Z12mul_overflowyx:                     # @_Z12mul_overflowyx
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:
        mflr 0
        stw 0, 4(1)
        stwu 1, -16(1)
        .cfi_def_cfa_offset 16
        .cfi_offset lr, 4
        mr      9, 3
        srawi 3, 5, 31
        srawi 7, 9, 31
        mr      10, 4
        mr      4, 3
        mr      8, 7
        bl __multi3
        srawi 5, 5, 31
        xor 3, 3, 5
        xor 4, 4, 5
        or 3, 4, 3
        cntlzw  3, 3
        not     3, 3
        rlwinm 3, 3, 27, 31, 31
        lwz 0, 20(1)
        addi 1, 1, 16
        mtlr 0
        blr
.Lfunc_end0:
        .size   _Z12mul_overflowyx, .Lfunc_end0-.Lfunc_begin0
        .cfi_endproc

But there is no __multi3 on 32-bit PowerPC, because it doesn't have a 128-bit integer type. It's not in libgcc_s.so and compiler-rt also shouldn't have it: compiler-rt/lib/builtins/int_types.h has

#if defined(__LP64__) || defined(__wasm__) || defined(__mips64) ||             \
    defined(__riscv) || defined(_WIN64)
#define CRT_HAS_128BIT
#endif

and then compiler-rt/lib/builtins/multi3.c is guarded with that.

llvmbot commented 2 years ago

@llvm/issue-subscribers-backend-powerpc

topperc commented 2 years ago

@nickdesaulniers @LebedevRI another broken target.

aaronpuchert commented 2 years ago

My guess is that because ISD::SMULO is not handled in PPCDAGToDAGISel::Select, it falls back to some generic code that expands the result to 128 bits where we can easily see whether there was an overflow. But I'm not really familiar with the backends.

If I understand this correctly, PowerPC has overflow flags that should allow directly lowering the builtin. As an alternative, we might teach the middle end to stop being so clever, although it would be a (minor) loss.

aaronpuchert commented 2 years ago

@tstellar, feel free to move this to 14.0.1 if the release is otherwise fine, I can probably patch around this for now.

topperc commented 2 years ago

This has been recurring issue. Here are other patches for other targets

https://reviews.llvm.org/D112750 https://reviews.llvm.org/D108928 https://reviews.llvm.org/D108939 https://reviews.llvm.org/D108936 https://reviews.llvm.org/D108844 https://reviews.llvm.org/D108842

topperc commented 2 years ago

It used to be that llvm.smul.with.overflow would only be generated for __builtin_smul_overflow. For types that aren't legal it would use a libcall. That libcall only exists in compiler-rt and not libgcc, and it doesn't exist for all types.

In https://reviews.llvm.org/D107581 I taught the backend how to expand illegal types when the libcall was marked unavailable by the target. After that, InstCombine was changed to start using the intrinsic, but no thorough audit of the backends to disable the libcall was done. This has resulted in one target at a time being patched to fix this ever since.

I've considered going into InstCombine and adding a check for DataLayout::fitsInLegalInteger to block the transform that uses smul.with.overflow. The code we're emitting after D107581 is far from perfect.

aaronpuchert commented 2 years ago

Thanks for the insight! It seems that https://reviews.llvm.org/D108936 already added

    setLibcallName(RTLIB::MULO_I64, nullptr);

for 32-bit PPC. So that's good. Perhaps it should also block RTLIB::MUL_I128?

aaronpuchert commented 2 years ago

That seems to have worked out, I posted https://reviews.llvm.org/D122090. After that we get

_Z12mul_overflowyx:                     # @_Z12mul_overflowyx
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:                                # %entry
        mulhwu 8, 6, 4
        li 7, 0
        mullw 11, 5, 4
        addc 8, 11, 8
        mulhwu 9, 5, 4
        addze 9, 9
        mullw 12, 6, 3
        addc 8, 12, 8
        mulhwu 10, 6, 3
        addze 10, 10
        addc 9, 9, 10
        addze 7, 7
        mullw 0, 5, 3
        addc 9, 0, 9
        mulhwu 11, 5, 3
        adde 7, 11, 7
        srawi 10, 5, 31
        srawi 12, 3, 31
        mulhwu 11, 4, 10
        mulhwu 0, 12, 6
        mullw 4, 4, 10
        mullw 5, 12, 5
        add 5, 0, 5
        mullw 3, 3, 10
        add 10, 11, 4
        add 3, 10, 3
        mullw 6, 12, 6
        addc 4, 6, 4
        add 5, 5, 6
        adde 3, 5, 3
        addc 4, 9, 4
        adde 3, 7, 3
        srawi 5, 8, 31
        xor 3, 3, 5
        xor 4, 4, 5
        or 3, 4, 3
        cntlzw  3, 3
        not     3, 3
        rlwinm 3, 3, 27, 31, 31
        blr
.Lfunc_end0:
        .size   _Z12mul_overflowyx, .Lfunc_end0-.Lfunc_begin0
        .cfi_endproc
aaronpuchert commented 2 years ago

/branch aaronpuchert/llvm-project/bug54460

llvmbot commented 2 years ago

/pull-request llvmbot/llvm-project#129

tstellar commented 2 years ago

@topperc What do you think about backporting this? https://github.com/llvm/llvm-project/commit/c1a31ee65b3a2bf2b452febb99703b3fdfa43328

topperc commented 2 years ago

@topperc What do you think about backporting this? c1a31ee

Looks ok for backport.

tstellar commented 2 years ago

Merged: add3ab7f4c8a