Missed optimization: (y != 0 && x > (unsigned)(-1) / y) (multiplication overflow check)

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

29.39k stars 12.15k forks source link

Missed optimization: (y != 0 && x > (unsigned)(-1) / y) (multiplication overflow check) #115683

Open Explorer09 opened 3 weeks ago

Explorer09 commented 3 weeks ago

#include <limits.h>
#include <stdbool.h>

bool func1(unsigned long long x, unsigned long long y)
{
    return x > (ULLONG_MAX / y);
}

bool func2(unsigned long long x, unsigned long long y)
{
    return y != 0 && x > (ULLONG_MAX / y);
}

bool func3(unsigned long long x, unsigned long long y)
{
    unsigned long long res;
    return __builtin_umulll_overflow(x, y, &res);
}

Expected result: All three functions produce the same code.

Actual result: func1() and func3() optimize to same code, but func2() had a redundant (y != 0) check that is not optimized out.

x86-64 clang with "-Os" option (tested in Compiler Explorer, a.k.a. godbolt.org)

func1:
        movq    %rsi, %rax
        mulq    %rdi
        seto    %al
        retq

func2:
        testq   %rsi, %rsi
        je      .LBB1_1
        movq    %rsi, %rax
        mulq    %rdi
        seto    %al
        retq
.LBB1_1:
        xorl    %eax, %eax
        retq

(Note: I've also reported the bug in GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117529)

RKSimon commented 2 weeks ago

define i1 @src(i8 %x, i8 %y) zeroext {
entry:
  %cmp.not = icmp eq i8 %y, 0
  br i1 %cmp.not, label %land.end, label %land.rhs

land.rhs:
  %mul = umul_overflow i8 %y, %x
  %mul.ov = extractvalue {i8, i1} %mul, 1
  br label %land.end

land.end:
  %#0 = phi i1 [ 0, %entry ], [ %mul.ov, %land.rhs ]
  ret i1 %#0
}
=>
define i1 @tgt(i8 %x, i8 %y) zeroext {
entry:
  %fx = freeze i8 %x
  %mul = umul_overflow i8 %y, %fx
  %mul.ov = extractvalue {i8, i1} %mul, 1
  ret i1 %mul.ov
}
Transformation seems to be correct!

spaits commented 1 week ago

Hi, I have spent some time with the issue. I did some research and found this commit: https://github.com/llvm/llvm-project/commit/1977c53b2ae425541a0ef329ca10cc8d5cacd0cd#diff-2f2a992afc50868d7ba2b744cfacce4674821a51eca906a44bab2ff0b1b6dfd4 So there is already some logic that addresses a similar case to this one, but not with jumps and phis but with selects. I think this logic could be more or less re-used. I could imagine an implementation in either in llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp or in llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp. What do you think? @RKSimon Also, may I work on this issue or anyone has a plan to work on it or even has already some progress?

RKSimon commented 5 days ago

Sounds like a plan - it might be interesting to first find out why the branches aren't becoming a select already.

I don't know anyone that is working on this right now - @dtcxzyw any thoughts?

nikic commented 5 days ago

SimplifyCFG will always (independent of cost modelling) flatten a single instruction, but here you have two, the umulo and the extract. So a possible alternative would be to adjust the heuristic in SimplifyCFG.

dtcxzyw commented 5 days ago

So a possible alternative would be to adjust the heuristic in SimplifyCFG.

We can treat extract oneuse(op.with.overflow),1 as a single instruction.