Open Explorer09 opened 3 weeks ago
define i1 @src(i8 %x, i8 %y) zeroext {
entry:
%cmp.not = icmp eq i8 %y, 0
br i1 %cmp.not, label %land.end, label %land.rhs
land.rhs:
%mul = umul_overflow i8 %y, %x
%mul.ov = extractvalue {i8, i1} %mul, 1
br label %land.end
land.end:
%#0 = phi i1 [ 0, %entry ], [ %mul.ov, %land.rhs ]
ret i1 %#0
}
=>
define i1 @tgt(i8 %x, i8 %y) zeroext {
entry:
%fx = freeze i8 %x
%mul = umul_overflow i8 %y, %fx
%mul.ov = extractvalue {i8, i1} %mul, 1
ret i1 %mul.ov
}
Transformation seems to be correct!
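For readers who want to sanity-check the proof above without Alive2, here is a minimal C sketch that brute-forces all 8-bit inputs. The function names (src_form, tgt_form, count_mismatches) are made up for illustration and are not the reporter's code. It relies on the GCC/Clang builtin __builtin_mul_overflow and only covers concrete values, so it does not model the undef/poison behaviour that the freeze in @tgt accounts for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors @src: test y != 0 first, then check the multiplication.
   Hypothetical helper name, for illustration only. */
static bool src_form(uint8_t x, uint8_t y) {
    uint8_t product;
    return y != 0 && __builtin_mul_overflow(y, x, &product);
}

/* Mirrors @tgt: check the multiplication unconditionally. */
static bool tgt_form(uint8_t x, uint8_t y) {
    uint8_t product;
    return __builtin_mul_overflow(y, x, &product);
}

/* Returns the number of (x, y) pairs on which the two forms disagree. */
static unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    for (unsigned x = 0; x <= UINT8_MAX; ++x)
        for (unsigned y = 0; y <= UINT8_MAX; ++y)
            if (src_form((uint8_t)x, (uint8_t)y) !=
                tgt_form((uint8_t)x, (uint8_t)y))
                ++mismatches;
    return mismatches;
}
```

The y != 0 guard is redundant because multiplying by zero can never overflow, so the unconditional check already returns false in that case.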
Hi, I have spent some time with the issue.
I did some research and found this commit: https://github.com/llvm/llvm-project/commit/1977c53b2ae425541a0ef329ca10cc8d5cacd0cd#diff-2f2a992afc50868d7ba2b744cfacce4674821a51eca906a44bab2ff0b1b6dfd4
So there is already some logic that handles a similar case, although with selects rather than branches and phis.
I think this logic could be more or less re-used. I could imagine an implementation either in llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp or in llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp.
What do you think? @RKSimon
Also, may I work on this issue, or does anyone already plan to work on it or have some progress?
Sounds like a plan - it might be interesting to first find out why the branches aren't becoming a select already.
I don't know of anyone who is working on this right now - @dtcxzyw any thoughts?
SimplifyCFG will always (independent of cost modelling) flatten a single instruction, but here you have two, the umulo and the extract. So a possible alternative would be to adjust the heuristic in SimplifyCFG.
We can treat extract oneuse(op.with.overflow), 1 as a single instruction.
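To make the suggested heuristic concrete, here is a toy C model of the counting rule. It is not SimplifyCFG's actual code: the toy_inst struct, the opcode enum, and every function name are hypothetical, and the real pass works on LLVM's own instruction classes and cost model. The sketch only shows the idea that an extract of field 1 from a single-use with.overflow operation adds nothing to the instruction count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds; not LLVM's real classes. */
enum toy_opcode { TOY_OVERFLOW_OP, TOY_EXTRACT_VALUE, TOY_OTHER };

struct toy_inst {
    enum toy_opcode opcode;
    int operand;            /* index of the defining instruction, or -1 */
    unsigned num_uses;      /* number of users of this instruction's result */
    unsigned extract_index; /* field index, meaningful for TOY_EXTRACT_VALUE */
};

/* True when block[i] extracts field 1 from a with.overflow operation
   whose only user is that extract. */
static bool is_folded_overflow_extract(const struct toy_inst *block,
                                       size_t n, size_t i) {
    const struct toy_inst *inst = &block[i];
    if (inst->opcode != TOY_EXTRACT_VALUE || inst->extract_index != 1)
        return false;
    if (inst->operand < 0 || (size_t)inst->operand >= n)
        return false;
    const struct toy_inst *def = &block[inst->operand];
    return def->opcode == TOY_OVERFLOW_OP && def->num_uses == 1;
}

/* Counts instructions, treating each folded pair as one instruction. */
static unsigned speculation_cost(const struct toy_inst *block, size_t n) {
    unsigned cost = 0;
    for (size_t i = 0; i < n; ++i)
        if (!is_folded_overflow_extract(block, n, i))
            ++cost;
    return cost;
}

/* Models the land.rhs block from the IR in this thread: a umul overflow
   op followed by an extract of its overflow bit. */
static unsigned land_rhs_cost(unsigned overflow_op_uses) {
    const struct toy_inst block[2] = {
        { .opcode = TOY_OVERFLOW_OP, .operand = -1,
          .num_uses = overflow_op_uses, .extract_index = 0 },
        { .opcode = TOY_EXTRACT_VALUE, .operand = 0,
          .num_uses = 1, .extract_index = 1 },
    };
    return speculation_cost(block, 2);
}
```

In this model land_rhs_cost(1) is 1, so the block would fall under an "always flatten a single instruction" rule, while land_rhs_cost(2) stays at 2 because the overflow op has another user.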
Expected result: All three functions produce the same code.
Actual result: func1() and func3() optimize to the same code, but func2() has a redundant (y != 0) check that is not optimized out.
x86-64 clang with "-Os" option (tested in Compiler Explorer, a.k.a. godbolt.org)
(Note: I've also reported the bug in GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117529)