chfast opened this issue 2 years ago
@llvm/issue-subscribers-backend-x86
Regressed since LLVM 10.
Workaround:
void work(int); // assumed forward declaration so the snippet is self-contained

void workaround(int g)
{
    unsigned h;
    bool o = __builtin_usub_overflow(g, 3, &h);
    if (o) [[unlikely]]
        return;
    work(h);
}
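For context: __builtin_usub_overflow lowers to the @llvm.usub.with.overflow intrinsic, so the subtraction result and the overflow bit come from a single operation and the backend can branch on the flags of the sub itself. A hand-written sketch of that shape (not actual clang output; names are illustrative):
; the result and the overflow bit come from one call, so nothing has to
; be recomputed for the branch
define void @workaround(i32 %g) {
  %pair = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %g, i32 3)
  %h = extractvalue { i32, i1 } %pair, 0
  %o = extractvalue { i32, i1 } %pair, 1
  br i1 %o, label %exit, label %do_work

do_work:
  call void @work(i32 %h)
  br label %exit

exit:
  ret void
}

declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32)
declare void @work(i32)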
As @davidbolvansky noticed, this is a regression since LLVM 10. I bisected it to commit https://github.com/llvm/llvm-project/commit/0f22e783a038b6983f0fe161eef6cf2add3a4156
Apparently this change has some mystery.
As I briefly discussed on IRC with Tim, the original motivation can no longer be recovered; too much time has passed.
However, I believe that the original fold was doing the right thing: we should be performing such a transformation even if the inner add will not go away. That will still unchain the comparison from the add; it will no longer need to wait for the add to compute.
This change combines cmp + add even if the add has multiple users.
; Function Attrs: mustprogress uwtable
define dso_local void @_Z4testi(i32 noundef %0) local_unnamed_addr #0 {
%2 = add nsw i32 %0, 3
%3 = icmp slt i32 %2, 0
%4 = zext i1 %3 to i64
br i1 %3, label %6, label %5, !prof !6
5: ; preds = %1
call void @_Z4worki(i32 noundef %2)
br label %6
6: ; preds = %1, %5
ret void
}
*** IR Dump After InstCombinePass on _Z4testi ***
; Function Attrs: mustprogress uwtable
define dso_local void @_Z4testi(i32 noundef %0) local_unnamed_addr #0 {
%2 = icmp slt i32 %0, -3
br i1 %2, label %5, label %3, !prof !6
3: ; preds = %1
%4 = add nsw i32 %0, 3
call void @_Z4worki(i32 noundef %4)
br label %5
5: ; preds = %1, %3
ret void
}
Then the icmp and the add are lowered into a separate cmp edi, 3 and add edi, -3, as shown in the subject. It seems to me that codegen is too late to fix this: the original add has already been sunk into the if block, and we would need to do some hoisting. That's why I decided to go back to the original patch.
I think it works well for cases like the one shown below: we compare against a constant, and it makes little difference whether that constant is 100 or 92.
%tmp4.7 = add nuw nsw i64 %tmp3, 8
%tmp6.7 = icmp ult i64 %tmp4.7, 100
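Since the add is nuw, under the fold the compare simply becomes (roughly; the add stays around for its other users):
%tmp6.7 = icmp ult i64 %tmp3, 92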
In the case of a comparison with a non-zero value, we impose extra arithmetic. However, many targets can compare with zero efficiently after arithmetic operations. I propose limiting the original patch to comparisons with non-zero values: D144764.
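Roughly, the fold would then be skipped for the zero-comparison shape from the reproducer (hand-written sketch, not compiler output):
; comparison against zero, and %h has other users: keep the compare
; chained to the add so the backend can reuse its flags
%h = add nsw i32 %g, 3
%o = icmp slt i32 %h, 0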
It changes X86 as follows (even better than GCC):
# Before D144764
test(int): # @test(int)
cmp edi, -3
jl .LBB0_1
add edi, 3
jmp work(int)@PLT # TAILCALL
.LBB0_1:
ret
# After D144764
test(int): # @test(int)
addl $-3, %edi
jns work(int)@PLT # TAILCALL
retq
For AArch64 we need to change g -= 3 to g += 3 to see the effect.
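The modified variant looks roughly like this (reconstructed for illustration; the work declaration and the [[unlikely]] hint are assumptions based on the IR dumps above):
void work(int);

void test(int g)
{
    g += 3; // changed from g -= 3
    if (g < 0) [[unlikely]]
        return;
    work(g);
}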
// Before D144764
test(int): // @test(int)
cmn w0, #3
b.lt .LBB0_2
add w0, w0, #3
b work(int)
.LBB0_2:
ret
// After D144764
test(int): // @test(int)
adds w0, w0, #3
b.mi .LBB0_2
b _Z4worki
.LBB0_2:
ret
@LebedevRI what do you think?
There already is a backend undo fold for this at https://github.com/llvm/llvm-project/blob/7910ed1d56c349b76c82d5ebe2f2590770955ff5/llvm/lib/CodeGen/CodeGenPrepare.cpp#L7956
For add, it's currently limited to equality comparisons though, so it doesn't handle this precise pattern yet.
@LebedevRI any update here? I can try to address this in CodeGenPrepare as pointed out by @nikic, but there is also a fix draft by @e-kud in D144764.
If the backend undo fold can recover this, we'll want to do that. If not, we can start considering InstCombine limitations.
Sorry, I didn't realize there was still interest in this.
I've made the change suggested by @nikic: https://reviews.llvm.org/D145454. It works fine, but it revealed a problem (my previous approach has the same problem, it is just hidden from the tests).
Consider the following code.
void free_object(void); // assumed forward declaration so the snippet is self-contained

void decref(long* refcount) {
    long count = *refcount;
    if (count == 1) { free_object(); }
    else if (count > 1) { *refcount = count - 1; }
    else { /* immortal */ }
}
Here we can perform only one comparison of count against 1 and use its flags for both branches. This optimization is already implemented.
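A rough sketch of that single-comparison lowering (hand-written; registers and labels are illustrative, not actual compiler output):
movq    (%rdi), %rax             # count = *refcount
cmpq    $1, %rax                 # one compare against 1 ...
je      .Lfree                   # ... count == 1: call free_object()
jl      .Limmortal               # ... count <  1: immortal, do nothing
decq    %rax                     # count > 1: *refcount = count - 1
movq    %rax, (%rdi)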
When we move to using the flags from the add or sub, we end up with the following situation:
decl %eax
je .LBB0_3
# %bb.1: # %bb2
testl %eax, %eax
jle .LBB0_4
And in general we can't eliminate the test in favor of the decl, because test resets the CF flag, in contrast to add. The bummer here is that this test came from a cmp with 0, but what if it hadn't?
I tried to use the hasNoSignedWrap predicate, but it is not propagated to CodeGen.
This is where I am stuck. Any ideas?
The reproducer produces the code shown in the subject, but the better code would be as generated by GCC: https://godbolt.org/z/MjWe91cxc