Open Kmeakin opened 2 months ago
@llvm/issue-subscribers-backend-aarch64
Author: Karl Meakin (Kmeakin)
I am less familiar with x86, but I think the x86 assembly could also be optimized:
contains_checked:
mov ecx, esi
cmp cl, 32
setb dl
mov eax, 1
shl eax, cl
test eax, edi
setne al
and al, dl
ret
to
contains_checked:
bt edi, esi
setb al
cmp sil, 32
setb dl
and al, dl
ret
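For reference, a C model of what both sequences compute (a sketch only; the helper name is made up, and it relies on `bt r32, r32` taking the bit offset modulo 32):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the proposed x86 sequence: bt with a register
 * destination tests bit (index mod 32), and the separate bounds check
 * masks out-of-range indices. */
static bool contains_checked_model(uint32_t bits, uint8_t index) {
    bool in_range = index < 32;                 /* cmp sil, 32 ; setb dl */
    bool bit      = (bits >> (index & 31)) & 1; /* bt edi, esi ; setb al */
    return bit & in_range;                      /* and al, dl */
}
```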
@llvm/issue-subscribers-backend-x86
Author: Karl Meakin (Kmeakin)
With AArch64 you could even fold the 2nd `and` and the `cmp` together:
lsr w8, w0, w1
and w0, w8, #0x1
tst w1, #0xe0
csel w0, w0, wzr, eq
ret
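The fold works because, for an 8-bit index, index < 32 holds exactly when its top three bits are clear, so the compare can become a flag-setting and. A tiny brute-force check of that identity (the 0xE0 mask is assumed from the i8 index type):

```c
#include <assert.h>
#include <stdint.h>

/* index < 32 holds for a uint8_t exactly when bits 5..7 are clear. */
int main(void) {
    for (unsigned i = 0; i < 256; i++) {
        uint8_t index = (uint8_t)i;
        assert((index < 32) == ((index & 0xE0) == 0));
    }
    return 0;
}
```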
CC @RKSimon
define i1 @contains_unchecked(i32 %bits, i8 %index) {
%conv = zext nneg i8 %index to i32
%shl = shl nuw i32 1, %conv
%and = and i32 %shl, %bits
%cmp = icmp ne i32 %and, 0
ret i1 %cmp
}
define i1 @contains_checked(i32 %bits, i8 %index) {
%cmp = icmp ult i8 %index, 32
%conv = zext nneg i8 %index to i32
%shl = shl nuw i32 1, %conv
%and = and i32 %shl, %bits
%cmp2 = icmp ne i32 %and, 0
%retval.0 = select i1 %cmp, i1 %cmp2, i1 false
ret i1 %retval.0
}
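For reference, a C rendering of the two functions (signatures inferred from the IR above; the source behind the godbolt link may be in another language):

```c
#include <stdbool.h>
#include <stdint.h>

bool contains_unchecked(uint32_t bits, uint8_t index) {
    return (bits & (1u << index)) != 0; /* UB if index >= 32 */
}

bool contains_checked(uint32_t bits, uint8_t index) {
    return index < 32 && (bits & (1u << index)) != 0;
}
```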
> But in this case, LLVM is smart enough to rewrite (bits & (1 << index)) != 0 to (bits >> index) != 0 and so save 2 instructions

Is there a typo in the original post? I think the expression should be (bits >> index) & 1 and not (bits >> index) != 0.
Yes, that was a typo. Thanks
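For completeness, a quick brute-force check of the corrected identity (standalone C, illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* (bits & (1 << index)) != 0  matches  ((bits >> index) & 1) != 0
 * for every index < 32. */
int main(void) {
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xDEADBEEFu, 0xFFFFFFFFu};
    for (unsigned s = 0; s < sizeof samples / sizeof samples[0]; s++) {
        for (unsigned index = 0; index < 32; index++) {
            uint32_t bits = samples[s];
            assert(((bits & (1u << index)) != 0) ==
                   (((bits >> index) & 1) != 0));
        }
    }
    return 0;
}
```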
https://godbolt.org/z/h54KPcvKh
For contains_unchecked, LLVM generates optimal assembly. The naive implementation would be:

But in this case, LLVM is smart enough to rewrite (bits & (1 << index)) != 0 to ((bits >> index) & 1) != 0 and so save 2 instructions:

However, LLVM fails to make the same optimization when bounds checking is added to avoid UB:

I believe the optimal assembly would be: