Open daniel-zabawa opened 3 weeks ago
@llvm/issue-subscribers-backend-x86
Author: Daniel Zabawa (daniel-zabawa)
Thanks for the test case - I thought we'd already handled most of these
define i32 @PR113965(i32 noundef %x) {
entry:
  %cmp = icmp slt i32 %x, 2
  br i1 %cmp, label %return, label %if.end
if.end:
  %conv = zext nneg i32 %x to i64
  br label %do.body
do.body:
  %l.0 = phi i64 [ 1, %if.end ], [ %l.0.shr, %do.body ]
  %u.0 = phi i64 [ %conv, %if.end ], [ %shr.u.0, %do.body ]
  %add = add nsw i64 %u.0, %l.0
  %shr = ashr i64 %add, 1
  %mul = mul nsw i64 %shr, %shr
  %cmp2 = icmp samesign ugt i64 %mul, %conv
  %l.0.shr = select i1 %cmp2, i64 %l.0, i64 %shr
  %shr.u.0 = select i1 %cmp2, i64 %shr, i64 %u.0
  %add5 = add nsw i64 %l.0.shr, 1
  %cmp6 = icmp slt i64 %add5, %shr.u.0
  br i1 %cmp6, label %do.body, label %do.end
do.end:
  %conv7 = trunc i64 %l.0.shr to i32
  br label %return
return:
  %retval.0 = phi i32 [ %conv7, %do.end ], [ %x, %entry ]
  ret i32 %retval.0
}
The CMOVBE and CMOVNBE (the latter also spelled CMOVA) instructions generate 2 uops and have a throughput of 1 on P-cores. Other CMOVs are a single uop with a throughput of 2.
The following case shows the backend generating the more expensive CMOVBE/CMOVA instructions:
Compiling the above with trunk as
clang -O2 -march=core-avx2 -S f.c
generates:
The cmovge and cmovl instructions should be preferred to these where possible.
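The `samesign` flag on `%cmp2` in the IR is what should make this preference possible: when both operands are known to have the same sign, an unsigned compare gives the same answer as the signed one, and a signed "greater than" can be commuted into "less than" or "greater or equal". Below is a minimal C sketch of that equivalence. The function names are hypothetical and exist only for illustration; this is not the backend's code, and it does not claim which transform the backend should actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned a > b: the condition behind cmova / cmovnbe (CF=0 and ZF=0). */
bool ugt(int64_t a, int64_t b) { return (uint64_t)a > (uint64_t)b; }

/* Signed a > b: the condition behind cmovg / cmovnle. */
bool sgt(int64_t a, int64_t b) { return a > b; }

/* Both operands share a sign bit, which is what `samesign` asserts. */
bool same_sign(int64_t a, int64_t b) { return (a < 0) == (b < 0); }

/* Hypothetical helper: counts same-sign sample pairs for which the
   unsigned compare disagrees with one of the signed forms. */
int count_mismatches(void) {
    static const int64_t samples[] = {INT64_MIN, -7, -1, 0,
                                      1, 2, 46340, INT64_MAX};
    const size_t n = sizeof samples / sizeof samples[0];
    int bad = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int64_t a = samples[i], b = samples[j];
            if (!same_sign(a, b)) continue;      /* flag would not apply */
            if (ugt(a, b) != sgt(a, b)) ++bad;   /* ugt agrees with sgt */
            if (ugt(a, b) != (b < a)) ++bad;     /* a > b    <=> b < a  (l)  */
            if (!ugt(a, b) != (b >= a)) ++bad;   /* !(a > b) <=> b >= a (ge) */
        }
    }
    return bad;
}

/* Sanity checks for the claims in the comments above. */
void check_examples(void) {
    assert(count_mismatches() == 0);
    assert(ugt(-1, 1) && !sgt(-1, 1)); /* mixed signs: the forms differ */
}
```

The last assertion shows why the flag matters: without `samesign`, the unsigned and signed compares can disagree, so the rewrite would not be valid in general.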