Open nagisa opened 5 years ago
Current Codegen: https://godbolt.org/z/URP0xP
As Eli said, on trunk the codegen is branchless:
test: # @test
vucomisd %xmm1, %xmm0
movb $2, %cl
setae %al
subb %al, %cl
decb %al
vucomisd %xmm0, %xmm1
movzbl %al, %edx
movzbl %cl, %eax
cmovael %edx, %eax
retq
I was trying trunk, not 8.0. We generate a branchless sequence on trunk. If we convert the second select into a branch, maybe we can do something.
Replacing a ucomisd with a setcc+test doesn't seem profitable, unless ucomisd is really expensive on some CPUs.
I believe that the generated code could simply SETcc most of the flags it needs into general-purpose registers before actually doing flag-clobbering computations on them.
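As a rough illustration of that idea, the sketch below models in C the flag bits that a single UCOMISD writes and derives all four outcomes from that one captured result. The names `ucomisd_flags`, `model_ucomisd`, and `classify` are hypothetical and exist only for this example; the flag values follow the documented UCOMISD result table (unordered sets ZF, PF and CF; less-than sets CF; equal sets ZF; greater-than sets none). This is a model of the reasoning, not the code LLVM emits.

```c
#include <stdbool.h>

/* Hypothetical model of the EFLAGS bits written by one UCOMISD comparing a with b. */
struct ucomisd_flags { bool zf, pf, cf; };

static struct ucomisd_flags model_ucomisd(double a, double b) {
    struct ucomisd_flags f = { false, false, false };   /* a > b: all clear */
    if (a != a || b != b) {
        f.zf = f.pf = f.cf = true;                      /* unordered (NaN operand) */
    } else if (a < b) {
        f.cf = true;                                    /* less-than */
    } else if (a == b) {
        f.zf = true;                                    /* equal */
    }
    return f;
}

/* Derive -1, 0, 1 or 2 from flags captured once, analogous to saving them
   with SETcc before any later instruction clobbers EFLAGS. */
static int classify(struct ucomisd_flags f) {
    if (f.pf) return 2;      /* either operand is NaN */
    if (f.zf) return 0;      /* equal */
    return f.cf ? -1 : 1;    /* less-than or greater-than */
}
```

Once the three bits are held in ordinary values, later arithmetic cannot disturb them, which is the property the comment above relies on.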
In this particular example, however, the fact that flag clobbering does not matter at all is easily visible from the generated assembly which looks like this:
test:
# %bb.0: # %start
ucomisd %xmm1, %xmm0
setae %al
ucomisd %xmm0, %xmm1
jae .LBB0_1
# %bb.2: # %start
movb $2, %cl
subb %al, %cl
movl %ecx, %eax
retq
.LBB0_1:
decb %al
retq
Here the 2nd ucomisd is trivially removable; the only other change necessary is inverting the cc in the Jcc instruction that follows.
I'm not sure how you expect the second UCOMISDrr to be removed; the SUB8rr and DEC8r clobber EFLAGS. I guess we could try to use LEA instead.
With latest we have better eflags handling, but now have 3 vucomisd ops:
test: # @test
xorl %eax, %eax
vucomisd %xmm1, %xmm0
adcb $1, %al
vucomisd %xmm1, %xmm0
movzbl %al, %eax
sbbl %ecx, %ecx
vucomisd %xmm0, %xmm1
cmovael %ecx, %eax
retq
Extended Description
Given code like this (keep in mind that this applies to most code examples where there are multiple fcmp instructions with the same arguments but differing comparison codes):
which returns -1, 0, 1, or 2 depending on whether the comparison between the two arguments is less-than, equal, greater-than, or either-is-NaN. On x86 this information is made available by a single execution of the UCOMISD instruction.
However, this IR ends up being built into something like:
and the duplicate UCOMISDrr ends up never being removed.
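The reporter's original snippet is not preserved in this thread. As a hypothetical stand-in, a C function with the behaviour described above might look like the sketch below; the name `four_way_compare` is an assumption made for illustration. Each test compares the same two operands with a different predicate, which is the pattern of multiple fcmp instructions the report describes.

```c
/* Hypothetical stand-in for the missing example, not the reporter's code.
   Returns -1, 0, 1 or 2 for less-than, equal, greater-than, or unordered. */
static int four_way_compare(double a, double b) {
    if (a < b)  return -1;   /* ordered less-than */
    if (a == b) return 0;    /* ordered equal */
    if (a > b)  return 1;    /* ordered greater-than */
    return 2;                /* no ordered comparison held: a or b is NaN */
}
```

All three comparisons are false when either operand is NaN, so the final return covers the unordered case without an explicit NaN check.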