llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.83k stars 11.46k forks source link

Fold ADD+CMP to CMP #37351

Open davidbolvansky opened 6 years ago

davidbolvansky commented 6 years ago
Bugzilla Link 38003
Version trunk
OS Linux
CC @topperc,@efriedma-quic,@LebedevRI,@RKSimon,@rotateright

Extended Description

Hello,

bool sw1(int x) { switch(x) { case 2: case 5: case 6: case 31: return true;

    default:
        return false;
}

}

Clang -03 sw1(int): add edi, -2 // why? cmp edi, 29 // fold add + cmp to cmp edi, 31 ja .LBB0_2 mov eax, 536870937 mov ecx, edi shr eax, cl and eax, 1 ret .LBB0_2: xor eax, eax ret

GCC 8.1 emits: sw1(int): xor eax, eax cmp edi, 31 ja .L1 mov eax, 1 mov ecx, edi sal rax, cl test eax, 2147483752 setne al .L1: ret

ICC 18 (more jumps, worse code?): sw1(int): cmp edi, 64 jae ..B1.4 mov rax, 0x080000068 bt rax, rdi jnc ..B1.4 mov eax, 1 ret ..B1.4: xor eax, eax ret

davidbolvansky commented 6 years ago

Should we modify ShouldBuildLookupTable and "return SI->getNumCases() 10 >= TableSize 4;", ie. pick better magic values?

Recently I hit ADD+CMP problem again, see code: https://godbolt.org/z/98lomT

efriedma-quic commented 6 years ago

This transform is done in SwitchToLookupTable in lib/Transforms/Utils/SimplifyCFG.cpp . In general, it tries to minimize the size of the "table" constant (in this case, by reducing it to 30 bits), but in certain cases, it would be better to use a larger table to avoid the extra add instruction. (By larger, I mean in terms of the encoded value; of course, you get a 4-byte immediate either way.)

icc apparently prefers not to transform the branch to straight-line code; you can get similar output from clang if you replace the "return true" with a function call.