clang does not properly optimize xor expressions

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.



Bugzilla Link	36622
Version	5.0
OS	FreeBSD
CC	@RKSimon,@zygoloid,@rotateright

Extended Description

It appears that in some code sequences Clang fails to recognize that x xor C xor C == x where C is a compile-time constant.

For instance:

uint32_t combine(uint16_t x, uint16_t y)
{
    uint32_t r = 0x5a5a7777;

    r ^= (r ^ x) & 0xffff;
    r ^= ((r >> 16) ^ y) << 16;

    return (r);
}

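In this code the initial value of r is not important: the first statement replaces the low 16 bits of r with x, and the second replaces the high 16 bits with y. The code should therefore be equivalent to: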

uint32_t combine(uint16_t x, uint16_t y) { return (x | ((uint32_t)y << 16)); }
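The claimed equivalence can be checked mechanically. The following is a minimal sketch, not part of the original report: `combine_simple` and `count_mismatches` are hypothetical names, and the stride of 257 is an arbitrary sampling choice (0xffff = 255 * 257, so both 0 and 0xffff are visited).

```c
#include <stdint.h>

/* The function from the report, unchanged apart from formatting. */
static uint32_t combine(uint16_t x, uint16_t y)
{
    uint32_t r = 0x5a5a7777;

    r ^= (r ^ x) & 0xffff;          /* low 16 bits of r become x */
    r ^= ((r >> 16) ^ y) << 16;     /* high 16 bits of r become y */
    return r;
}

/* The simplified form the report expects the optimizer to reach. */
static uint32_t combine_simple(uint16_t x, uint16_t y)
{
    return x | ((uint32_t)y << 16);
}

/* Hypothetical helper: counts sampled (x, y) pairs where the two forms differ. */
static unsigned long count_mismatches(void)
{
    unsigned long bad = 0;

    for (uint32_t x = 0; x <= 0xffff; x += 257)
        for (uint32_t y = 0; y <= 0xffff; y += 257)
            if (combine((uint16_t)x, (uint16_t)y) !=
                combine_simple((uint16_t)x, (uint16_t)y))
                bad++;
    return bad;
}
```

Calling `count_mismatches()` from a `main` and checking for zero confirms the two forms agree on the sampled inputs; changing the stride to 1 makes the check exhaustive at the cost of 2^32 iterations.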

But even with -O3 Clang produces the following assembler code:

    orl     $1515847680, %edi       # imm = 0x5A5A0000
    xorl    $23130, %esi            # imm = 0x5A5A
    shll    $16, %esi
    xorl    %edi, %esi
    movl    %esi, %eax

It's easy to see that 0x5A5A will always cancel out through double application via xor, but Clang does not recognize that.

Maybe this can be fixed in reassociate or instcombine.

Two zext instructions are getting in the way:

define dso_local i32 @combine(unsigned short, unsigned short)(i16 zeroext, i16 zeroext) local_unnamed_addr #0 {
  %3 = zext i16 %0 to i32
  %4 = or i32 %3, 1515847680
  %5 = xor i16 %1, 23130
  %6 = zext i16 %5 to i32
  %7 = shl nuw i32 %6, 16
  %8 = xor i32 %7, %4
  ret i32 %8
}
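To make the obstacle concrete, the IR above can be transliterated into C and folded by hand. This is only a sketch of one possible line of reasoning, not a description of what any LLVM pass does. It assumes two identities: a constant xor can be moved through a zero-extension and a left shift, and an `or` whose operands share no set bits behaves like an `xor`. All function names are hypothetical.

```c
#include <stdint.h>

/* Direct transliteration of the IR, one statement per instruction. */
static uint32_t as_emitted(uint16_t x, uint16_t y)
{
    uint32_t v3 = (uint32_t)x;              /* %3 = zext i16 %0 to i32    */
    uint32_t v4 = v3 | 0x5a5a0000u;         /* %4 = or i32 %3, 1515847680 */
    uint16_t v5 = (uint16_t)(y ^ 0x5a5a);   /* %5 = xor i16 %1, 23130     */
    uint32_t v6 = (uint32_t)v5;             /* %6 = zext i16 %5 to i32    */
    uint32_t v7 = v6 << 16;                 /* %7 = shl nuw i32 %6, 16    */

    return v7 ^ v4;                         /* %8 = xor i32 %7, %4        */
}

/* Step 1: move the xor constant through the zext and the shift,
   so 0x5a5a on the 16-bit value becomes 0x5a5a0000 on the 32-bit one. */
static uint32_t after_zext_fold(uint16_t x, uint16_t y)
{
    uint32_t hi = ((uint32_t)y << 16) ^ 0x5a5a0000u;
    uint32_t lo = (uint32_t)x | 0x5a5a0000u;

    return hi ^ lo;
}

/* Step 2: x occupies only the low 16 bits, so the or with 0x5a5a0000
   is equivalent to an xor, and the two constants cancel. */
static uint32_t fully_folded(uint16_t x, uint16_t y)
{
    return ((uint32_t)y << 16) ^ (uint32_t)x;   /* disjoint bits: same as | */
}

/* Hypothetical helper: counts sampled inputs where any two steps disagree. */
static unsigned long count_fold_mismatches(void)
{
    unsigned long bad = 0;

    for (uint32_t x = 0; x <= 0xffff; x += 257)
        for (uint32_t y = 0; y <= 0xffff; y += 257) {
            uint32_t a = as_emitted((uint16_t)x, (uint16_t)y);
            uint32_t b = after_zext_fold((uint16_t)x, (uint16_t)y);
            uint32_t c = fully_folded((uint16_t)x, (uint16_t)y);

            if (a != b || b != c)
                bad++;
        }
    return bad;
}
```

The first step is the one the zext instructions block: while the xor stays on the i16 value, its constant never sits next to the i32 constant it would cancel against.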

If you change the function arguments to be:

uint32_t combine(uint32_t x, uint32_t y)

it produces optimal code, I think:

define dso_local i32 @combine(unsigned int, unsigned int)(i32, i32) local_unnamed_addr #0 {
  %3 = shl i32 %1, 16
  %4 = and i32 %0, 65535
  %5 = or i32 %4, %3
  ret i32 %5
}

or equivalently, in asm:

combine(unsigned int, unsigned int):    # @combine(unsigned int, unsigned int)
        shl     esi, 16
        movzx   eax, di
        or      eax, esi
        ret
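As a sanity check on the 32-bit variant, the folded IR can be compared against the source function with widened parameters. This is a small sketch with hypothetical names (`combine32`, `combine32_folded`). With uint32_t arguments the low half must be masked, because x may now carry bits above bit 15.

```c
#include <stdint.h>

/* The report's function with both parameters widened to uint32_t. */
static uint32_t combine32(uint32_t x, uint32_t y)
{
    uint32_t r = 0x5a5a7777;

    r ^= (r ^ x) & 0xffff;
    r ^= ((r >> 16) ^ y) << 16;
    return r;
}

/* Transliteration of the folded IR: shl, and, or. */
static uint32_t combine32_folded(uint32_t x, uint32_t y)
{
    return (x & 65535u) | (y << 16);
}
```

Both functions keep only the low 16 bits of each argument, so inputs such as `0xdeadbeef` and `0x12345678` exercise the masking as well as the cancellation.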
