Optimization: x86 "shl" condition codes can be reused

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

Other

27.12k stars 11.1k forks source link

(Note: I also reported this issue in GCC.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

bool my_isxdigit(unsigned char ch) {
  uint32_t mask1 = 0x7E00FFC0;
  // Prevent the compiler from transforming the constants.
  // Suppose we have to use them as written.
  __asm__ ("" : "+r" (mask1));

  if (!((mask1 << (ch & 0x1F)) >> 31))
    return false;

  uint32_t mask2 = 0x1A << 24;
  __asm__ ("" : "+r" (mask2));

  if (!((mask2 << (ch >> 4)) >> 31))
    return false;

  return true;
}

x86-64 Clang with "-O3" option (Actually I tested this in Compiler Explorer (godbolt.org))

my_isxdigit:
  movl   %edi, %ecx
  movl   $2113994688, %eax
  shll   %cl, %eax
  testl  %eax, %eax
  js     .LBB0_2
  xorl   %eax, %eax
  retq
.LBB0_2:
  movl   $436207616, %eax
  shrl   $4, %ecx
  shll   %cl, %eax
  shrl   $31, %eax
  retq

Possible smaller code (I show this tweak specifically for Clang; in the GCC bug report I wrote an even smaller one):

my_isxdigit_2:
  movl   %edi, %ecx
  movl   $2113994688, %edx
  xorl   %eax, %eax  # Sets SF to 0 if 'shll' does not affect flags.
  shll   %cl, %edx
  js     .L1
  retq
.L1:
  movl   $436207616, %eax
  shrl   $4, %ecx
  shll   %cl, %eax
  shrl   $31, %eax
  retq

The FLAGS register might be unmodified if the shift count is zero (i.e. value in %cl register is zero), but optimization is still possible if we set the FLAGS to a known state prior to the shift operation. (Thanks GCC developers for hinting this for me.) For example, xorl %eax, %eax sets CF and SF bits to 0, so the my_isxdigit_2 code can work.

This optimization might require tracking on whether the FLAGS states are known. If this is impossible internally in LLVM, feel free to close this report.

@llvm/issue-subscribers-backend-x86

Author: Kang-Che Sung (宋岡哲) (Explorer09)

This example codes shows a possible reuse of the SF (sign flag) after a left shift (<<) operation. But Clang seems to always generate a 'test' instruction after the shift; it doesn't reuse the condition code of the 'shl' when possible. (Note: [I also reported this issue in GCC](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114490).) ```c #include <stdbool.h> #include <stdint.h> #include <stdio.h> bool my_isxdigit(unsigned char ch) { uint32_t mask1 = 0x7E00FFC0; // Prevent the compiler from transforming the constants. // Suppose we have to use them as written. __asm__ ("" : "+r" (mask1)); if (!((mask1 << (ch & 0x1F)) >> 31)) return false; uint32_t mask2 = 0x1A << 24; __asm__ ("" : "+r" (mask2)); if (!((mask2 << (ch >> 4)) >> 31)) return false; return true; } ``` x86-64 Clang with "-O3" option (Actually I tested this in Compiler Explorer (godbolt.org)) ```x86asm my_isxdigit: movl %edi, %ecx movl $2113994688, %eax shll %cl, %eax testl %eax, %eax js .LBB0_2 xorl %eax, %eax retq .LBB0_2: movl $436207616, %eax shrl $4, %ecx shll %cl, %eax shrl $31, %eax retq ``` Possible smaller code (I show this tweak specifically for Clang; in the [GCC bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114490) I wrote an even smaller one): ```x86asm my_isxdigit_2: movl %edi, %ecx movl $2113994688, %edx xorl %eax, %eax # Sets SF to 0 if 'shll' does not affect flags. shll %cl, %edx js .L1 retq .L1: movl $436207616, %eax shrl $4, %ecx shll %cl, %eax shrl $31, %eax retq ``` The FLAGS register might be unmodified if the shift count is zero (i.e. value in `%cl` register is zero), but optimization is still possible if we set the FLAGS to a known state prior to the shift operation. (Thanks GCC developers for hinting this for me.) For example, `xorl %eax, %eax` sets CF and SF bits to 0, so the `my_isxdigit_2` code can work. This optimization might require tracking on whether the FLAGS states are known. If this is impossible internally in LLVM, feel free to close this report.

llvm / llvm-project

Optimization: x86 "shl" condition codes can be reused #86788