llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.12k stars 11.1k forks source link

Optimization: x86 "shl" condition codes can be reused #86788

Open Explorer09 opened 3 months ago

Explorer09 commented 3 months ago

This example codes shows a possible reuse of the SF (sign flag) after a left shift (<<) operation. But Clang seems to always generate a 'test' instruction after the shift; it doesn't reuse the condition code of the 'shl' when possible.

(Note: I also reported this issue in GCC.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

bool my_isxdigit(unsigned char ch) {
  uint32_t mask1 = 0x7E00FFC0;
  // Prevent the compiler from transforming the constants.
  // Suppose we have to use them as written.
  __asm__ ("" : "+r" (mask1));

  if (!((mask1 << (ch & 0x1F)) >> 31))
    return false;

  uint32_t mask2 = 0x1A << 24;
  __asm__ ("" : "+r" (mask2));

  if (!((mask2 << (ch >> 4)) >> 31))
    return false;

  return true;
}

x86-64 Clang with "-O3" option (Actually I tested this in Compiler Explorer (godbolt.org))

my_isxdigit:
  movl   %edi, %ecx
  movl   $2113994688, %eax
  shll   %cl, %eax
  testl  %eax, %eax
  js     .LBB0_2
  xorl   %eax, %eax
  retq
.LBB0_2:
  movl   $436207616, %eax
  shrl   $4, %ecx
  shll   %cl, %eax
  shrl   $31, %eax
  retq

Possible smaller code (I show this tweak specifically for Clang; in the GCC bug report I wrote an even smaller one):

my_isxdigit_2:
  movl   %edi, %ecx
  movl   $2113994688, %edx
  xorl   %eax, %eax  # Sets SF to 0 if 'shll' does not affect flags.
  shll   %cl, %edx
  js     .L1
  retq
.L1:
  movl   $436207616, %eax
  shrl   $4, %ecx
  shll   %cl, %eax
  shrl   $31, %eax
  retq

The FLAGS register might be unmodified if the shift count is zero (i.e. value in %cl register is zero), but optimization is still possible if we set the FLAGS to a known state prior to the shift operation. (Thanks GCC developers for hinting this for me.) For example, xorl %eax, %eax sets CF and SF bits to 0, so the my_isxdigit_2 code can work.

This optimization might require tracking on whether the FLAGS states are known. If this is impossible internally in LLVM, feel free to close this report.

llvmbot commented 3 months ago

@llvm/issue-subscribers-backend-x86

Author: Kang-Che Sung (宋岡哲) (Explorer09)

This example codes shows a possible reuse of the SF (sign flag) after a left shift (<<) operation. But Clang seems to always generate a 'test' instruction after the shift; it doesn't reuse the condition code of the 'shl' when possible. (Note: [I also reported this issue in GCC](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114490).) ```c #include <stdbool.h> #include <stdint.h> #include <stdio.h> bool my_isxdigit(unsigned char ch) { uint32_t mask1 = 0x7E00FFC0; // Prevent the compiler from transforming the constants. // Suppose we have to use them as written. __asm__ ("" : "+r" (mask1)); if (!((mask1 << (ch & 0x1F)) >> 31)) return false; uint32_t mask2 = 0x1A << 24; __asm__ ("" : "+r" (mask2)); if (!((mask2 << (ch >> 4)) >> 31)) return false; return true; } ``` x86-64 Clang with "-O3" option (Actually I tested this in Compiler Explorer (godbolt.org)) ```x86asm my_isxdigit: movl %edi, %ecx movl $2113994688, %eax shll %cl, %eax testl %eax, %eax js .LBB0_2 xorl %eax, %eax retq .LBB0_2: movl $436207616, %eax shrl $4, %ecx shll %cl, %eax shrl $31, %eax retq ``` Possible smaller code (I show this tweak specifically for Clang; in the [GCC bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114490) I wrote an even smaller one): ```x86asm my_isxdigit_2: movl %edi, %ecx movl $2113994688, %edx xorl %eax, %eax # Sets SF to 0 if 'shll' does not affect flags. shll %cl, %edx js .L1 retq .L1: movl $436207616, %eax shrl $4, %ecx shll %cl, %eax shrl $31, %eax retq ``` The FLAGS register might be unmodified if the shift count is zero (i.e. value in `%cl` register is zero), but optimization is still possible if we set the FLAGS to a known state prior to the shift operation. (Thanks GCC developers for hinting this for me.) For example, `xorl %eax, %eax` sets CF and SF bits to 0, so the `my_isxdigit_2` code can work. This optimization might require tracking on whether the FLAGS states are known. If this is impossible internally in LLVM, feel free to close this report.