llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

x86 Poor codegen on 7 parameter thunk (missed tail call opportunity) #49482

Open TheThief opened 3 years ago

TheThief commented 3 years ago
Bugzilla Link: 50138
Version: 11.0
OS: Linux
CC: @topperc, @efriedma-quic, @RKSimon, @phoebewang, @TheThief, @zygoloid, @rotateright

Extended Description

While discussing clang's [[musttail]] attribute on Reddit, we discovered a case where clang/llvm doesn't produce a tail call, while GCC and MSVC both do:

https://gcc.godbolt.org/z/doYGdG1dT

int bar(int, int, int, int, int, int, int);
int foo(int a, int b, int c, int d, int e, int f, int g) {
  return bar(a, b, c, d, e, f, g+7);
}

Clang output:

foo(int, int, int, int, int, int, int): # @foo(int, int, int, int, int, int, int)
  push rax
  mov eax, dword ptr [rsp + 16]
  add eax, 7
  mov dword ptr [rsp], eax
  call bar(int, int, int, int, int, int, int)
  pop rcx
  ret

Additionally, it seems to oddly push rax at the start of the function, but restore it into rcx at the end. It seems likely that this saved register is why it's not performing the tail call, but it shouldn't be trying to preserve rax in the first place!

Using godbolt to try on older iterations of clang suggests that this codegen issue is very old indeed.

Quuxplusone commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#51000

efriedma-quic commented 3 years ago

The relevant code is X86TargetLowering::IsEligibleForTailCallOptimization, specifically the call to MatchingStackOffset. Not sure what the code is trying to do.

TheThief commented 3 years ago

Interestingly, if [[clang::musttail]] is added to the return statement, clang generates the expected code:

foo(int, int, int, int, int, int, int): # @foo(int, int, int, int, int, int, int)
  mov eax, dword ptr [rsp + 8]
  add eax, 7
  mov dword ptr [rsp + 8], eax
  jmp bar(int, int, int, int, int, int, int) # TAILCALL

This proves that clang/llvm is capable of generating the optimal code; it just fails to do so automatically for some reason.
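
For anyone who wants to reproduce the musttail variant, here is a minimal sketch of where the attribute goes. The MUSTTAIL macro and the body of bar are assumptions added only so the snippet compiles and runs on its own; in the original report bar is only declared. The macro expands to nothing on compilers that don't report support for the attribute, in which case the call is just an ordinary call.

```cpp
// Hypothetical helper macro (not part of the original report): use
// [[clang::musttail]] only when the compiler reports support for it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Stand-in definition so the example links; the report only declares bar.
int bar(int a, int b, int c, int d, int e, int f, int g) {
    return a + b + c + d + e + f + g;
}

// Same thunk as in the report. The attribute is attached to the return
// statement, and caller and callee have identical signatures.
int foo(int a, int b, int c, int d, int e, int f, int g) {
    MUSTTAIL return bar(a, b, c, d, e, f, g + 7);
}
```

To compare against the assembly quoted above, keep bar as a declaration only (as in the report) so the compiler cannot inline it.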