Closed rnk closed 2 months ago
mentioned in issue llvm/llvm-bugzilla-archive#51100
mentioned in issue llvm/llvm-bugzilla-archive#30389
mentioned in issue llvm/llvm-bugzilla-archive#27183
mentioned in issue llvm/llvm-project#24719
mentioned in issue llvm/llvm-bugzilla-archive#22068
mentioned in issue llvm/llvm-project#21794
mentioned in issue llvm/llvm-bugzilla-archive#17907
Sure, it's easy to penalize access to overaligned locals, but I think they tend to be performance critical, so IMO it's better to penalize accesses to arguments. I assume the base pointer work was specifically done to address this issue.
If we're going to move the alignment gap so it isn't between the frame pointer and the stack pointer, we might as well just construct a single virtual register that pointers to the argument area, instead of using a separate virtual register for each argument.
I suspect adding an indirection to access overaligned locals is actually the easiest approach for most targets, at least ones that don't need overaligned spill slots. We already have LocalStackSlotAllocation, which does a similar transform.
To recap, there are three stack areas that the compiler may need to reference:
Overaligned variables make the offset from 1 to 2 dynamic, and allocas make the offset from 2 to 3 dynamic. LLVM's strategy today is to dedicate a physical register to all areas when needed.
Fixing this issue fully requires adding indirection to accesses to one of these areas. IMO, area #1, arguments, is the best option, especially given other LLVM design choices. To implement that, we should:
I looked at this briefly since a duplicate was filed, bug 51100.
Apparently, gcc doesn't use a base pointer at all. It uses different techniques on different targets; on x86, it emits an exotic prologue that actually stores the frame pointer below the realignment gap. On ARM, gcc apparently just never realigns the stack at all.
Bug llvm/llvm-bugzilla-archive#51100 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#30389 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#27183 has been marked as a duplicate of this bug.
Re-titling since this is generic across OSs and asm style.
I think we can safely use ebx on win32 - PIC is implemented w/o GOT there, so we can easily use ebx.
That's just a workaround, though. We will still conflict with inline asm using ebx on Windows (rare, due to MSVC's use of ebx) and esi on Linux (not uncommon due to string instructions).
I think we need to reformulate the problem, rather than trying to pick an arbitrary hardcoded register that works for all situations. Stack objects (allocas, spill slots) with low alignment can be accessed via ebp, and stack objects with large alignment requirements can be accessed via a virtual register, which can be spilled via ebp. Then we can let the register allocator solve the problem.
I think we can safely use ebx on win32 - PIC is implemented w/o GOT there, so we can easily use ebx.
Bug llvm/llvm-bugzilla-archive#22068 has been marked as a duplicate of this bug.
MSVC uses ebx instead of esi, for roughly the same reasons, although the roles are swapped. ebx points to the incoming arguments, and ebp points to the local variables. ebx is callee saved in most CCs, so it can be used across the body of the function.
MSVC emits a warning when modifying ebp, and ebx if it had to use it due to stack realignment:
int main() { declspec(align(8)) int a; asm { push ebp mov ebx, esp and esp, -16 mov a, edi mov esp, ebx pop ebp } return a; }
t.cpp(5) : warning C4731: 'main' : frame pointer register 'ebx' modified by inline assembly code t.cpp(9) : warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code
If you remove the need for stack realignment, only the ebp warning remains. You can modify ebp without warning, but only if you enable optimizations and the FP can be eliminated.
We could teach clang to emit a diagnostic like this.
Extended Description
So far as I can tell, the only register you can't touch in MS inline asm is ebp, but LLVM's x86 backend requires a BasePtr register which is separate from the frame pointer in ebp. It happens to hard code the choice of esi in X86RegisterInfo.cpp: // Use a callee-saved register as the base pointer. These registers must // not conflict with any ABI requirements. For example, in 32-bit mode PIC // requires GOT in the EBX register before function calls via PLT GOT pointer. BasePtr = Is64Bit ? X86::RBX : X86::ESI;
This will blow up if the inline asm clobbers esi.
Test case straight from LLVM's own lib/Support/Host.cpp:
bool GetX86CpuIDAndInfo(unsigned value, unsigned rEAX, unsigned rEBX, unsigned rECX, unsigned rEDX) { __asm { mov eax,value cpuid mov esi,rEAX mov dword ptr [esi],eax mov esi,rEBX mov dword ptr [esi],ebx mov esi,rECX mov dword ptr [esi],ecx mov esi,rEDX mov dword ptr [esi],edx } return false; }
This generates x86 asm like:
$ clang -cc1 -fms-compatibility t.cpp -o - -cxx-abi microsoft -S ... "?GetX86CpuIDAndInfo@@YA_NIPAI000@Z": pushl %ebp movl %esp, %ebp pushl %ebx pushl %edi pushl %esi subl $28, %esp movl %esp, %esi ...
APP