llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.32k stars 12.11k forks source link

__builtin_va_arg stack-walk short #28037

Open llvmbot opened 8 years ago

llvmbot commented 8 years ago
Bugzilla Link 27663
Version 3.8
OS MacOS X
Reporter LLVM Bugzilla Contributor

Extended Description

I've marked this as clang 3.8, but code was compiled on Apple's Xcode 7.3.1. Sample code ================ va_sum.c int va_sum(unsigned int count, ...) { int sum = 0; __builtin_ms_va_list ap;

__builtin_ms_va_start(ap, count);
while (count) {
    sum += __builtin_va_arg(ap, int);
    --count;
}
__builtin_ms_va_end(ap);
return sum;

}

Compiled with clang -S -Os -fno-unwind-tables -target x86_64-pc-win32-macho va_sum.c

generates this code ================ va_sum.s .section TEXT,text,regular,pure_instructions .globl _va_sum _va_sum: ## @​va_sum

BB#0:

pushq   %rax
movq    %r9, 40(%rsp)
movq    %r8, 32(%rsp)
movq    %rdx, 24(%rsp)
leaq    24(%rsp), %rax
movq    %rax, (%rsp)
xorl    %eax, %eax
testl   %ecx, %ecx
je  LBB0_2

LBB0_1: ## %.lr.ph

=>This Inner Loop Header: Depth=1

movq    (%rsp), %r8
addq    $3, %r8
andq    $-4, %r8
leaq    4(%r8), %rdx
movq    %rdx, (%rsp)
addl    (%r8), %eax
decl    %ecx
jne LBB0_1

LBB0_2: ## %._crit_edge popq %rdx retq

.subsections_via_symbols

The stack walk is wrong (leaq, 4(%r8), %rdx). It advances the pointer by 4 bytes each time - the sizeof(int). On x86_64 the stack needs to be walked by steps of 8 bytes.

llvmbot commented 8 years ago

results of testing the patch The patch works.

I've never built clang from sources. So I downloaded 3.8 released sources from http://llvm.org/releases/download.html. [The svn was kind of slow and I was worried about running into other problems trying to compile from trunk]

I relativized the patch to 3.8 release in Targets.cpp. Had to comment out the Opts parameter to the constructor because it doesn't exist there. Otherwise seems ok.

I tested with and without the patch. The files in the attachment va_sum.c - test case with builtin_ms_va_list va_sum2.c - test case with builtin_va_list

Compiled always with "-S -Os -fno-unwind-tables -target x86_64-pc-win32-macho"

va_sum.vanilla.s - compile with regular 3.8 of va_sum.c va_sum.patched.s - compile with patched 3.8 of va_sum.c va_sum2.vanilla.s - compile with regular 3.8 of va_sum2.c va_sum2.patched.s - compile with patched 3.8 of va_sum2.c

Results look good. In va_sum2.patched.s it's noticeable that the stack frame is shorter because the ms_va_list is now just 8 bytes, and not 24 bytes like it was taking before (boxed in a sysv_va_list). va_sum.patched.s and va_sum2.patched.s are identical, so builtin and builtin_ms are being handled the same. The stack-walk is 8.

I didn't actually try test/CodeGen/windows-macho.c, because I don't know how to run it. Just tested on my own stuff.

Thanks a lot. If there's something else you want me to test let me know.

llvmbot commented 8 years ago

Patch to use char* va_list in EFI Mach-O environments OK. That's useful to know.

Does this patch help?

llvmbot commented 8 years ago

-target x86_64-pc-win32-macho is used in EDK2 for building EFI binaries. The use of macho object format is so that interim object files can be handled with Apple's libtool and ld. After a final EFI binary is linked in macho format, it is converted to coff using an Apple tool called mtoc.

There are 2 implementations of va_list for this target... __builtin_va_list The va_list is a 24-byte struct like a sysv_va_list. It is passed by reference when used as an argument (i.e. a pointer to it is passed.) It is copied as a struct with builtin_va_copy. However, it's not a sysv_va_list, because instead of storing the usual data, builtin_va_start puts the pointer (ms_va_list) into offset zero of the struct. The iterator builtin_va_arg handles this correctly. The generated code seems fine insofar as STDC is concerned, though it's not Msft-compatible. From your explanation of the way clang handles it, it seems to be a complete coincidence that this works... Sema::BuildVAArgExpr classifies builtin_va_list as sysv_va_list based on the Darwin TargetInfo. It is later passed to WinX86_64ABIInfo::EmitVAArg, which is "the wrong" emitter for it. I think maybe this is not a coincidence, but some intended design.

The other implementation is __builtin_ms_va_list which appeared to work until clang 3.8, and which is the subject of this bug report, as the iterator __builtin_va_arg is incorrect. From your description, this form va_list is not intended to be used for this target, and in fact is rejected in clang trunk due to a source change.

The sizeof(long) for this target is 8, and I don't want to change anything not related to va_list which may break existing code.

I think the solution for this target is that.... builtin_va_list should work just like it does in Windows64 COFF target and identical to Microsoft's implementation. In other words simple pointer, passed-by-value. builtin_ms_va_list can optionally be supported as a synonym for __builtin_va_list, or generate a diagnostic like it does today - indicating it's not meant for this target.

llvmbot commented 8 years ago

tl;dr: One part of Clang thinks it's targeting Darwin, another part thinks it's targeting Windows, and the whole thing blows up as a result. And I don't know if it's safe to change it, because I don't know what aspects of the win32-macho compilation environment you're depending on (e.g. sizeof(long), whether or not _WIN32 and _WIN64 get defined, etc.)

llvmbot commented 8 years ago

I think I've figured out what's really going on.

In Clang Basic, we have logic to detect the target in use and provide an appropriate TargetInfo object. If the triple identifies a target that uses Mach-O (as yours does), even if it's a Windows target, it always uses Darwin's TargetInfo (cf. lib/Basic/Targets.cpp:8341-2). I could easily change it not to do that when the OS is a Windows OS, but the problem is that I don't know if you're depending or not on the Basic TargetInfo being Darwin-esque, because there are no tests for this (i.e. there are no tests for type sizes and pre-defined symbols for a win32-macho target).

In any case, for this reason, this va_arg() gets identified as a special, Microsoftish call (cf. lib/Sema/SemaExpr.cpp:12199-200). So, CodeGenFunction::EmitVAArg() tries to emit it as such (cf. lib/CodeGen/CGCall.cpp:4118-26). But the Windows IRGen TargetInfo doesn't implement EmitMSVAArg(), on the assumption that it will never see this kind of specially marked va_arg(). Result: ScalarExprEmitter::EmitVAArgExpr() gets back an invalid address and (as of r261717) bails (previously it would fall back to emitting an LLVM IR va_arg instruction, which kinda worked but not quite, as you've seen).

llvmbot commented 8 years ago

Update: If I modify the C code to use builtin_va_list instead of builtin_ms_va_list (also start and end), and compile with -target x86_64-pc-win32-macho, it generates the following code

======== .section TEXT,text,regular,pure_instructions .globl _va_sum _va_sum: ## @​va_sum

BB#0:

subq    $24, %rsp
movq    %r9, 56(%rsp)
movq    %r8, 48(%rsp)
movq    %rdx, 40(%rsp)
leaq    40(%rsp), %rax
movq    %rax, (%rsp)
xorl    %eax, %eax
testl   %ecx, %ecx
je  LBB0_3

BB#1: ## %.lr.ph

movq    (%rsp), %rdx
addq    $8, %rdx
xorl    %eax, %eax

LBB0_2: ## =>This Inner Loop Header: Depth=1 movq %rdx, (%rsp) addl -8(%rdx), %eax addq $8, %rdx decl %ecx jne LBB0_2 LBB0_3: ## %._crit_edge addq $24, %rsp retq

.subsections_via_symbols

which is perfect. So the bug is __builtin_ms_va_list specific. Somewhere in clang sources there's code to correctly handle variadic Msft AMD64 abi, and it's used for the legacy __builtin_va_list with Windows 64 bit -target.

llvmbot commented 8 years ago

The bug is reproducible on released version of LLVM clang 3.8.0, downloadable as Mac OS X binary from this page http://llvm.org/releases/download.html

It generates the same code as Apple clang 703.0.31 (the one shipped with Xcode 7.3.1).

So the diagnostic in trunk is a later addition, possibly after this bug was filed as a substitute measure to prevent the generation of wrong code.

Fully supporting Msft AMD64 ABI in this context is not trivial because...

STDC allows enum, union and struct to be passed as unnamed arguments to variadic functions.

Msft AMD64 rules for passing args on the stack are as follows...

I've ignored the rules for register-passing, as those are moot once the registers are spilt to stack.

I believe as baseline implementation, at least following types should be extractible using __builtin_va_arg when applied to an ms_va_list

Because these scalar types are all serializable by printf and printf-like functions.

Also, I think there's some difference in the implementation of builtin_va_list and builtin_ms_va_list when using -target x86_64-pc-win32-.... In the former, the va_list is passed-by-reference similar to a sys_va_list, while latter is passed-by-value the same as Msft does in their implementation. In GCC for instance, __builtin_ms_va_list, behaves the same as __builtin_va_list if -mabi=ms is set on the command line. The is not a bug, just a compatibility consideration vs Msft.

llvmbot commented 8 years ago

I've been able to reproduce this with trunk LLVM (by having Apple Clang produce LLVM IR to feed to trunk llc), but, just so you know, trunk Clang actually refuses to compile this (cf. r261717):

$ clang --version clang version 3.9.0 Target: x86_64-apple-darwin15.5.0 Thread model: posix InstalledDir: /usr/local/bin $ clang -S -Os -fno-unwind-tables -target x86_64-pc-win32-macho va_sum.c va_sum.c:8:12: error: cannot compile this va_arg expression yet sum += __builtin_va_arg(ap, int); ^~~~~~~~~ 1 error generated.

I'll see about restoring the old behavior for this target so you can test whatever fix I end up producing.

In fact, correct me if I'm wrong, but, looking at the source (cf. lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1850), this seems to be a long-standing bug in the way SDAG expands va_arg. (The code would have my name on it if you were to run a blame, but I refactored this code into a method so the Win64 lowering could use it.) As you pointed out, the pointer is incremented by sizeof(int)--or whatever data type was extracted from the va_list. With C/C++ code on 32-bit x86, this worked, because the language specs say that integer types smaller than int get promoted to int, while float types smaller than a double get promoted to double, and the minimum stack slot size on x86 is 4 bytes. x86-64 breaks this assumption--the minimum is now 8 bytes, but sizeof(int) is still 4.

So, instead, SDAG should be incrementing the va_list by the greater of the minimum stack slot size or the data type size. (Now that I know what's wrong, it's just a small matter of programming to fix this...)