[x86-64] optimize va_start when there are no XMM vaarg calls

EdSchouten commented 14 years ago


Bugzilla Link	6796
Version	trunk
OS	FreeBSD
Depends On	llvm/llvm-project#2112
CC	@lattner,@sunfishcode

Extended Description

GCC 4.2.1 and Clang (SVN) generate very different code for the following function, and Clang's output makes little sense.

int foo(int a, ...)
{
	int r;
	__builtin_va_list va;

	__builtin_va_start(va, a);
	r = __builtin_va_arg(va, int);
	__builtin_va_end(va);
	return (r);
}
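For anyone reproducing this, here is the same test case written with the standard `<stdarg.h>` macros, which GCC and Clang implement with the `__builtin_va_*` builtins used above. It is only a convenience sketch; the behavior is meant to be identical to the original.

```c
#include <stdarg.h>

/* Same test case as above, using the standard va_* macros. */
int foo(int a, ...)
{
	int r;
	va_list va;

	va_start(va, a);
	r = va_arg(va, int);	/* first variadic argument; on x86-64 it arrives in %rsi */
	va_end(va);
	return r;
}
```

For example, `foo(1, 42)` returns 42.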

GCC:

0000000000000000 <foo>:
   0:  48 83 ec 60            sub    $0x60,%rsp
   4:  48 8d 44 24 68         lea    0x68(%rsp),%rax
   9:  48 89 74 24 b0         mov    %rsi,0xffffffffffffffb0(%rsp)
   e:  c7 44 24 88 10 00 00   movl   $0x10,0xffffffffffffff88(%rsp)
  15:  00
  16:  48 89 44 24 90         mov    %rax,0xffffffffffffff90(%rsp)
  1b:  48 8d 44 24 a8         lea    0xffffffffffffffa8(%rsp),%rax
  20:  48 89 44 24 98         mov    %rax,0xffffffffffffff98(%rsp)
  25:  48 83 c0 08            add    $0x8,%rax
  29:  8b 00                  mov    (%rax),%eax
  2b:  48 83 c4 60            add    $0x60,%rsp
  2f:  c3                     retq

Clang:

0000000000000000 <foo>:
   0:  55                     push   %rbp
   1:  48 89 e5               mov    %rsp,%rbp
   4:  48 83 ec 50            sub    $0x50,%rsp
   8:  84 c0                  test   %al,%al
   a:  74 26                  je     32 <foo+0x32>
   c:  0f 29 85 60 ff ff ff   movaps %xmm0,0xffffffffffffff60(%rbp)
  13:  0f 29 8d 70 ff ff ff   movaps %xmm1,0xffffffffffffff70(%rbp)
  1a:  0f 29 55 80            movaps %xmm2,0xffffffffffffff80(%rbp)
  1e:  0f 29 5d 90            movaps %xmm3,0xffffffffffffff90(%rbp)
  22:  0f 29 65 a0            movaps %xmm4,0xffffffffffffffa0(%rbp)
  26:  0f 29 6d b0            movaps %xmm5,0xffffffffffffffb0(%rbp)
  2a:  0f 29 75 c0            movaps %xmm6,0xffffffffffffffc0(%rbp)
  2e:  0f 29 7d d0            movaps %xmm7,0xffffffffffffffd0(%rbp)
  32:  4c 89 8d 58 ff ff ff   mov    %r9,0xffffffffffffff58(%rbp)
  39:  4c 89 85 50 ff ff ff   mov    %r8,0xffffffffffffff50(%rbp)
  40:  48 89 8d 48 ff ff ff   mov    %rcx,0xffffffffffffff48(%rbp)
  47:  48 89 95 40 ff ff ff   mov    %rdx,0xffffffffffffff40(%rbp)
  4e:  48 89 b5 38 ff ff ff   mov    %rsi,0xffffffffffffff38(%rbp)
  55:  48 8d 85 30 ff ff ff   lea    0xffffffffffffff30(%rbp),%rax
  5c:  48 89 45 f8            mov    %rax,0xfffffffffffffff8(%rbp)
  60:  48 8d 45 10            lea    0x10(%rbp),%rax
  64:  48 89 45 f0            mov    %rax,0xfffffffffffffff0(%rbp)
  68:  c7 45 ec 30 00 00 00   movl   $0x30,0xffffffffffffffec(%rbp)
  6f:  c7 45 e8 08 00 00 00   movl   $0x8,0xffffffffffffffe8(%rbp)
  76:  48 63 45 e8            movslq 0xffffffffffffffe8(%rbp),%rax
  7a:  48 83 f8 28            cmp    $0x28,%rax
  7e:  77 0f                  ja     8f <foo+0x8f>
  80:  48 89 c1               mov    %rax,%rcx
  83:  48 03 4d f8            add    0xfffffffffffffff8(%rbp),%rcx
  87:  83 c0 08               add    $0x8,%eax
  8a:  89 45 e8               mov    %eax,0xffffffffffffffe8(%rbp)
  8d:  eb 0c                  jmp    9b <foo+0x9b>
  8f:  48 8b 4d f0            mov    0xfffffffffffffff0(%rbp),%rcx
  93:  48 8d 41 08            lea    0x8(%rcx),%rax
  97:  48 89 45 f0            mov    %rax,0xfffffffffffffff0(%rbp)
  9b:  8b 01                  mov    (%rcx),%eax
  9d:  48 83 c4 50            add    $0x50,%rsp
  a1:  5d                     pop    %rbp
  a2:  c3                     retq
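Both listings build an x86-64 SysV `va_list` (gp_offset, fp_offset, overflow_arg_area, reg_save_area). Clang stores gp_offset = 8 (one named integer argument) and fp_offset = 0x30 (48, the start of the XMM part of the register save area), then runs the generic integer `va_arg` path. GCC stores the already-advanced gp_offset (0x10) directly and never writes fp_offset. The C sketch below mirrors only Clang's integer path. `struct va_model` and `model_va_arg_int` are hypothetical names, and the layout is assumed from the psABI, not taken from either compiler's source.

```c
#include <string.h>

/* Hypothetical mirror of the SysV x86-64 va_list element. */
struct va_model {
	unsigned int gp_offset;		/* next GPR slot in reg_save_area: 0..48 */
	unsigned int fp_offset;		/* next XMM slot in reg_save_area: 48..176 */
	void *overflow_arg_area;	/* next argument passed on the stack */
	void *reg_save_area;		/* 6 GPRs (48 bytes), then 8 XMM regs (128 bytes) */
};

/* Sketch of the va_arg(ap, int) sequence in the Clang listing. */
static int model_va_arg_int(struct va_model *ap)
{
	const char *addr;
	int value;

	if (ap->gp_offset <= 40) {	/* cmp $0x28 / ja: a GPR slot is left */
		addr = (const char *)ap->reg_save_area + ap->gp_offset;
		ap->gp_offset += 8;
	} else {			/* registers exhausted: read from the stack */
		addr = ap->overflow_arg_area;
		ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
	}
	memcpy(&value, addr, sizeof value);	/* mov (%rcx),%eax */
	return value;
}
```

Nothing in this integer path touches fp_offset or the XMM slots. That is why saving %xmm0-%xmm7 in `foo` is wasted work.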

sunfishcode commented 14 years ago

For this to happen, we first need the front ends to use the va_arg instruction. That requires CodeGen to fully support va_arg on all important targets, which in turn requires target-independent support for va_arg with aggregate types and target-dependent va_arg lowering for each of those targets.

Patches welcome.

lattner commented 14 years ago

This is also tracked as rdar://7832354.

lattner commented 14 years ago

This is the generic code sequence that is needed if you do a vector or FP va_arg. GCC has an optimization pass that scans a function to see whether the va_list provably doesn't escape and whether there are no FP accesses; when both hold, it can skip saving the XMM registers, as in the GCC output above.
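To make the two conditions concrete, here are two hypothetical functions (`sum_doubles` and `format_into`, names chosen for illustration) where, as I understand it, such an analysis has to keep the XMM save code (the `test %al,%al` / `movaps` block in the Clang listing). The first reads `double` values with `va_arg`, so it has FP accesses. The second passes its `va_list` to `vsnprintf`, so the list escapes and the compiler cannot see which types will be read. `foo` from the report meets both conditions.

```c
#include <stdarg.h>
#include <stdio.h>

/* FP access: double variadic arguments arrive in XMM registers, so the
 * prologue must keep saving them into the register save area. */
double sum_doubles(int n, ...)
{
	va_list ap;
	double total = 0.0;

	va_start(ap, n);
	for (int i = 0; i < n; i++)
		total += va_arg(ap, double);
	va_end(ap);
	return total;
}

/* Escaping va_list: vsnprintf may read any type from it, so the
 * compiler must conservatively save the XMM registers as well. */
int format_into(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return ret;
}
```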

This affects things like the libc implementation of the open system call, which is variadic.
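POSIX declares `int open(const char *path, int oflag, ...)`. The mode argument is only present when `O_CREAT` is set, and it is an integer. The sketch below shows that shape. `my_open` is hypothetical and is not any particular libc's code. Its only `va_arg` reads an `int` and its `va_list` never escapes, so the optimization discussed here would remove the XMM register saves from its prologue.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

/* Hypothetical open()-style wrapper: mode is only read when O_CREAT is set. */
int my_open(const char *path, int flags, ...)
{
	mode_t mode = 0;

	if (flags & O_CREAT) {
		va_list ap;

		va_start(ap, flags);
		/* Read as int: this works whether mode_t is unsigned int or a
		 * narrower type that is promoted to int when passed through "...". */
		mode = (mode_t)va_arg(ap, int);
		va_end(ap);
	}
	/* Integer-only va_arg and no escaping va_list: the XMM saves are
	 * dead weight for this function. */
	return open(path, flags, mode);
}
```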

[x86-64] optimize va_start when there are no XMM vaarg calls #7168