Open llvmbot opened 13 years ago
r132900 helps a bit:
__ZN2js9InterpretEP9JSContextPNS_10StackFrameEjNS_10InterpModeE:
        pushq   %rbx
        subq    $14112, %rsp            ## imm = 0x3720
Another problem is PHI elimination. It creates extra join registers that are also spilled. This could be improved in the common case where there are no critical edges.
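For readers unfamiliar with the term, a critical edge is a CFG edge whose source block has more than one successor and whose destination block has more than one predecessor. The following is only an illustrative sketch on a toy CFG stored as a Python dict; the `critical_edges` helper and the block names are hypothetical and have nothing to do with LLVM's actual data structures or its PHI elimination pass.

```python
def critical_edges(cfg):
    """Return sorted (src, dst) edges where src has more than one
    successor and dst has more than one predecessor."""
    # Build the predecessor sets from the successor lists.
    preds = {}
    for src, succs in cfg.items():
        for dst in succs:
            preds.setdefault(dst, set()).add(src)

    found = []
    for src, succs in cfg.items():
        unique_succs = set(succs)
        if len(unique_succs) < 2:
            continue  # a single-successor block cannot start a critical edge
        for dst in unique_succs:
            if len(preds[dst]) > 1:
                found.append((src, dst))
    return sorted(found)


# Hypothetical CFG: "entry" branches either to "then" or straight to "join".
cfg = {"entry": ["then", "join"], "then": ["join"], "join": []}
print(critical_edges(cfg))  # [('entry', 'join')]
```

In this toy graph only the edge from "entry" to "join" is critical; the edge from "then" to "join" is not, because "then" has a single successor.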
Clang is trying to be clever and sometimes passes temporaries between blocks in registers, even creating some phis.
RAFast spills all global live ranges, so it isn't really helping.
Extended Description
I know that the main objective of the fast register allocator is speed, but I am getting test failures at -O0 because the tests run out of stack space :-(
The stack frame sizes I got for the largest function by recompiling the .ii with gcc and clang:
clang O0 0x00004690 gcc O0 0x00001560
clang O1 0x00000868 gcc O1 0x00000b98
clang O2 0x00000aa8 gcc O2 0x00000bd8
clang O3 0x00000ab8 gcc O3 0x00000bd8
clang Os 0x000010b8 gcc Os 0x00000698
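To make the table easier to compare, here is a small sketch that converts the hex values to decimal and computes the clang/gcc ratio per optimization level. It assumes the values are stack frame sizes in bytes, which the report implies but does not state outright; the `frame_sizes` and `summarize` names are made up for this illustration.

```python
# (clang, gcc) values copied from the table above; assumed to be bytes.
frame_sizes = {
    "O0": (0x4690, 0x1560),
    "O1": (0x0868, 0x0B98),
    "O2": (0x0AA8, 0x0BD8),
    "O3": (0x0AB8, 0x0BD8),
    "Os": (0x10B8, 0x0698),
}


def summarize(sizes):
    """Return (level, clang, gcc, clang/gcc ratio) rows."""
    return [
        (level, clang, gcc, round(clang / gcc, 2))
        for level, (clang, gcc) in sizes.items()
    ]


for level, clang, gcc, ratio in summarize(frame_sizes):
    print(f"{level}: clang {clang}, gcc {gcc}, clang/gcc = {ratio}")
```

At -O0 this works out to 18064 for clang against 5472 for gcc, roughly 3.3 times as much, while at -O1 through -O3 clang's value is the smaller of the two.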
I then found that most of the -O0 to -O1 difference was because of the register allocator:
$ llc jsinterp.bc -o jsinterp.o -filetype=obj -regalloc=greedy -O0
$ otool -t -v jsinterp.o | grep -A 8 __ZN2js9InterpretEP9JSContextPNS_10StackFrameEjNS_10InterpModeE | grep sub.*rsp
000000000000001a subq $0x00001c78,%rsp
$ llc jsinterp.bc -o jsinterp.o -filetype=obj -regalloc=fast -O0
$ otool -t -v jsinterp.o | grep -A 8 __ZN2js9InterpretEP9JSContextPNS_10StackFrameEjNS_10InterpModeE | grep sub.*rsp
0000000000000010 subq $0x000045d8,%rsp
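As a rough sketch of how the two measurements could be compared automatically, the snippet below pulls the immediate out of a `subq $imm,%rsp` disassembly line and reports the difference. The `frame_size` helper and its regular expression are assumptions written for this illustration; they only handle the AT&T-syntax forms that appear in this report.

```python
import re

# Matches "subq $0x00001c78,%rsp" as well as "subq $14112, %rsp".
SUB_RSP = re.compile(r"subq\s+\$(0x[0-9a-fA-F]+|\d+),\s*%rsp")


def frame_size(disasm_line):
    """Return the stack adjustment from a disassembly line,
    or None if the line is not a 'subq $imm,%rsp'."""
    match = SUB_RSP.search(disasm_line)
    if match is None:
        return None
    imm = match.group(1)
    # Hex immediates carry a 0x prefix; anything else is decimal.
    return int(imm, 16) if imm.startswith("0x") else int(imm, 10)


greedy = frame_size("000000000000001a subq $0x00001c78,%rsp")
fast = frame_size("0000000000000010 subq $0x000045d8,%rsp")
print(greedy, fast, fast - greedy)  # 7288 17880 10592
```

With the two lines from this report, the greedy allocator's adjustment is 7288 against 17880 for the fast allocator, a difference of 10592.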