llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Usage of a copy of a register just after a mov instruction #18979

Open llvmbot opened 10 years ago

llvmbot commented 10 years ago
Bugzilla Link 18605
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @atrick,@chandlerc,@echristo,@hfinkel

Extended Description

Using a copy of a register immediately after the mov instruction that creates it may cost an extra cycle on all but the very newest x86 processors (4th generation Intel Core). On a superscalar processor, a mov and an instruction that modifies the mov's source register can execute in parallel, whereas an instruction that modifies the copy has to wait for the mov. For each line of the C code below, this would reduce the number of cycles used from 3 to 2 (measured on e.g. Intel i7 920 and Intel Atom N450). Even when the source code spells out the "correct order" (see the last statement line of the function), the "wrong order" is still produced. Interestingly, both GCC and ICC show the same strange behavior; is there any reason to do it this way?

    int test(int x) {
      int y;
      x ^= (x >> 2);
      x = (x >> 3) ^ x;
      x = x ^ (x >> 4);
      y = x; x >>= 5; x ^= y; // almost the same but explicit
      return x;
    }

=>

    movl %edi, %eax
    sarl $2, %eax    // => sarl $2, %edi
    xorl %edi, %eax
    movl %eax, %ecx
    sarl $3, %ecx    // => sarl $3, %eax
    xorl %eax, %ecx
    movl %ecx, %edx
    sarl $4, %edx    // => sarl $4, %ecx
    xorl %ecx, %edx
    movl %edx, %eax
    sarl $5, %eax    // => sarl $5, %edx
    xorl %edx, %eax
    retq
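To make the proposed rewrite concrete, here is a minimal C sketch that models one mov/sar/xor step in both orders and checks that they produce the same value. The function names (`step_shift_copy`, `step_shift_source`, `check_orders`) are made up for illustration and appear nowhere in the report. The sketch only demonstrates that the two orders are equivalent; it says nothing about cycle counts, which depend on the processor.

```c
#include <assert.h>
#include <stdint.h>

/* Order the compilers emit: copy, then shift the copy.
   The sar reads the register the mov just wrote. */
int32_t step_shift_copy(int32_t src, int k) {
    int32_t dst = src;   /* movl %src, %dst */
    dst >>= k;           /* sarl $k, %dst */
    dst ^= src;          /* xorl %src, %dst */
    return dst;
}

/* Proposed order: copy, then shift the source.
   The sar no longer reads the result of the mov. */
int32_t step_shift_source(int32_t src, int k) {
    int32_t dst = src;   /* movl %src, %dst */
    src >>= k;           /* sarl $k, %src */
    dst ^= src;          /* xorl %src, %dst */
    return dst;
}

/* Asserts that both orders agree for every shift amount used in test().
   Note: >> on a negative int is implementation-defined in C; both
   functions use the same operator, so they agree either way. */
void check_orders(int32_t x) {
    for (int k = 2; k <= 5; ++k) {
        assert(step_shift_copy(x, k) == step_shift_source(x, k));
    }
}
```

The second form is only a valid replacement when the old value of the source register is not needed after the xor, because the shift destroys it.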

llvmbot commented 10 years ago

I would be glad if such a peephole optimization were present. IMHO it should be enabled by default, at least for x86.

hfinkel commented 10 years ago

It seems like it would be easy enough to adjust for this use-after-copy hazard, for those targets that need it, with a simple late MI pass.
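As a rough illustration of what such a late pass might look for, here is a toy peephole in C over a made-up instruction list. Everything in it is hypothetical: `struct Insn`, `enum Op`, `live_after`, and `retarget_shift_after_copy` are illustration-only names, and none of this is LLVM's MachineInstr API. It assumes straight-line code and matches only the exact mov/sar/xor triple from the report, rewriting the shift to operate on the copy's source when that source is not read again afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set; NOT LLVM's MachineInstr API. */
enum Op { OP_MOV, OP_SAR, OP_XOR, OP_RET };

struct Insn {
    enum Op op;
    int src;  /* register read by OP_MOV / OP_XOR; unused otherwise */
    int dst;  /* register written; for OP_RET, the register returned */
    int imm;  /* shift amount for OP_SAR; unused otherwise */
};

/* Returns 1 if `reg` may be read in insns[from..n) before it is
   overwritten. Straight-line code only. If no OP_RET is reached,
   conservatively report the register as live. */
int live_after(const struct Insn *insns, size_t n, size_t from, int reg) {
    for (size_t i = from; i < n; ++i) {
        const struct Insn *in = &insns[i];
        switch (in->op) {
        case OP_MOV:
            if (in->src == reg) return 1;  /* read */
            if (in->dst == reg) return 0;  /* overwritten, never read */
            break;
        case OP_SAR:
            if (in->dst == reg) return 1;  /* read-modify-write */
            break;
        case OP_XOR:
            if (in->src == reg || in->dst == reg) return 1;
            break;
        case OP_RET:
            return in->dst == reg;         /* only the return value is read */
        }
    }
    return 1;
}

/* Rewrites   mov a,b ; sar k,b ; xor a,b
   into       mov a,b ; sar k,a ; xor a,b
   when a is dead after the xor. Returns the number of rewrites. */
size_t retarget_shift_after_copy(struct Insn *insns, size_t n) {
    assert(insns != NULL || n == 0);
    size_t rewrites = 0;
    for (size_t i = 0; i + 2 < n; ++i) {
        struct Insn *mov = &insns[i];
        struct Insn *sar = &insns[i + 1];
        struct Insn *xr = &insns[i + 2];
        if (mov->op != OP_MOV || sar->op != OP_SAR || xr->op != OP_XOR) continue;
        if (mov->src == mov->dst) continue;
        if (sar->dst != mov->dst) continue;
        if (xr->src != mov->src || xr->dst != mov->dst) continue;
        if (live_after(insns, n, i + 3, mov->src)) continue;
        sar->dst = mov->src;  /* shift the source instead of the copy */
        ++rewrites;
    }
    return rewrites;
}
```

Because the xor already reads the source register, the rewrite does not lengthen that register's live range in this particular pattern; it only requires that the old value is not needed after the xor. A real pass would also have to handle flags, sub-registers, control flow, and other opcodes, all of which this sketch ignores.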

atrick commented 10 years ago

I think the "problem" here is that the 2-address pass does not know how to stretch the source live range of a copy. A use of the copy that immediately follows always ends up using the result of the copy. I think there are cases where it would be useful to use the copy's source instead and reduce the critical path. However, in general we would avoid doing that because it potentially increases register pressure and constrains scheduling.

So it's tricky, and there is no motivation for improving it when the optimization target makes copies free.