Closed tkchia closed 1 year ago
Hello @tkchia,
Agreed this is a pretty brilliant way of handling the problem of UPX's requirement, without adding overhead.
I am curious though: what exactly does rep ret do on real hardware, and why does UPX just happen to use it?
Thank you!
Hello @ghaerr,
what exactly does rep ret do on real hardware, and why does UPX just happen to use it?
rep ret mainly does the same thing as a ret. It was reportedly useful for working around a performance issue on some older AMD processors. GCC will actually emit rep ret under certain conditions:
$ cat test5.c
#include <inttypes.h>
int y;
uint8_t
f (void)
{
  int x = y;
  return x == 2 ? 1 : x * x;
}
$ gcc -S -O3 test5.c -fno-pic -static -fomit-frame-pointer -mtune=k8
$ cat test5.s
	.file	"test5.c"
	.text
	.p2align 4
	.globl	f
	.type	f, @function
f:
.LFB0:
	.cfi_startproc
	endbr64
	movl	y(%rip), %edx
	movl	$1, %eax
	cmpl	$2, %edx
	je	.L1
	movl	%edx, %eax
	imull	%edx, %eax
.L1:
	rep ret
	.cfi_endproc
...
The UPX decompression routines are partly hand-coded in assembly, but presumably the authors also had this AMD performance issue in mind.
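To make the "mainly does the same thing as a ret" point concrete, here is a minimal sketch, assuming GCC or Clang on an x86-64 ELF target. The helper name square_rep_ret is hypothetical and is not taken from UPX or GCC. The routine ends in the two bytes F3 C3, which is the encoding of rep ret, and the C caller sees an ordinary return. On other targets the sketch falls back to plain C.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__) && defined(__ELF__)
/* Hypothetical helper, written in top-level assembly:
   EAX = EDI * EDI, then return with F3 C3 ("rep ret"). */
__asm__(
    ".pushsection .text\n"
    ".type square_rep_ret, @function\n"
    "square_rep_ret:\n"
    "    movl  %edi, %eax\n"
    "    imull %edi, %eax\n"
    "    .byte 0xf3, 0xc3\n"            /* rep ret, spelled out as raw bytes */
    ".size square_rep_ret, .-square_rep_ret\n"
    ".popsection\n"
);
/* Bind the C declaration to the assembly label above. */
extern int square_rep_ret(int x) __asm__("square_rep_ret");
#else
/* Portable fallback so the sketch still builds on other targets. */
int square_rep_ret(int x) { return x * x; }
#endif
```

Calling square_rep_ret(7) should give 49, exactly as if the routine had ended in a plain ret. The raw .byte form is used only so that the sketch does not depend on how a particular assembler spells the prefixed instruction.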
Thank you!
Hello @tkchia,
Thank you for your explanation, quite interesting. It's kind of amazing what CPU designers do for branch prediction and pipelining long after the CISC instruction set was defined. However, I am coming closer to the realization that nothing should surprise me when it comes to the (over)complexity of AMD and Intel x86 processors!
This allows UPX's (https://upx.github.io/) x86-64 unpacker routines to work properly. UPX relies on a callee being able to return a carry flag to its caller, and also just happens to use rep ret.
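The "callee returns a carry flag to its caller" convention can be sketched roughly as follows. This is an illustration only, not UPX's actual code: the names shift_out_top_bit and next_bit are hypothetical, and the sketch assumes an x86-64 ELF target and a compiler that supports GCC's flag output constraints ("=@ccc"). The callee shifts a word left by one, which leaves the old top bit in CF, and returns with rep ret. Returning does not modify the flags, so the caller can read CF directly.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__ELF__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
/* Hypothetical callee: EDI <<= 1, with the old bit 31 left in CF. */
__asm__(
    ".pushsection .text\n"
    ".type shift_out_top_bit, @function\n"
    "shift_out_top_bit:\n"
    "    addl %edi, %edi\n"             /* add-to-self shifts left, sets CF */
    "    .byte 0xf3, 0xc3\n"            /* rep ret; CF survives the return */
    ".size shift_out_top_bit, .-shift_out_top_bit\n"
    ".popsection\n"
);

/* Hypothetical caller: returns the bit shifted out and updates *word. */
int next_bit(uint32_t *word)
{
    uint32_t w = *word;
    int carry;
    __asm__ volatile(
        "subq $128, %%rsp\n\t"          /* skip the red zone before CALL pushes */
        "call shift_out_top_bit\n\t"
        "leaq 128(%%rsp), %%rsp"        /* LEA restores RSP without touching flags */
        : "=@ccc"(carry), "+D"(w)       /* carry <- CF, w lives in EDI */
        :
        : "memory");
    *word = w;
    return carry;
}
#else
/* Portable fallback with the same observable behaviour. */
int next_bit(uint32_t *word)
{
    int carry = (int)(*word >> 31);
    *word <<= 1;
    return carry;
}
#endif
```

Starting from the word 0x80000001, the first call to next_bit should return 1 and leave 2 in the word; the second should return 0 and leave 4. The stack pointer is restored with lea rather than add precisely because add would overwrite the carry flag that the callee just handed back.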