It is worthwhile to support patching syscall sites that lie beyond the PC_RELA 4GB range. This matters especially when PIE (position independent executable) is enabled, as it is with GCC on Ubuntu 18.04, because the program's code region can then be loaded at a relatively high address.
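To make the range limit concrete, here is a minimal sketch of the arithmetic, assuming the patch is a 5-byte x86-64 `call rel32` (opcode `0xE8` plus a signed 32-bit displacement measured from the end of the instruction). The function names and the two example addresses are hypothetical; the addresses only mimic a typical layout where PIE code and preloaded libraries end up far apart.

```python
CALL_LEN = 5  # 0xE8 opcode + 4-byte signed displacement


def rel32_displacement(site: int, target: int) -> int:
    """Displacement a 5-byte call placed at `site` needs to reach `target`.

    The CPU adds the displacement to the address of the *next* instruction,
    hence the `site + CALL_LEN` term.
    """
    return target - (site + CALL_LEN)


def rel32_reachable(site: int, target: int) -> bool:
    """True if `target` fits in the signed 32-bit displacement of the call."""
    disp = rel32_displacement(site, target)
    return -(1 << 31) <= disp < (1 << 31)


# Hypothetical addresses: a syscall site inside a PIE binary and a
# trampoline entry inside a library loaded through LD_PRELOAD.
pie_site = 0x5555_5555_4100
preload_entry = 0x7FFF_F7DD_0000

if __name__ == "__main__":
    print("direct call possible:", rel32_reachable(pie_site, preload_entry))
```

With addresses like these the displacement is far larger than 31 bits, which is the case an intermediate stub is meant to handle.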
We used the following approach:
1) Load the syscall trampoline using LD_PRELOAD. The trampoline contains several different kinds of syscall entries, one for each kind of syscall site that can be patched.
2) When we find a patchable syscall site, we patch it by replacing the syscall instruction and the 3 bytes that follow it (5 bytes in total) with a callq <PC_RELA_4GB> instruction. PC_RELA_4GB might not be able to reach the LD_PRELOAD address.
3) As a result, we search for free pages (using /proc/<pid>/maps), allocate page(s) that hold a temporary stub, and set PC_RELA_4GB to this stub.
4) The stub could be as simple as:
    callq LD_PRELOAD_TRAMPOLINE_ENTRY
    ret
5) Because we have several (N) trampoline entries, we also need N stubs. We determine which stub jumps to the right trampoline entry by pattern matching the instructions that follow the syscall instruction we are trying to patch.
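Steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commits: `parse_maps`, `find_stub_page`, and `encode_call` are hypothetical helpers, the page size is assumed to be 4 KiB, user space is assumed to end at `1 << 47`, and the sample maps text is made up. The sketch only picks an address and builds the 5 patch bytes; actually mapping the page and writing into the tracee is left out.

```python
import struct

PAGE = 0x1000            # assumed page size
CALL_LEN = 5             # 0xE8 + rel32
REL32_MIN, REL32_MAX = -(1 << 31), (1 << 31) - 1
USER_TOP = 1 << 47       # assumed end of user address space
MIN_ADDR = 0x10000       # assumed lowest address worth mapping


def parse_maps(text):
    """Return sorted (start, end) pairs from /proc/<pid>/maps content."""
    regions = []
    for line in text.splitlines():
        if line.strip():
            start, end = (int(x, 16) for x in line.split()[0].split("-"))
            regions.append((start, end))
    return sorted(regions)


def find_stub_page(regions, site):
    """Pick the free page nearest `site` that a rel32 call at `site` can reach.

    Returns the page start address, or None if no free page is in range.
    """
    lo = max(MIN_ADDR, site + CALL_LEN + REL32_MIN)
    # Highest page start whose last byte is still reachable.
    hi = min(site + CALL_LEN + REL32_MAX - PAGE + 1, USER_TOP - PAGE)
    site_page = site & ~(PAGE - 1)
    best, prev_end = None, MIN_ADDR
    for start, end in sorted(regions) + [(USER_TOP, USER_TOP)]:
        gap_lo = (max(prev_end, lo) + PAGE - 1) & ~(PAGE - 1)  # align up
        gap_hi = min(start, hi + PAGE)                         # exclusive
        if gap_lo + PAGE <= gap_hi:
            last_page = (gap_hi - PAGE) & ~(PAGE - 1)
            cand = min(max(site_page, gap_lo), last_page)
            if best is None or abs(cand - site) < abs(best - site):
                best = cand
        prev_end = max(prev_end, end)
    return best


def encode_call(site, target):
    """Bytes of `call target` placed at `site` (replaces syscall + 3 bytes)."""
    disp = target - (site + CALL_LEN)
    if not REL32_MIN <= disp <= REL32_MAX:
        raise ValueError("target is out of rel32 range")
    return b"\xe8" + struct.pack("<i", disp)


SAMPLE_MAPS = """\
555555554000-555555558000 r-xp 00000000 08:01 123 /usr/bin/app
555555558000-555555559000 rw-p 00004000 08:01 123 /usr/bin/app
7ffff7dd0000-7ffff7df0000 r-xp 00000000 08:01 456 /lib/libpreload.so
"""

if __name__ == "__main__":
    site = 0x555555554100
    stub = find_stub_page(parse_maps(SAMPLE_MAPS), site)
    print(hex(stub), encode_call(site, stub).hex())
```

Choosing the nearest free page is just one possible policy; any free page inside the reachable window would do. For step 5, each of the N stubs would get its own offset inside the allocated page, and the patch at a given site would target the offset of the stub matching that site's instruction pattern.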
The changes were pushed in commits 05b339f21be28fd55c8e1684a844305dcd2fcb7a through 1aa5cd80046bbf81770f7cd96ab5c1134e0aae2a. This issue was created for documentation purposes.