rdbo / libmem

Advanced Game Hacking Library for C, Modern C++, Rust and Python (Windows/Linux/FreeBSD) (Process/Memory Hacking) (Hooking/Detouring) (Cross Platform) (x86/x64/ARM/ARM64) (DLL/SO Injection) (Internal/External) (Assembler/Disassembler)
GNU Affero General Public License v3.0
738 stars 90 forks source link

UNIX: Suggestions to improve the `ptrace` component (race conditions, unexpected signals, call injection) #235

Open d0ubleday opened 3 weeks ago

d0ubleday commented 3 weeks ago

Hi, rdbo. Speaking as a full-time reverse engineer who's written an untold number of various ptrace-based solutions over the years (though not for game hacking), your project is a hidden gem.

However, I've noticed several issues with the use of ptrace in the Linux backend (possibly FreeBSD too; can't say for sure but a quick glance tells me it's similar). In particular, your code doesn't seem to handle unexpected signal delivery. For instance, in ptrace_libcall, the return of waitpid is assumed to mean a breakpoint hit, which is not necessarily the case. It could easily be any other signal sent to the tracee. Instead of using waitpid directly, consider implementing a wrapper that calls it in a loop, checks if the process is stopped with the WIFSTOPPED macro, verifies the signal, and reinjects it with PTRACE_CONT if it doesn't match. Code for reference:

int ptrace_wait(pid_t pid, int sig) {
    int wstatus;

    while (true) {
        if (waitpid(pid, &wstatus, WUNTRACED) == -1)
            return -1;

        if (!WIFSTOPPED(wstatus)) {
            /* handle tracee death */
            return -1;
        }

        int received_sig = WSTOPSIG(wstatus);

        if (received_sig == sig)
            return 0;

        /* reinject the signal and keep waiting */
        ptrace(PTRACE_CONT, pid, 0, received_sig);
    }
}

Although rare, I was unlucky to face it on several occasions, and implementing a similar check fixed the issue. Ideally, it should also be used for attaching and any restarting operation (PTRACE_CONT and PTRACE_SINGLESTEP in your case). For attaching you'd wait for a SIGSTOP, and for singlestep/breakpoints you'd wait for a SIGTRAP. I guess it would be better to just wrap them all to include the call to ptrace_wait instead of calling it manually every time.

I would also strongly suggest you freeze all threads upon attaching; otherwise there's a potential race condition. If the other threads keep running, they might hit a breakpoint before the one you've modified does. With ptrace_libcall in particular, that would result in a segfault due to an unsuspecting thread calling rax. I assume it's hard to catch in testing if all threads run their own designated functions, but that's not always the case, and I could provide a demo of the issue within a couple of days if needed. Freezing the remaining threads fixes the issue. You might have more luck doing it with a SIGSTOP, but I find it easier and more reliable to just attach to all threads instead. IIRC, in that case you'd still call waitpid only once, and not for every thread, but don't quote me on that.

As a side note, I've also noticed that ptrace_setup_libcall modifies the stack but ptrace_restore_libcall never restores it. It might not matter most of the times, but possible red zones should be taken into account, otherwise you risk inadvertently corrupting the stack frame. Consider subtracting additional 128-bytes after you align the stack pointer to play it safe.

From the System V x86-64 ABI:

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

I first thought it was alright since you do subtract 256 bytes (size of ptlib->stack), but then again you poke those very 256 bytes back, essentially overwriting the red zone with garbage from the uninitialized tracer stack:

  0 +-----------------+ <--- RSP
    | args AAAAAAAAAA |                                         
128 +-----------------+ <--- Red zone end
    | AAAAAAAAAAAAAAA |                                              
256 +-----------------+ <--- Red zone start, original RSP

To be fair, I've never seen crashes related to this particular issue in practice, but it just kinda makes sense, and I guess I could also try to simulate it and post a demo if needed.