kallsyms / warpspeed

macOS Record Replay Debugger

Dealing with Direct Syscalls (e.g. something from JIT) #19

Open pmarkowsky opened 1 year ago

pmarkowsky commented 1 year ago

We should have a strategy for dealing with direct syscalls, i.e. code that invokes SVC itself instead of going through the usual library wrappers.

This paper (https://www.usenix.org/system/files/conference/woot16/woot16-paper-spisak.pdf), which used PMUs for a rootkit, says that the instruction skid when setting up PMUs to trap on SVC instructions is only about 1000 instructions or less, with 99% of the traps landing right at the beginning of syscalls.

pmarkowsky commented 1 year ago

OK, so I looked around at DBI (dynamic binary instrumentation) solutions tonight while trying to sort out linking issues for the syscall interceptor.

So I think the way to do this would be to use a DBI like TinyInst and limit it to only the main executable being debugged. This is probably slide-worthy.

kallsyms commented 1 year ago

hopefully search is good enough to find this since i don't see a better issue to tag this under.

measured the "round trip" time in simplevm to go from the beginning of hypervisor exception handling, back to the guest smc, and back to hypervisor handling. this ends up being pretty consistently ~60 ticks of the 24 MHz counter = (1 s / 24,000,000) * 60 = 0.0000025 s = 2.5 us. this is significantly better than expected... for reference, the following simple program, which times how long a nonexistent syscall takes to just return back to userland, clocks in at ~40 ticks per iteration, so the hypervisor overhead is not even double what is, in theory, the most minimal syscall possible.

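#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    uint64_t last = 0;

    for (int i = 0; i < 1000; i++) {
        uint64_t val;
        // Read the ARM generic timer's physical count (24 MHz on this machine).
        asm volatile("mrs %0, CNTPCT_EL0"
                     : "=r"(val));
        syscall(8);  // shouldn't exist, probably the quickest way to bounce back to userland
        // val - last spans one full loop iteration, so it also includes the
        // previous iteration's printf, not just the syscall.
        if (i > 0) {  // the first delta would be against 0, so skip it
            printf("%" PRIu64 "\n", val - last);
        }
        last = val;
    }
    return 0;
}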
kallsyms commented 1 year ago

... just realized i had also left in some other printing (and register fetching) in the hypervisor example. with that removed, it's down to 30 ticks, which now makes me question the above code, since the bare syscall loop (~40 ticks) takes longer than the hypervisor round trip. maybe xnu's enosys handling is slow? either way, it's about 1us now, still including the necessary PC get and set to jump to the next insn.

pmarkowsky commented 1 year ago

https://developer.apple.com/documentation/hypervisor#3627361

Would be my guess as to why.

(Replying by email on Thu, May 18, 2023 at 6:14 PM to Nick Gregory's comment above.)
