Open wangbj opened 5 years ago
> please note we only apply patching when the syscall and following instructions match our predefined pattern, hence, if there's no pattern match, patching would not occur

To clarify, by pattern you mean instruction patterns that can be easily patched, right?

> This makes writing interception code cumbersome, because not all syscalls are catchable into captured_syscall function call in tracee's memory space

Because they were not patched and were instead caught by SECCOMP, which traps to a ptrace tracer?
> however captured_syscall is a regular C function (written in rust), and it could use mmx/sse registers

You're worried about these registers being clobbered here, since classically we only save/restore the more common CPU registers.

> allocations could be dangerous, drop (inserted by rust) could be dangerous as well, because it may call pthread_xxx, which then may call futex syscall.

So we're worried about the Rust standard library making system calls as part of its work.
> however, no_std variant is a lot more difficult to write, less documented, and has fewer libraries and features

We would basically have to roll our own data structures and make system calls ourselves. Granted, this would be no different had we done it in C, right? Assuming we don't need anything too fancy, we could insert our own mini-libc or whatever functionality we need. Write it once and use it everywhere? While technically unsafe, we could wrap our functions and data structures in safe interfaces.

> or we could rewrite all tracees' global allocator, forcing them use the same heap preallocated by the tracer

I prefer the approach of avoiding the Rust stdlib altogether and hand-managing data structures and memory.
> To clarify, by pattern you mean instruction patterns that can be easily patched right?

Yes, most syscalls have similar patterns, such as:

    0f 05                   syscall
    48 3d 00 f0 ff ff       cmp $0xfffffffffffff000,%rax
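
To make the pattern idea concrete, here is a minimal sketch of how such a byte pattern could be matched against a code buffer. It is not systrace's actual matcher: the names `SYSCALL_CMP_PATTERN`, `is_patchable_site` and `find_patchable_sites` are hypothetical, and only the single example pattern quoted above is covered.

```rust
/// Machine code for `syscall` followed by `cmp $0xfffffffffffff000,%rax`,
/// the example pattern quoted above.
const SYSCALL_CMP_PATTERN: [u8; 8] = [0x0f, 0x05, 0x48, 0x3d, 0x00, 0xf0, 0xff, 0xff];

/// Hypothetical helper: true when the bytes starting at `offset` match the
/// pattern, i.e. the site would be a candidate for patching.
fn is_patchable_site(code: &[u8], offset: usize) -> bool {
    code.get(offset..)
        .map_or(false, |tail| tail.starts_with(&SYSCALL_CMP_PATTERN))
}

/// Hypothetical helper: every offset in `code` where the pattern begins.
fn find_patchable_sites(code: &[u8]) -> Vec<usize> {
    (0..code.len())
        .filter(|&off| is_patchable_site(code, off))
        .collect()
}
```

A plain byte scan like this can also match bytes that sit in the middle of another instruction, so a real patcher would more likely check only the address at which a syscall was actually observed.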
> Because they were not patched, instead they were caught by SECCOMP which traps on a ptrace tracer?

Right.
> You're worried about these registers being clobbered here. Since classically we only save/restore the more common CPU registers.

Yes, for syscalls basically we only have to:

Of course, if we have ptrace stops or can use a breakpoint instruction, it would be even easier. For regular function calls, besides saving the caller-saved registers (rax/rdi/rsi/rdx/rcx/r8/r9/r10/rbx), we also have to save the FP registers and the xmm/ymm registers. There are instructions like `xsave`/`xrstor` for that, so it should be possible.
> So we're worried about Rust standard library doing system calls as part of the work.

Yes, Rust makes that quite implicit (even more so than C++), so we need to be careful.
> We would basically have to roll out our own data structures and call system calls ourselves. Granted this would be no different had we done it in C right? Assuming we don't need anything too fancy, we could insert our own mini-libc or functionality that we need. Write it once and use it everywhere? While technically unsafe, we could wrap our functions and data structures in safe interfaces.

Right. With C we actually have more direct control over how the tool is linked; with Rust it is harder. For instance, with C we can build `libc.a` from musl-libc, link our tool statically against `libc.a`, and then use `objcopy -G<symbol_a> -G<symbol_b> ...` to control symbol visibility. With Rust, `no_std` is the only way I've found to achieve that so far. Rust does have a `musl` target, but it doesn't work well with `cdylib`, at least not with `+crt-static` (for `cdylib`).
I think using `no_std` is the better choice too, although, as mentioned, it has its own downsides.
> forcing them use the same heap preallocated by the tracer

Are you referring here to the "shared global memory" option (rather than the message-passing/RPC approach to globalState)? We have a complicated decision tree of possible futures we're considering, so it's good to clarify which branch we're on ;-).
> because of that, we can rewrite the seccomp filters, allowing all syscalls inside tool memory range (by checking procfs)

Why is this additional "whitelisting" approach specific to `no_std` only? Even if you have a tool/plugin that uses full featured libc + Rust stdlib, as long as everything is statically linked, couldn't you in principle whitelist all code inside that tool?
The prerequisite is that the tool shared library is a standalone library that doesn't link against any other libraries, so that everything is self-contained. If that guarantee holds, then we know all of its `syscall` instructions are self-contained as well, so we can create a filter that whitelists every `syscall` within the tool.

It would not work if the tool linked against an external library such as glibc, because when the tool calls `read@glibc`, the call escapes the whitelist, and we're not whitelisting glibc syscalls.
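
As a rough illustration of the "check procfs, then whitelist by address" idea, the sketch below parses one line of `/proc/<pid>/maps` and tests whether an instruction pointer falls inside the tool's executable mapping. The names `AddrRange`, `tool_text_range` and `syscall_is_whitelisted` are hypothetical, and the real check would presumably live in the seccomp BPF filter (comparing the `instruction_pointer` field of `seccomp_data`) rather than in Rust; this only shows the range computation.

```rust
/// Half-open address range `[start, end)` of one mapping.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AddrRange {
    start: u64,
    end: u64,
}

/// Hypothetical helper: parse one `/proc/<pid>/maps` line and return its
/// range if the mapping is executable and its path ends with `tool_name`.
/// Assumes the path contains no whitespace.
fn tool_text_range(line: &str, tool_name: &str) -> Option<AddrRange> {
    let mut fields = line.split_whitespace();
    let range = fields.next()?; // "start-end" in hex
    let perms = fields.next()?; // e.g. "r-xp"
    let path = fields.nth(3)?; // skip offset, dev, inode
    if !perms.contains('x') || !path.ends_with(tool_name) {
        return None;
    }
    let (start, end) = range.split_once('-')?;
    Some(AddrRange {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
    })
}

/// A syscall is allowed only if it was issued from inside the tool's range.
fn syscall_is_whitelisted(ip: u64, tool: AddrRange) -> bool {
    tool.start <= ip && ip < tool.end
}
```

With this shape, a `syscall` instruction reached through `read@glibc` would have an instruction pointer inside glibc's mapping and therefore fail the check, which is the escape described above.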
systrace allows using a tool shared library (tool) with the `--tool` switch. A tool basically implements the `captured_syscall` C API, so after systrace has successfully patched a syscall site, it can generate a trampoline that jumps to `captured_syscall`, so that we can intercept the original syscalls.

The tool is loaded by systrace using
`LD_PRELOAD`, hence it is not usable until `LD_PRELOAD` has finished. By then `ld-linux.so` has already made about 20+ syscalls, and they are not catchable. For now this is a hard limitation; however, we can still catch them with `SECCOMP`. Once the tool is (LD_PRE)loaded, systrace tries to patch any syscall with predefined rules (in `src/bpf.c`). Please note we only apply patching when the `syscall` and following instructions match our predefined pattern; hence, if there's no pattern match, patching does not occur. This makes writing interception code cumbersome, because not all syscalls can be caught as a `captured_syscall` function call in the tracee's memory space. The plan is that when such a case happens, we could use the ptrace SECCOMP stop to inject `captured_syscall`, forcing the tracee to make this very function call. It is relatively easy to inject real syscalls, and we've done that many times in the past. However, `captured_syscall` is a regular C function (written in Rust), and it could use mmx/sse registers, hence it would be more difficult to inject from the tracer. Nonetheless, it should be possible with proper `xsave`/`xrstor` instructions.

In the future, we might install a second seccomp rule in the tool's init function, so that we can either patch the syscall in the tracee's memory space or intercept the syscall in a
`SIGSYS` signal handler. This also has risks: decoding `ucontext` in the signal handler seems complicated, and redirecting control flow within the same task seems more difficult than with ptrace.

The tool library runs in the tracee's memory space; however, because we intercept raw syscalls, we must be very careful to avoid deadlocks. For example, doing allocations could be dangerous, and drop (inserted by Rust) could be dangerous as well, because it may call `pthread_xxx`, which may in turn call the `futex` syscall. Even if there is no deadlock, the extra syscalls can cause performance degradation. Thus the tool must be written under very strict constraints. We also have a choice between `std` and `no_std`. Using `no_std` allows the tool to have no dependencies on any external library (including libc); because of that, we can rewrite the seccomp filters to allow all syscalls inside the tool's memory range (found by checking procfs). However, the `no_std` variant is a lot more difficult to write, less documented, and has fewer libraries and features.

After several discussions, our
`captured_syscall` could look like:

`ProcessState` holds resources shared among threads, such as Unix file descriptors, signal handlers, etc., while `ThreadState` holds resources local to each thread. The hard part is that our trampoline, like a regular syscall, knows nothing except the syscall number and six arguments. We could allocate `ProcessState` during the ptrace exec event, and allocate `ThreadState` in both the exec event and the fork/vfork/clone events. However, because the heap belongs to the tracee only, it could be quite difficult to prepare those data structures in the tracer, even with the help of `Serialize`/`Deserialize`. It could be possible to abuse injected function calls once again, or we could rewrite all tracees' global allocator, forcing them to use the same heap preallocated by the tracer. This isn't easier by any means: the tracer would need to expose some APIs for tracees to claim/reclaim memory, so that tracees could use the exposed APIs to implement their own global allocator. It also seems very unsafe, because any tracee has access to the global heap, shared among the tracer and all tracees.
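
For the "same heap preallocated by the tracer" option, a tracee-side allocator could be as simple as a bump allocator over a fixed region. The following is a minimal sketch under assumptions, not a proposal for the real design: `SharedBump` is a hypothetical name, the region is assumed to be already mapped at `base` by the tracer, memory is never reclaimed, and the bump offset lives in each tracee rather than in the shared region, so it does not yet coordinate between processes. It uses only `core`, so it would also fit a `no_std` tool.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::ptr::null_mut;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bump allocator over a preallocated region `[base, base + size)`.
struct SharedBump {
    base: usize,
    size: usize,
    /// Offset of the first free byte, relative to `base`.
    next: AtomicUsize,
}

impl SharedBump {
    const fn new(base: usize, size: usize) -> Self {
        SharedBump { base, size, next: AtomicUsize::new(0) }
    }
}

unsafe impl GlobalAlloc for SharedBump {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the candidate address up to the requested alignment.
            let addr = self.base + cur;
            let aligned = (addr + layout.align() - 1) & !(layout.align() - 1);
            let end = aligned - self.base + layout.size();
            if end > self.size {
                return null_mut(); // region exhausted
            }
            // Claim the range with a CAS so no lock (and no futex) is needed.
            match self.next.compare_exchange_weak(cur, end, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return aligned as *mut u8,
                Err(seen) => cur = seen,
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never reclaims individual allocations.
    }
}
```

A tool could then register an instance with `#[global_allocator]`, given a base address and size agreed with the tracer. The safety concern raised above still applies: nothing here stops one tracee from scribbling over memory that another tracee or the tracer relies on.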