Open wangbj opened 5 years ago
@wangbj - I need you to unpack this for me a bit further, because I don't understand why we ever need to allow the code to return to the instruction after the original syscall (`PC = orig_syscall + 2`).
If we turn the very first attempt to execute the syscall into a trap, then the handler runs before the syscall ever gets to execute, effectively a prehook. If we do ultimately execute a blocking syscall, it should be via the `untraced_syscall` function, right? We should always call the `captured_syscall` function, irrespective of how the event was intercepted (trap or patched code site), right? There's not some way that individual syscall invocations slip through the cracks and don't get intercepted, is there? (Which would mean they can genuinely block at the syscall PC.)
In fact, I think the following theorem should hold in general:
If this theorem is false for our design (and worse, cannot be made true), then I want to understand why.
You're right, there's a bug when handling `ptrace_event_exec`: the `patched_syscalls` field should be zeroed, because `exec*` replaces the entire program's code and data. The issue you mentioned should be fixed by commit 80e47d65. But we still need to patch the same syscalls again for every newly `exec*`-ed process.
Wait, so does the theorem hold? It's hard for me to understand how that linked patch connects to the issue of patching blocking syscalls (which is an issue even if we never call fork/exec, right?).
I believe so. There's a `patched_syscall` member for each task (or tracee) that keeps a record of patched syscall sites. When we exec, this field should have been cleared, because the old `patched_syscall` doesn't apply to the new task (at least for now), as exec just creates a brand new context.
To elaborate: we won't try to patch a syscall site if it was already recorded in `patched_syscall`, which is why we see a lot of syscalls going through seccomp instead.
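The bookkeeping described above can be sketched roughly as follows. This is a minimal model, not systrace's actual code: only the `patched_syscall` name comes from this thread, while `Task`, `should_patch`, `record_patch` and `on_exec` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical per-tracee state; only `patched_syscall` is named in the thread."""
    pid: int
    patched_syscall: set = field(default_factory=set)  # addresses of patched sites

    def should_patch(self, site: int) -> bool:
        # A site that is already recorded is never patched again,
        # so the syscall at that address keeps going through seccomp.
        return site not in self.patched_syscall

    def record_patch(self, site: int) -> None:
        self.patched_syscall.add(site)

    def on_exec(self) -> None:
        # exec* replaces the whole image, so old records are stale:
        # the same address may now hold an unpatched syscall.
        self.patched_syscall.clear()
```

Without the `clear()` in `on_exec`, a stale entry would make `should_patch` return `False` for an address in the new image, which matches the symptom of many syscalls falling back to seccomp.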
I have a query which I feel is related to this discussion. What is the control flow by which `handle_syscall_exit` is reached?
This is the `SECCOMP` syscall exit. It is caused by calling `ptrace(PTRACE_SYSCALL, pid, ...)` once the tracee has entered the `SECCOMP` syscall-enter stop.
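As a rough model of that control flow, the sketch below encodes which stop follows a seccomp stop depending on how the tracer resumes the tracee. It assumes the stop ordering of Linux 4.8 and later, where resuming from a seccomp stop with `PTRACE_SYSCALL` leads to a syscall-exit stop; `Stop`, `Resume` and `next_stop` are hypothetical names, not systrace's API.

```python
from enum import Enum, auto


class Stop(Enum):
    SECCOMP = auto()       # PTRACE_EVENT_SECCOMP stop
    SYSCALL_EXIT = auto()  # syscall-exit stop (where handle_syscall_exit would run)
    NONE = auto()          # no syscall stop; tracee runs until some other event


class Resume(Enum):
    CONT = auto()     # tracer resumes with PTRACE_CONT
    SYSCALL = auto()  # tracer resumes with PTRACE_SYSCALL


def next_stop(current: Stop, resume: Resume) -> Stop:
    """Next syscall-related stop, assuming Linux >= 4.8 ordering."""
    if current is Stop.SECCOMP and resume is Resume.SYSCALL:
        return Stop.SYSCALL_EXIT
    return Stop.NONE
```

In this model the exit handler is only reached when the tracer explicitly asks for it with `PTRACE_SYSCALL`; resuming with `PTRACE_CONT` skips the exit stop entirely.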
With the current design, if a `syscall` blocks, systrace doesn't patch it until it returns. The reason is that if we do patch while the original `syscall` is blocked, then after it resumes it sees invalid instructions after the two-byte `syscall` instruction. In the best case we get `SIGILL` or `SIGSEGV`; in the worst case the trailing three bytes could be a valid instruction sequence, which leads to undefined behavior.

Though we still cannot patch while a `syscall` is blocked, we can make the blocking window a lot shorter, for example by modifying the `syscall` parameters to make it non-blocking. Another approach is to patch certain syscalls beforehand, so that we wouldn't have to worry about it later.

Building `glibc` can easily expose this issue: the build process seems to create tons of pipes, and causes lots of blocking read/write.
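To make the hazard concrete, here is a small byte-level sketch. It assumes the patch overwrites the two-byte `syscall` plus the three bytes after it with a five-byte `call rel32` (opcode `E8`); that layout is inferred from the "two-byte" and "three-byte" wording above and is not confirmed by the thread. `patch_site` and `resume_bytes` are hypothetical helpers, and addresses are plain offsets into a byte string.

```python
import struct

SYSCALL = b"\x0f\x05"  # x86-64 `syscall` opcode, two bytes


def patch_site(code: bytes, site: int, target: int) -> bytes:
    """Overwrite 5 bytes at `site` with `call rel32` to `target` (assumed patch layout)."""
    assert code[site:site + 2] == SYSCALL, "not a syscall site"
    rel = target - (site + 5)  # rel32 is relative to the end of the call instruction
    return code[:site] + b"\xe8" + struct.pack("<i", rel) + code[site + 5:]


def resume_bytes(code: bytes, site: int) -> bytes:
    """Bytes found at the return address of a blocked syscall (site + 2)."""
    return code[site + 2:site + 5]


# syscall; 3-byte nop (0f 1f 00); ret
original = bytes.fromhex("0f050f1f00c3")
patched = patch_site(original, 0, 0x100)
```

A thread that was blocked inside the original `syscall` resumes at `site + 2`. In `original` that is the start of the next instruction (`0f 1f 00`); in `patched` it is the middle of the `call`'s displacement (`00 00 00` here), so the bytes that get decoded depend on where the patch happens to jump.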