reverie-rs / reverie

trace and intercept linux syscalls.

[RFC] Shared interfaces with parallelDettrace #47

Closed gatoWololo closed 5 years ago

gatoWololo commented 5 years ago

Background

@rrnewton has proposed the idea that parallelDettrace should be decoupled from the instrumentation/interception mechanism. That is, we should be able to use either systrace, or ptrace, or any other mechanism that meets the specification of some abstract interface. For now we can assume this will be a Rust trait (typeclass):

trait Interceptor {
  // Functions for talking with the interceptor.
  fn getRegs(&mut self, ...) -> Regs;
  ...
}

// ptrace-backed interceptor
impl Interceptor for Ptracer {
  fn getRegs(&mut self, ...) -> Regs {
    // call ptrace(PTRACE_GETREGS)...
  }
}

// systrace-backed interceptor
impl Interceptor for Systracer {
  fn getRegs(&mut self, ...) -> Regs {
    ...
  }
}

Then our system call handlers can be generic over our interceptor:

fn handleFork<T: Interceptor>(inter: &mut T) -> () {
  let regs = inter.getRegs(...);
  inter.injectSyscall(SystemCall::Fork);
  ...
}

This way, the handlers can be used for both parallelDettrace (backed by ptrace) and a system backed by systrace.
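To make the proposal concrete, here is a hypothetical, self-contained mock of this design. None of this is reverie's real API: `Regs`, `Ptracer`, and the syscall-number plumbing are stand-ins, with the actual ptrace/systrace machinery replaced by in-memory state so the shape of the abstraction is visible.

```rust
// Hypothetical mock of the proposed Interceptor trait (not reverie's real API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Regs {
    pub rax: u64, // syscall number on entry, return value on exit
}

pub enum SystemCall {
    Fork,
}

pub trait Interceptor {
    fn get_regs(&mut self) -> Regs;
    fn inject_syscall(&mut self, call: SystemCall);
}

// ptrace-backed interceptor: a real implementation would issue
// ptrace(PTRACE_GETREGS, ...) / PTRACE_SETREGS against the tracee.
pub struct Ptracer {
    pub regs: Regs,
}

impl Interceptor for Ptracer {
    fn get_regs(&mut self) -> Regs {
        self.regs
    }
    fn inject_syscall(&mut self, call: SystemCall) {
        // Pretend injection: just record the syscall number in rax.
        self.regs.rax = match call {
            SystemCall::Fork => 57, // SYS_fork on x86_64 Linux
        };
    }
}

// The handler is written once, generic over whichever backend is in use.
pub fn handle_fork<T: Interceptor>(inter: &mut T) -> Regs {
    inter.inject_syscall(SystemCall::Fork);
    inter.get_regs()
}
```

The same `handle_fork` body would then be reusable with a `Systracer` type implementing the same trait, which is the whole point of the proposal.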

So, given a non-leaky interface which sufficiently abstracts over these details, it should be possible. However, this is really tricky, as there are a lot of things to worry about. Here we enumerate most of them:

There are multiple contexts in which we wish this to run.

Here are some possible contexts:

1) parallelDettrace: Here we intercept events into a separate ptrace-based tracer. All reads/writes must be done through ptrace peeks/pokes or VM reads/writes.

2) systrace in-process: Here we are running as part of the traced process. We have a restricted environment where we cannot use the C or Rust standard library, as any system calls made would themselves be intercepted by systrace's seccomp mechanism.

3) systrace ptrace-tracer: It is not always possible to patch and jump into an in-process handler. So some calls are caught by seccomp through ptrace and handled in a separate process, similar to (1).

So ideally the interface that our syscall handlers use should abstract all these details away. Let's think about some of these in more detail:

System Call restriction

All system calls must be made through the special instruction blessed by seccomp. Thus we could have a thin wrapper around this special instruction to make our system calls, though this would mean we don't have access to the entire C or Rust standard library.

One proposal is to implement our own minimal set of data structures, types, and system call wrappers. I think this sounds harder than it actually is, as we can implement them as needed. These functions would then be LD_PRELOAD-ed into the tracee's memory image.
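As a sketch of what such a libc-free wrapper looks like, here is a raw syscall wrapper using inline assembly. This is an illustration under stated assumptions, not reverie code: it is x86_64-Linux-only, and `raw_syscall3`/`raw_write` are names invented here. In the in-process context, this single `syscall` instruction site would be the one blessed by seccomp.

```rust
use std::arch::asm;

// Hypothetical libc-free syscall wrapper (x86_64 Linux only). Every system
// call wrapper we need would funnel through this one `syscall` instruction.
fn raw_syscall3(nr: i64, a1: i64, a2: i64, a3: i64) -> i64 {
    let ret: i64;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") nr => ret, // syscall number in, return value out
            in("rdi") a1,
            in("rsi") a2,
            in("rdx") a3,
            lateout("rcx") _, // rcx and r11 are clobbered by `syscall`
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

// A minimal write(2) built on top of it; SYS_write = 1 on x86_64 Linux.
fn raw_write(fd: i32, buf: &[u8]) -> i64 {
    raw_syscall3(1, fd as i64, buf.as_ptr() as i64, buf.len() as i64)
}
```

On success, `raw_write` returns the number of bytes written, exactly like the libc wrapper, but without ever entering libc.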

Another approach would be just to work in a restricted environment, and by convention know that we shouldn't use any functions that might make system calls. This is mainly a hindrance to the parallelDettrace implementation, as it would be unnecessarily restricted just so that it easily interoperates with systrace.

Similarly, we could abstract over the implementation of certain types that we really care about. For example, say we really wanted strings: we could define either a generic type or a trait with a minimal string interface:

// Must implement Default (and manage Drop carefully), and ... etc.
trait GenString: Default {
  fn new() -> Self;
  fn set(&mut self, s: &[u8]);
  fn concat(&mut self, s1: Self);
}

// One concrete string type per context.
struct Regular(/* plain std String */);
struct InProcess(/* allocation-restricted buffer */);

// Concrete implementation based on type. In a regular context we simply
// call the relevant std::str or std::String functions.
impl GenString for Regular {
  ...
}

// String to be used when running inside the tracee. Here we have special
// restrictions on allocation and deallocation.
impl GenString for InProcess {
  fn new() -> Self {
    // Special context-aware implementation.
  }
}

Then we could parametrize our handlers by the relevant implementation:

fn handleFork<T: Interceptor, Q: GenString>(inter: T) -> () {
  let myString: Q = Q::new();
  ...
}

// The correct implementation is picked by the caller:
handleFork::<_, InProcess>(...);

The types get specialized by Rust at compile time, but it is not clear how we can pick the correct implementation and inject it into the tracee.
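Here is a version of this marker-type trick that actually compiles, hedged as a sketch: `RegularString` and `fork_label` are names invented for illustration, and an `InProcessString` would implement the same trait on top of a restricted allocator.

```rust
// Hypothetical compiling version of the GenString sketch: the handler is
// generic over a string implementation, and the caller picks the concrete one.
trait GenString: Default {
    fn set(&mut self, bytes: &[u8]);
    fn as_bytes(&self) -> &[u8];
}

// Regular context: delegate to std's String.
#[derive(Default)]
struct RegularString(String);

impl GenString for RegularString {
    fn set(&mut self, bytes: &[u8]) {
        self.0 = String::from_utf8_lossy(bytes).into_owned();
    }
    fn as_bytes(&self) -> &[u8] {
        self.0.as_bytes()
    }
}

// Monomorphized at compile time for whichever S the caller chooses; a handler
// body like this never names a concrete string type.
fn fork_label<S: GenString>() -> S {
    let mut s = S::default();
    s.set(b"fork");
    s
}
```

A caller in the regular context would write `let s: RegularString = fork_label();`, and Rust picks the implementation at the call site. This answers the compile-time selection question; how to get the specialized code injected into the tracee remains open, as noted above.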

There is more to determinizing systems than just handlers

Consider a parallel setting where a traced process tries to create an empty directory. We have a handler for this case:

fn handleMakeDirectory() {
  // ...
}

Almost all filesystem operations are inherently non-deterministic, as we're racing with all other processes that attempt reads/writes to this directory. How do we (the handler) know whether we can go now, or whether we need to wait for another process (for simplicity, assume we're using Kendo to determine who gets to go first)? We would need to query some shared state to see if we can go. The point being: while it is convenient to think we can determinize programs purely in terms of system call handlers, this is not true. There are a lot of other things we need to worry about: signals, scheduling, process creation and exit, thread creation and exit, etc.

In this specific scenario, to make this portable across interception mechanisms, we could abstract over it with a function bool ourTurn(...) with a context-dependent implementation that tells us whether we can go yet. However, it seems like we're just pushing the work off to instrumentor-specific implementations.
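For concreteness, the `ourTurn` query could look something like the following. This is a toy sketch under an assumed Kendo-style scheme (as mentioned above): each thread carries a deterministic logical clock, and the thread with the minimum clock goes first, with ties broken by thread id. The names here are invented; the real shared-state layout is exactly what remains instrumentor-specific.

```rust
// Hypothetical shared scheduling state: (thread id, logical clock) pairs.
// In a real system this would live in shared memory or a central tracer.
struct SchedState {
    clocks: Vec<(usize, u64)>,
}

// Kendo-style turn check: we may go only if our (clock, tid) pair is the
// lexicographic minimum, making the winner deterministic.
fn our_turn(state: &SchedState, tid: usize) -> bool {
    match state.clocks.iter().find(|&&(t, _)| t == tid).map(|&(_, c)| c) {
        Some(c) => state.clocks.iter().all(|&(t, oc)| (c, tid) <= (oc, t)),
        None => false, // unknown thread never gets a turn
    }
}
```

A handler like `handleMakeDirectory` would spin or block on `our_turn` before touching the directory; the policy itself is shared, even though the mechanism for reaching the state differs per context.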

So while we could share system call handlers (at the cost of complexity and abstraction layers), the amount of different code is bigger than the amount of like code. Similarly, all the other things I mentioned above (signals, process creation, thread handling) would all need their own implementation that can't be shared.

Shared Global and Local State

We need to carry around global state and per-process and per-thread state. Handlers query this state to know what to do. In a distributed systrace in-process setting, it's not obvious where the best place to put this state is. Just like the few examples above, we could abstract with an interface.
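One way to sketch this layering (a hypothetical shape, not reverie's actual types) is to hand each handler explicit handles to the three layers, so that only the truly global layer needs synchronization:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical state layers; field names are invented for illustration.
#[derive(Default)]
struct GlobalState {
    virtual_time: u64, // e.g. a deterministic clock shared by every tracee
}

#[derive(Default)]
struct ProcessState {
    fd_flags: HashMap<i32, u32>, // e.g. per-process fd table metadata
}

#[derive(Default)]
struct ThreadState {
    logical_clock: u64, // e.g. this thread's clock for scheduling decisions
}

// A handler sees all three layers. Process and thread state are owned
// exclusively; only GlobalState is behind a lock.
fn handle_event(
    global: &Arc<Mutex<GlobalState>>,
    _process: &mut ProcessState,
    thread: &mut ThreadState,
) {
    thread.logical_clock += 1;
    global.lock().unwrap().virtual_time += 1;
}
```

In the distributed in-process setting, `GlobalState` is the part whose home is unclear (shared memory? a central daemon?), while the per-process and per-thread layers can live wherever the handler runs.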

Generators and Futures

We would like for parallelDettrace to use either the Generator crate or std::Future. We're still deciding which to use. These libraries make it easy to write yielding code, and provide a nice programming model for our handlers. But I'm unsure how this will translate to systrace and the restricted in-process context.
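To show what "yielding" buys us, here is a hand-rolled version of the state machine that a generator or Future would produce for us. The names (`HandlerStep`, `ForkHandler`) are invented for illustration: the handler pauses at the syscall boundary, hands control to the driver, and is resumed with the result. Whether the restricted in-process context can host the real generator/Future machinery is exactly the open question.

```rust
// What the driver sees each time it resumes the handler.
#[derive(Debug, PartialEq)]
enum HandlerStep {
    InjectSyscall(u64), // handler yields: inject this syscall, then resume me
    Done(i64),          // handler finished with this result
}

// Hand-written state machine equivalent of a two-step yielding handler.
struct ForkHandler {
    injected: bool,
}

impl ForkHandler {
    fn resume(&mut self, syscall_result: Option<i64>) -> HandlerStep {
        if !self.injected {
            self.injected = true;
            HandlerStep::InjectSyscall(57) // 57 = SYS_fork on x86_64 Linux
        } else {
            HandlerStep::Done(syscall_result.unwrap_or(-1))
        }
    }
}
```

A generator or async/await version would express the same control flow with a single linear function body, which is why these libraries are attractive; this explicit form is what we would fall back to if neither works in-process.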

Final thoughts.

To me it seems like the amount of code that's actually worth sharing between implementations is quite minimal. While at a quick glance it may seem that the handlers should be shared among implementations, I feel like the amount of benefit we get from sharing is smaller than the cost of complexity.

I believe the individual syscall handlers are not the bulk of the work; the algorithms and methods we use are the hard part. Looking at OG detTrace, this certainly seems to be the case, judging from detTrace's total lines of code.

So I would actually lean towards keeping the handlers of systrace and parallelDettrace separate. It seems parallelDettrace is mainly the research part: that is, figuring out how we can determinize parallel processes. In parallel, systrace can work on figuring out some of the problems outlined above and in other RFCs. Later, systrace can just copy over the methods from parallelDettrace.

I would love to be wrong; if anyone has already thought about this and figured out how it can be done, please share.

devietti commented 5 years ago

Omar, I think you raise a lot of good points here. What resonates most for me is that parallelDT (PDT) would have to live with a restrictive environment like no_std, which will mean jumping through hoops for lots of silly things, like strings and logging. While I think we could abstract away all of these differences with a sufficient number of traits, by the time we figure out all of these abstractions, we could have alternatively spent that time making non-trivial progress on PDT. For some gritty details like process and thread exit, we don't even really know what the abstraction should be yet, since we don't know yet how systrace will handle these things (and don't have 100% ptrace understanding either, for threads). Every time we rewrite dettrace it will get easier ;-).

I think this has been and will continue to be a useful exercise to smooth the path for porting PDT over to systrace eventually. For example, having clearly separated container-level state, process-level state, and thread-level state seems like a really good idea. Switching from a pre/post-hook mentality over to the captured_syscall interface seems like a good idea.

rrnewton commented 5 years ago

@gatoWololo @devietti -- I think these are indeed good points, but that we are still talking about these things at a pretty high level of vagueness and need to get more into the details.

First, on the big, strategic question: I believe a PDT that is completely restricted to a single ptracer is kind of an odd beast. If it's going to take a big 4X slowdown on sequential execution, then that works directly against any parallel speedup. Is it optimizing a horse and buggy, or a car?

The algorithms and methods we used are the hard [part].

Yes, of course. But all of those scheduler decisions react to events (triggering handlers), and I believe you are correct that the central issue is how much of an imposition it is to make all that code "portable" in the sense of being agnostic to whether it runs in-process or in a central daemon. But come on, we have not yet begun to fight on this front!!

The amount of different code is bigger than the amount of like code.

I don't agree with this. I think we need to dig into these issues further to see if things like signals/process creation/thread handling truly "need their own implementation that can't be shared." I don't see why you conclude that.

In particular, I think the idea of "bool ourTurn(...) with a context dependent implementation that tells us whether we can go yet" is wrong. I think we have to figure out how to write code once, above the abstracted interception interface that:

(1) splits its state access cleanly into globalState, processState, threadState (+ transientState for mid-injection handlers), and
(2) observes some number of restrictions for portable, location-agnostic execution.

But that's it. We should figure out how to avoid duplicating / doubly implementing any code whatsoever. And of course we should push on (2) and figure out how to avoid or eliminate restrictions on programming style.

rrnewton commented 5 years ago

I'd like to move this discussion to the PR here: https://github.com/iu-parfunc/reverie/pull/49