Make `IA2_DEBUG` more useful and on by default

ayrtonm commented 1 week ago

Given that the initial and target compartments of every call gate are known statically, we don't need to check the PKRU in properly compartmentalized programs. For cases that want the extra security of runtime PKRU checks in the callgates we had -DIA2_DEBUG=1 (disabled by default) and the call gates conceptually look like this

#define CALLER_PKRU N

int call_gate_for_foo(int x) {
#if defined(IA2_DEBUG)
    assert(current_pkru == CALLER_PKRU);
#endif
    switch_to_compartment_with_foo();
    int result = foo(x);
    switch_to_caller_compartment();
    return result;
}

However the exact details of control-flow are usually not be immediately obvious after inserting call gates. Often while debugging compartmentalized code it's easy to end up in a state with a partially-compartmentalized program and IA2_DEBUG can be helpful for debugging this kind of issue. Currently the PKRU assertion just does a rdpkru and ud2 if the result doesn't match the expected value. That means the process ends with a SIGILL at that point.

This is better than letting it continue and ending with a SIGSEGV later on because it's easier to see that the problem is control-flow not missing memory permissions. It points to a problem earlier on though so I want to make the assertion more useful by having it print a backtrace (this is easy to do manually in gdb but gets a little annoying when processes fork). That would make our tests easier to debug while working on the runtime, but I wanted to see if @kkysen @randomPoison had suggestions about what else might be helpful here. Also I think since this is a debugging feature now it may be useful to add an additional assertion after calling the call gate's target function.

ayrtonm commented 1 week ago

Unlike #434 we can actually use backtrace(3) in this case so getting a WIP was easier than from a signal handler. The trace ends at compartment boundaries so I still need to fix that.

edit: Actually I'm not sure if backtrace(3) supports unwinding across PLT so we may need to pull in LLVM libunwind.

randomPoison commented 1 day ago

Can you elaborate on what kind of error this will catch? I'm not sure I ran into this case with zlib so I don't know how this would help.

This is better than letting it continue and ending with a SIGSEGV later on because it's easier to see that the problem is control-flow not missing memory permissions.

What kind of control flow problem are you referring to here?

ayrtonm commented 1 day ago

By control flow I was mostly referring to missing call gates (either due to weird interactions between flags we didn't expect, missing some sources or sometimes when skipping rewriting a subset of the sources). If there's a compartment transition that's missing a call gate the program will keep running, but the thread PKRU won't have the value we expect. If the thread then leaves that compartment by going through another call gate then that ASSERT_PKRU will fail. If instead the thread leaves the compartment by just returning this won't catch the issue though.

More generally this could also catch errors like having a call gate where we don't expect one or where a call gate is calling the wrong thing. The former could happen when interfacing with libc (e.g. #427). The latter should really only happen when working on the runtime/rewriter itself, but I think it is possible if -Wl,--wrap flags are missing so there might be other flags that could cause this.

immunant / IA2-Phase2

Make `IA2_DEBUG` more useful and on by default #452