Handle Java null pointers (and divide by zero, etc.)

dz333 commented 6 years ago

From @gharrma on February 23, 2017 4:23

Options:

Write a signal handler for SIGSEGV that (somehow) throws a NullPointerException. See here for tips.
Emit code that checks for null on each dereference, with the potential to remove redundant checks.
Just crash, and make it clear that PolyLLVM does not support NullPointerExceptions. (But even if this were ok for client code, I doubt it would be OK for the Java library...)

Copied from original issue: gharrma/polyllvm#20

dz333 commented 6 years ago

From @andrewcmyers on March 9, 2017 21:38

The interrupt approach would require modifying the ucontext structure in the signal handler to make the pc go to an exception generating code segment. This sounds doable though I don't know how nicely it plays with LLVM.

dz333 commented 6 years ago

From @andrewcmyers on March 10, 2017 0:19

Here is some C code that recovers from SIGSEGV (sort of). From LLVM code you can do better because you know the address of code. In this example we 'recover' into a function that probably has a different stack layout from main, leading to a second segmentation violation.

#include <stdio.h>
#include <signal.h>

int sawit = 0;
ucontext_t saved_ucontext;

int recover(void);
void action(int sig, siginfo_t *info, void *ucontext);

int main(int argc, char **argv) {
    int *x = (int *)0;

    struct sigaction sa;
    sa.sa_sigaction = action;
    sigemptyset(&sa.sa_mask);  // portable: sigset_t is not an integer type everywhere
    sa.sa_flags = SA_SIGINFO;

    sigaction(SIGSEGV, &sa, 0);

    printf("Assigning...\n");

    int y = *x;
}

void action(int sig, siginfo_t *info, void *ucontext) {
    sawit = 1;
    ucontext_t *u =  (ucontext_t *)ucontext;
    saved_ucontext = *u;
    u->uc_mcontext->__ss.__rip = (unsigned long long)&recover; // XXX machine-dependent (macOS x86-64 layout)
}

int recover() {
    printf("Recovered from SIGSEGV: sawit = %d\n", sawit);
    return 0;
}

dz333 commented 6 years ago

From @andrewcmyers on March 10, 2017 0:24

Note that the actual signal handler generated would need to map from the current pc (__rip) to the desired pc for handling the exception. Maybe there is a way to reuse the existing exception machinery for this?
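
To make the pc-to-pc mapping concrete, here is a minimal sketch of a table lookup the handler could perform. Everything in it is an assumption for illustration: the `pc_range` struct, the `lookup_landing` function, and the addresses in `demo_table` are made up, and a real implementation would more likely reuse the tables the unwinder already has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a fault whose pc lies in [start, end)
 * should resume at `landing`, where exception-throwing code lives. */
struct pc_range {
    uintptr_t start;
    uintptr_t end;
    uintptr_t landing;
};

/* Binary search over a table sorted by `start` with non-overlapping ranges.
 * Returns the landing address, or 0 if no range covers the pc. */
static uintptr_t lookup_landing(const struct pc_range *table, size_t n,
                                uintptr_t pc) {
    assert(table != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].start) {
            hi = mid;
        } else if (pc >= table[mid].end) {
            lo = mid + 1;
        } else {
            return table[mid].landing;
        }
    }
    return 0;
}

/* Made-up addresses, for illustration only. */
static const struct pc_range demo_table[] = {
    { 0x1000, 0x1040, 0x2000 },
    { 0x1040, 0x10a0, 0x2010 },
    { 0x3000, 0x3008, 0x2020 },
};
```

In the handler, the faulting pc read from the ucontext would be passed to `lookup_landing`, and a nonzero result written back as the new pc. A result of 0 would mean the fault did not come from compiled Java code and should be treated as a genuine crash.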

dz333 commented 6 years ago

From @gharrma on March 10, 2017 1:28

Interesting code snippet! Something like that might work, though my understanding isn't deep enough yet to know how to get from the signal handler to a NullPointerException that can be caught successfully. (Is it possible to just recover into a function (compiled from Java) that throws the NullPointerException explicitly?)

In case it's useful, here's an example I made where recovering to main succeeds.

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>

jmp_buf recover;

void handler(int sig, siginfo_t *info, void *ucontext) {
    printf("Handling SIGSEGV\n");
    longjmp(recover, 1);
}

int main(int argc, char **argv) {
    struct sigaction sa;
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);  // portable: sigset_t is not an integer type everywhere
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, 0);

    if (setjmp(recover) != 0) {
        printf("Recovered to main!\n");
        return 0;
    }

    printf("Assigning...\n");
    int *x = (int*) 0;
    int y = *x;
}
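
A possible variation on the example above, assuming a POSIX system: POSIX leaves it unspecified whether plain setjmp/longjmp save and restore the signal mask, so after jumping out of the handler SIGSEGV may stay blocked and a second fault would not reach the handler. sigsetjmp/siglongjmp with a nonzero save flag restore the mask explicitly. This sketch simulates the fault with raise(SIGSEGV) so that it behaves deterministically, and the handler avoids printf because printf is not async-signal-safe. The names `on_segv` and `recover_n_times` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;

static void on_segv(int sig, siginfo_t *info, void *ucontext) {
    (void)sig; (void)info; (void)ucontext;
    siglongjmp(recover_point, 1);  /* also restores the saved signal mask */
}

/* Hypothetical helper: recovers from n simulated faults in a row and
 * returns how many recoveries happened. */
static int recover_n_times(int n) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);
    (void)rc;

    /* volatile so the count is well defined after siglongjmp */
    volatile int recovered = 0;
    while (recovered < n) {
        if (sigsetjmp(recover_point, 1) == 0) {
            raise(SIGSEGV);        /* stand-in for a null dereference */
        } else {
            recovered++;
        }
    }

    sigaction(SIGSEGV, &old, NULL);  /* put the previous handler back */
    return recovered;
}
```

Calling `recover_n_times(3)` exercises the handler three times in a row, which is the case where an unrestored signal mask would show up.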

dz333 commented 6 years ago

From @andrewcmyers on March 10, 2017 1:52

setjmp/longjmp will be too expensive to use. I think you just want to reset the pc as you return to another label within the same function, then throw the null pointer exception as you normally would. I guess there may be some issues with restoring variables into registers. But the exception mechanism should have exactly the same issues.

dz333 commented 6 years ago

From @andrewcmyers on April 25, 2017 15:45

Did we figure out how to do this?

dz333 commented 6 years ago

From @gharrma on April 25, 2017 22:18

Unfortunately not--I set aside NullPointerException for a bit while working on other parts. Should we implement the simple approach first (a check on each pointer access)?

dz333 commented 6 years ago

From @andrewcmyers on April 25, 2017 22:22

That makes sense as a starting point. Probably we would want to make that an option in any case since the alternatives are not machine-independent or break Java semantics unacceptably.

dz333 commented 6 years ago

From @gharrma on February 12, 2018 2:23

Here's an update on what I know after reading about this more in my free time:

setjmp/longjmp might be cheap enough to use, as setjmp only needs to be called when entering a try block that can catch a null pointer exception. However, a larger issue is that longjmp restores registers to their state at the time of the setjmp call, thereby undoing any changes to in-register state that occurred between the beginning of the try block and the throw of the exception, which is a clear violation of the Java language spec. One solution is to mark all variables as volatile in the LLVM IR, but I would guess the performance impact of this is significant.
Some compilers support this handy -fnon-call-exceptions flag which allows C++ exceptions to be thrown from signal handlers(!). I actually got this to work partially with Java exceptions in PolyLLVM, but there's one core flaw: LLVM trapping instructions (such as loads/stores) have no way to specify an exception landing pad label if an exception occurs during their execution. Thus this trick only works if the null pointer exception is thrown within a callee of the function that catches it, since the call instruction which invokes the callee will specify a landing pad that the unwinder will see. People have actually proposed a change to the LLVM IR to support this, but I don't think that's been implemented yet (see the thread linked below in the next point).
Some people argue that explicit null checks are essentially free, especially after optimizations. I.e., LLVM might optimize most of the null checks away anyway. Likely the best approach is to implement explicit null checks, evaluate the performance impact, and then decide whether the performance is good enough to not worry about it.
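
The register-restoration concern in the first point has a direct analogue in C, which may help make it concrete. The C standard says that a non-volatile local modified between setjmp and longjmp has an indeterminate value afterwards, while a volatile local keeps its latest value. The sketch below shows only the volatile case, since that is the one with defined behavior. The function names are hypothetical, and `fail` is just a stand-in for a faulting operation inside a try block.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf try_entry;

/* Stand-in for an operation that "throws" from inside the try block. */
static void fail(void) {
    longjmp(try_entry, 1);
}

/* Returns the value of a local that was updated inside the "try" block
 * before the jump. Because `progress` is volatile, the update is
 * guaranteed to be visible after longjmp. */
static int progress_seen_by_catch(void) {
    volatile int progress = 0;
    if (setjmp(try_entry) == 0) {
        progress = 1;   /* side effects that Java semantics must preserve */
        progress = 2;
        fail();
        progress = 3;   /* never reached */
    }
    return progress;    /* the "catch" side observes 2 */
}

/* Usage: the catch side sees the last write made before the jump. */
static void self_check(void) {
    assert(progress_seen_by_catch() == 2);
}
```

Dropping the volatile qualifier makes the returned value indeterminate, which corresponds to the lost in-register updates described above.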

dz333 commented 6 years ago

From @gharrma on May 24, 2018 17:40

Update: we still do not check for NullPointerExceptions, although we do print a nice message after segfaults and divide-by-zero.
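
For readers wondering what "print a message after a segfault" can look like, here is a rough POSIX sketch. It is not the actual runtime code; the function names, the messages, and the exit code 70 are all made up for illustration. The handler uses only write and _exit because those are async-signal-safe, and the demo crashes in a forked child so that the caller survives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { CRASH_EXIT_CODE = 70 };  /* arbitrary value chosen for this sketch */

static void report_and_exit(int sig) {
    /* Only async-signal-safe calls here: write(2) and _exit(2). */
    static const char segv_msg[] = "Null pointer dereference (SIGSEGV)\n";
    static const char fpe_msg[] = "Arithmetic fault, e.g. divide by zero (SIGFPE)\n";
    ssize_t ignored;
    if (sig == SIGSEGV) {
        ignored = write(STDERR_FILENO, segv_msg, sizeof segv_msg - 1);
    } else {
        ignored = write(STDERR_FILENO, fpe_msg, sizeof fpe_msg - 1);
    }
    (void)ignored;
    _exit(CRASH_EXIT_CODE);
}

static void install_crash_reporters(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = report_and_exit;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);
}

/* Dereferences null in a child process and returns the child's exit
 * status, or -1 if the child did not exit normally. */
static int crash_in_child(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        install_crash_reporters();
        volatile int *volatile p = NULL;  /* volatile: the load must happen */
        int v = *p;
        (void)v;
        _exit(0);  /* not reached if the load faults */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a typical system `crash_in_child()` returns the sketch's exit code after the message is printed, which shows that the handler ran instead of the default core-dumping action.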

If we are ok with the performance hit (which is likely to be small), it would be relatively easy to add null pointer checks in emitted code. In order to insert checks before field accesses, see ObjectStruct_c#buildFieldElementPtr. For method calls, see PolyLLVMCallExt#buildFuncPtr.
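
As a rough picture of what explicit checks amount to, here is a C rendering of the kind of code that could be emitted for the Java expression `n.next.value`. This is only a sketch under assumptions: the `node` struct, the `npe_pending` flag, and `raise_npe` are hypothetical stand-ins, and a real implementation would call into the existing exception machinery rather than set a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Hypothetical stand-in for throwing: records a pending exception. */
static bool npe_pending = false;

static void raise_npe(void) {
    npe_pending = true;
}

/* Roughly what could be emitted for `n.next.value`:
 * one check in front of each dereference. */
static int next_value_checked(const struct node *n) {
    if (n == NULL) { raise_npe(); return 0; }   /* check before n.next */
    const struct node *m = n->next;
    if (m == NULL) { raise_npe(); return 0; }   /* check before m.value */
    return m->value;
}

/* Two reads through the same pointer need only one check: once the first
 * test passes, `n` is known to be non-null for the rest of the function. */
static int value_plus_next_is_null(const struct node *n) {
    if (n == NULL) { raise_npe(); return 0; }
    return n->value + (n->next == NULL ? 1 : 0);
}
```

The second function is the redundant-check case mentioned in the original options: a naive emitter would test `n` before each of the two reads, and an optimizer (or the emitter itself) can drop the second test.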

polyglot-compiler / JLang

Handle Java null pointers (and divide by zero, etc.) #4