svaarala / duktape

Duktape - embeddable Javascript engine with a focus on portability and compact footprint
MIT License
5.96k stars 516 forks source link

Error thrown in try/finally with no catch causes internal error #287

Closed fatcerberus closed 9 years ago

fatcerberus commented 9 years ago
try { throw new Error("A pig ate everyone at 8:12"); }
finally { print("*munch*"); }

The finally runs, but leads to "internal error in bytecode executor longjmp handler". What's odd is that I can't always reproduce it. Depending on the surrounding lines, it sometimes works correctly.

svaarala commented 9 years ago

Is this in master? I fixed a possibly related issue a while ago: #268.

svaarala commented 9 years ago

I can't reproduce this in master with the example code (with assertions enabled and valgrind):

$ ./duk test.js
*munch*
Error: A pig ate everyone at 8:12
    global test.js:1 preventsyield
error in executing file test.js
fatcerberus commented 9 years ago

Hm, I'll see if I can come up with a test case later.

fatcerberus commented 9 years ago

Out of curiosity what is preventsyield? I see it on a lot of the duk output you've posted, but never figured out what it meant.

fatcerberus commented 9 years ago

I still can't manage to reproduce it in isolation, sadly. :( However, I did compare the debug output between when it works and when it doesn't.

Correct behavior (debug output starting from point of throw):

*munch*
[D] duk_js_executor.c:1342 (unknown): -> throw propagated up to entry level, rethrow and exit bytecode executor
[D] duk_api_stack.c:4132 (unknown): fatal error occurred, code 100, message 'main.js' (line: 7)

Error: A pig ate everyone at 8:12

When the internal error shows up:

*munch*
[D] duk_js_executor.c:1387 (unknown): caught unknown longjmp type 0, treat as internal error
[D] duk_api_stack.c:4132 (unknown): fatal error occurred, code 100, message 'main.js' (line: 66)

Error: internal error in bytecode executor longjmp handler

This is unrelated to my recent debugger changes for THROW, as it happens in a clean master as well.

svaarala commented 9 years ago

I'll recheck the try-catch diff - it's likely related to that somehow.

If you didn't already do so, could you enable -DUK_OPT_ASSERTIONS and see if that turns up anything?

fatcerberus commented 9 years ago

Nope, no failed assertions.

fatcerberus commented 9 years ago

For what it's worth, the bug happens with the test code above inside of a protected call (duk_pcall()).

svaarala commented 9 years ago

Ok, so if "munch" gets printed, it's probably ENDFIN which causes the error.

svaarala commented 9 years ago

A little bit more digging. Here's the bytecode:

; c0 "Error"
; c1 "A pig ate everyone at 8:12"
; c2 "print"
; c3 "*munch*"

00000 000000f3   LDUNDEF    r0
00001 00000033   NOP       
00002 000040b2   TRYCATCH   0x2, r1          ; have_finally
00003 00000073   INVALID    0
00004 8000016e   JUMP       10 (+6)
00005 000000ce   GETVAR     r3, c0
00006 00004102   LDCONST    r4, c1
00007 0080c009   NEW        r3, 1
00008 0000c573   THROW      r3
00009 000004b3   ENDTRY    
00010 004080d2   CSVAR      r3, c2
00011 0000c142   LDCONST    r5, c3
00012 0080c030   CALL       0x0, r3, 1
00013 0000c000   LDREG      r0, r3
00014 00004533   ENDFIN    
00015 000000ef   RETURN     0x3, r0          ; fast_return, have_retval

The THROW at PC 8 will eventually call longjmp with "lj_type" set to THROW. This should get passed to the finally part starting at PC 10, in registers R1 (longjmp value, i.e. the error) and R2 (longjmp type). The base register for this (R1) is defined in TRYCATCH at PC 2.

When the finally part is finished, ENDFIN is supposed to rethrow the previous error (here a THROW) at PC 14. To do this it expects to find the value to throw in R1 and its type in R2. For some reason the type in R2 is not correct here which then causes the longjmp handler to fail.

svaarala commented 9 years ago

Could you change the following snippet in duk_js_executor.c:

To:

DUK_D(DUK_DPRINT("ENDFIN: completion value=%!T, type=%!T",
      (duk_tval *) (thr->valstack + cat->idx_base + 0),
      (duk_tval *) (thr->valstack + cat->idx_base + 1)));

and get the debug log when the bug is triggered?

fatcerberus commented 9 years ago

Okay, I'll try that in a bit. What's that INVALID opcode in there?

svaarala commented 9 years ago

It's the CATCH jump slot for TRYCATCH, but because there's no catch clause it is emitted as INVALID.

fatcerberus commented 9 years ago
*munch*
[D] duk_js_executor.c:4125 (unknown): ENDFIN: completion value={message/11:"A pig ate everyone at 8:12"/3}, type=0
[D] duk_js_executor.c:1387 (unknown): caught unknown longjmp type 0, treat as internal error
[D] duk_api_stack.c:4132 (unknown): fatal error occurred, code 100, message 'main.js' (line: 68)

Error: internal error in bytecode executor longjmp handler
svaarala commented 9 years ago

Hum. So the value being thrown is correct but its longjmp type is not.

svaarala commented 9 years ago

(Both call and error handling are due for a cleanup and partial rewrite... Hopefully I'll soon have time for it :-)

fatcerberus commented 9 years ago

I don't know if it helps at all, but here's the surrounding code for context:

    // set up the beta test harness
    TestHarness.initialize();

    // show the title screen and start the game!
    if (!DBG_DISABLE_TITLE_CARD) {
        mini.BGM.push('BGM/SpectaclesTheme.ogg');
        ShowLogo('Logos/TitleCard.png', 5.0);
    }
    var session = new TitleScreen('SpectaclesTheme').show();

    try { throw new Error("A pig ate everyone at 8:12"); }
    finally { print("*munch*"); }

    analogue.getWorld().session = session;
    LucidaClock.initialize();
    MapEngine('Testville.rmp', 60);

If I move the try/finally literally anywhere else but after var session = ..., it works. In that exact spot the bug happens.

svaarala commented 9 years ago

Btw, what happens if you add a dummy catch clause like:

    try { throw new Error("A pig ate everyone at 8:12"); }
    catch (e) { throw e; }
    finally { print("*munch*"); }
fatcerberus commented 9 years ago

What happens is that the bug goes away. :)

svaarala commented 9 years ago

Another attempt to corner this problem: do you get the same problem with Duktape 1.2.3 (if that's feasible to test)?

svaarala commented 9 years ago

It would also be useful to see what the longjmp type looks when the finally part begins execution (it's wrong at ENDFIN). If you could change the debug level here:

to:

DUK_D(DUK_DPRINT("writing catch registers: idx_base=%ld -> %!T, idx_base+1=%ld -> %!T",
                 (long) thr->catchstack[cat_idx].idx_base,
                 (duk_tval *) &thr->heap->lj.value1,
                 (long) (thr->catchstack[cat_idx].idx_base + 1),
                 (duk_tval *) &thr->heap->lj.value2));

(Sorry for pestering you with this - I'd test it myself if I could reproduce it :-)

fatcerberus commented 9 years ago

No problem, it's probably some bizarre corner case that will never actually come up in the real world (I only found it by screwing around), but I can say from experience that those are usually the hardest bugs to fix. :(

Anyway, output:

[D] duk_js_executor.c:654 (unknown): writing catch registers: idx_base=6 -> {message/11:"A pig ate everyone at 8:12"/4}, idx_base+1=7 -> unused
svaarala commented 9 years ago

Darn, that debug print didn't actually include the relevant value. Could you add this debug print right after the above one:

DUK_D(DUK_DPRINT("ljtype is: %d", (int) thr->heap->lj.type));
svaarala commented 9 years ago

Yeah, the hardest bugs are those context sensitive bugs which are often not caught by even large test sets. For example, the try-finally bug reported earlier was not triggered by any existing test in the Duktape test set, test262, when running Emscripten generated code etc.

fatcerberus commented 9 years ago

"ljtype is: 2"

svaarala commented 9 years ago

Ok, now we just need to figure where it gets overwritten :-) ... Thanks, I'll do some code reading later tonight and hopefully pinpoint this :)

svaarala commented 9 years ago

Oh btw, was it possible to reproduce this in 1.2.3 or do you have code that relies on newer features not present in that version?

fatcerberus commented 9 years ago

I should be able to drop in Duktape 1.2.3 to test, yes. I haven't really made use of the 1.3 additions yet.

fatcerberus commented 9 years ago

The bug is indeed reproducible in 1.2.3.

svaarala commented 9 years ago

Just to be clear, does this happen every time when the try-finally is in that specific position, or just some of the time?

It seems unlikely this could be an application bug, but just in case, are there any signs of memory unsafe behavior e.g. in the native calls preceding the problem (any crashes etc)?

fatcerberus commented 9 years ago

It's 100% reproducible with the try-finally in that exact location. If I move it elsewhere (either up or down), it works again.

And I haven't noticed any memory corruption or crashes, no.

I'd debug it myself, but MSVC apparently doesn't like the single-file Duktape source, it won't step into Duktape calls and breakpoints set inside Duktape aren't honored.

Would it help if I were to dump the bytecode? You'd have to tell me how to do that...

svaarala commented 9 years ago

Not sure if the bug is bytecode-related but that would surely help.

If you can find a reference to the function somehow and you could use duk_dump_function() on it, I can extract the bytecode from the dump.

fatcerberus commented 9 years ago

Okay, here's the dump, obtained by calling duk_dump_function() and writing out the returned buffer: https://drive.google.com/open?id=0BxPKLRqQOUSNUEhQYUozWTU5aWc

svaarala commented 9 years ago

Here's the relevant bit, doesn't seem much wrong with it.

00087 00000033   NOP       
00088 000040b2   TRYCATCH   0x2, r1          ; have_finally
00089 00000073   INVALID    0
00090 8000016e   JUMP       96 (+6)
00091 000800ce   GETVAR     r3, c32
00092 00084102   LDCONST    r4, c33
00093 0080c009   NEW        r3, 1
00094 0000c573   THROW      r3
00095 000004b3   ENDTRY    
00096 004880d2   CSVAR      r3, c34
00097 0008c142   LDCONST    r5, c35
00098 0080c030   CALL       0x0, r3, 1
00099 00004533   ENDFIN

Side note: there isn't a bytecode viewer as such but you can get the above by running:

$ ./duk --debugger bytecode.duk

and then using the debugger web UI: attach, show bytecode.

svaarala commented 9 years ago

This seems pretty hard to pin down, I've read through the relevant code paths and can't figure what could be wrong here.

At this point it would be useful to know what R1 and R2 hold at PC 96 (start of finally handling). The debug dump in "handle catch or finally" indicates we had the correct longjmp type there, but it'd be nice to be sure it's still visible at that point (after potential call stack, catch stack, and value stack resizes). I'll suggest a debug print for that shortly.

Do you use threading and/or is there any risk of a "recursive" call into a different Duktape context (but same heap) in TitleScreen() or TitleScreen.show()? A non-allowed recursive Duktape call might have weird side effects, but that doesn't seem likely here.

svaarala commented 9 years ago

If you insert this to the top of the bytecode executor before 'ins' is read (https://github.com/svaarala/duktape/blob/master/src/duk_js_executor.c#L2275) we might get useful info:

if (act->pc == 96 && fun->nregs >= 3) {
    DUK_D(DUK_DPRINT("R1=%!T R2=%!T", DUK__REGP(1), DUK__REGP(2)));
}
fatcerberus commented 9 years ago

I'll try the debug print. minisphere is single-threaded and I only use one context throughout (the one returned by duk_create_heap()), so the risk of a recursive call causing clobbering seems low.

fatcerberus commented 9 years ago

Okay, I ended up getting back quite a few lines from that, but I assume this is the correct one as it mentions the thrown Error object:

[D] duk_js_executor.c:2270 (unknown): R1={message/11:"A pig ate everyone at 8:12"/4} R2=0
fatcerberus commented 9 years ago

Incidentally, I've never seen a %!T printf format specifier before, what exactly is it?

svaarala commented 9 years ago

Duktape has a custom debug formatter so that you can easily format duk_tvals (%!T), duk_hobjects (%!O) etc.

svaarala commented 9 years ago

Thanks for that debug print - so, R2 is zero on entry to finally so it's not related to what's done in the finally block. Do you get the same error if the finally part is empty (no print)?

fatcerberus commented 9 years ago

Yes, empty finally has the same issue.

svaarala commented 9 years ago

The rest of the bug huntery would be to figure out (1) does dukhandle_catch_or_finally() manage to write the longjmp type to thr->valstack + cat->idx_base + 1 and (2) at what point after that (probably inside dukhandle_catch_or_finally()) it gets assigned a zero.

I'll work up a branch with the necessary debug prints and provide a snapshot tomorrow.

fatcerberus commented 9 years ago

I always uncover the weirdest bugs don't I? :stuck_out_tongue_closed_eyes:

fatcerberus commented 9 years ago

So, I went ahead and did this:

    DUK_DDD(DUK_DDDPRINT("writing catch registers: idx_base=%ld -> %!T, idx_base+1=%ld -> %!T",
                         (long) thr->catchstack[cat_idx].idx_base,
                         (duk_tval *) &thr->heap->lj.value1,
                         (long) (thr->catchstack[cat_idx].idx_base + 1),
                         (duk_tval *) &thr->heap->lj.value2));

    DUK_D(DUK_DPRINT("ljtype %u", thr->heap->lj.type));

    tv1 = thr->valstack + thr->catchstack[cat_idx].idx_base;
    DUK_TVAL_SET_TVAL(&tv_tmp, tv1);
    DUK_TVAL_SET_TVAL(tv1, &thr->heap->lj.value1);
    DUK_TVAL_INCREF(thr, tv1);
    DUK_TVAL_DECREF(thr, &tv_tmp);  /* side effects */

    DUK_D(DUK_DPRINT("ljtype %u", thr->heap->lj.type));

    tv1 = thr->valstack + thr->catchstack[cat_idx].idx_base + 1;
    DUK_TVAL_SET_TVAL(&tv_tmp, tv1);
    DUK_TVAL_SET_NUMBER(tv1, (duk_double_t) thr->heap->lj.type);  /* XXX: set int */
    DUK_ASSERT(!DUK_TVAL_IS_HEAP_ALLOCATED(tv1));   /* no need to incref */
    DUK_TVAL_DECREF(thr, &tv_tmp);  /* side effects */

    DUK_D(DUK_DPRINT("ljtype %u", thr->heap->lj.type));

Output was:

[D] duk_js_executor.c:656 (unknown): ljtype 2
[D] duk_js_executor.c:664 (unknown): ljtype 0
[D] duk_js_executor.c:672 (unknown): ljtype 0

So by the time the longjmp type is written to R2, it's already been reset to zero. I'm wondering if it's those "side effects" mentioned on the DECREF calls...

fatcerberus commented 9 years ago

You know, in hindsight I'm really surprised that the invalid longjmp didn't trigger an assert...

fatcerberus commented 9 years ago

I did it! Here's a minimal test case to reproduce the bug:

function Food() {
    this.getEaten = function() { print("The pig ate all the food, and then..."); };
}
Duktape.fin(Food.prototype, function() {})
new Food().getEaten();
try { throw new Error("A pig ate everyone at 8:12"); }
finally { print("*munch*"); }

It appears to be an interaction between the try-catch handler and finalization. That TitleScreen class doesn't have a finalizer, but it creates native objects which do.

svaarala commented 9 years ago

Awesome, now I'll be able to work on this much more easily :-)

svaarala commented 9 years ago

The side effects won't write to value stack as such; what "side effects" mean there is that an arbitrary amount of other code may run, which may cause reallocation of value stack, catch stack, call stack, etc, so any pointers may become invalid.

But in this case I think what happens is that the longjmp type gets changed because a DECREF triggers finalization, and the call handling related to finalization will overwrite the (global) longjmp state. I'll check all the longjmp related DECREF sites; probably this isn't an issue elsewhere. If so, the simple fix is to make a local copy of the longjmp type before operating with side effects.

svaarala commented 9 years ago

Here's the minimal fix branch ('ll go through the other call sites before merging):

And here's a snapshot if you want to try it out:

It seems to fix the regression test case anyway :)