Closed fatcerberus closed 9 years ago
Is this in master? I fixed a possibly related issue a while ago: #268.
I can't reproduce this in master with the example code (with assertions enabled and valgrind):
$ ./duk test.js
*munch*
Error: A pig ate everyone at 8:12
global test.js:1 preventsyield
error in executing file test.js
Hm, I'll see if I can come up with a test case later.
Out of curiosity what is preventsyield
? I see it on a lot of the duk
output you've posted, but never figured out what it meant.
I still can't manage to reproduce it in isolation, sadly. :( However, I did compare the debug output between when it works and when it doesn't.
Correct behavior (debug output starting from point of throw):
*munch*
[D] duk_js_executor.c:1342 (unknown): -> throw propagated up to entry level, rethrow and exit bytecode executor
[D] duk_api_stack.c:4132 (unknown): fatal error occurred, code 100, message 'main.js' (line: 7)
Error: A pig ate everyone at 8:12
When the internal error shows up:
*munch*
[D] duk_js_executor.c:1387 (unknown): caught unknown longjmp type 0, treat as internal error
[D] duk_api_stack.c:4132 (unknown): fatal error occurred, code 100, message 'main.js' (line: 66)
Error: internal error in bytecode executor longjmp handler
This is unrelated to my recent debugger changes for THROW, as it happens in a clean master as well.
I'll recheck the try-catch diff - it's likely related to that somehow.
If you didn't already do so, could you enable -DUK_OPT_ASSERTIONS
and see if that turns up anything?
Nope, no failed assertions.
For what it's worth, the bug happens with the test code above inside of a protected call (duk_pcall()
).
Ok, so if "munch" gets printed, it's probably ENDFIN which causes the error.
A little bit more digging. Here's the bytecode:
; c0 "Error"
; c1 "A pig ate everyone at 8:12"
; c2 "print"
; c3 "*munch*"
00000 000000f3 LDUNDEF r0
00001 00000033 NOP
00002 000040b2 TRYCATCH 0x2, r1 ; have_finally
00003 00000073 INVALID 0
00004 8000016e JUMP 10 (+6)
00005 000000ce GETVAR r3, c0
00006 00004102 LDCONST r4, c1
00007 0080c009 NEW r3, 1
00008 0000c573 THROW r3
00009 000004b3 ENDTRY
00010 004080d2 CSVAR r3, c2
00011 0000c142 LDCONST r5, c3
00012 0080c030 CALL 0x0, r3, 1
00013 0000c000 LDREG r0, r3
00014 00004533 ENDFIN
00015 000000ef RETURN 0x3, r0 ; fast_return, have_retval
The THROW at PC 8 will eventually call longjmp with "lj_type" set to THROW. This should get passed to the finally part starting at PC 10, in registers R1 (longjmp value, i.e. the error) and R2 (longjmp type). The base register for this (R1) is defined in TRYCATCH at PC 2.
When the finally part is finished, ENDFIN is supposed to rethrow the previous error (here a THROW) at PC 14. To do this it expects to find the value to throw in R1 and its type in R2. For some reason the type in R2 is not correct here which then causes the longjmp handler to fail.
Could you change the following snippet in duk_js_executor.c:
To:
DUK_D(DUK_DPRINT("ENDFIN: completion value=%!T, type=%!T",
(duk_tval *) (thr->valstack + cat->idx_base + 0),
(duk_tval *) (thr->valstack + cat->idx_base + 1)));
and get the debug log when the bug is triggered?
Okay, I'll try that in a bit. What's that INVALID opcode in there?
It's the CATCH jump slot for TRYCATCH, but because there's no catch clause it is emitted as INVALID.
*munch*
[D] duk_js_executor.c:4125 (unknown): ENDFIN: completion value={message/11:"A pig ate everyone at 8:12"/3}, type=0
[D] duk_js_executor.c:1387 (unknown): caught unknown longjmp type 0, treat as internal error
[D] duk_api_stack.c:4132 (unknown): fatal error occurred, code 100, message 'main.js' (line: 68)
Error: internal error in bytecode executor longjmp handler
Hum. So the value being thrown is correct but its longjmp type is not.
(Both call and error handling are due for a cleanup and partial rewrite... Hopefully I'll soon have time for it :-)
I don't know if it helps at all, but here's the surrounding code for context:
// set up the beta test harness
TestHarness.initialize();
// show the title screen and start the game!
if (!DBG_DISABLE_TITLE_CARD) {
mini.BGM.push('BGM/SpectaclesTheme.ogg');
ShowLogo('Logos/TitleCard.png', 5.0);
}
var session = new TitleScreen('SpectaclesTheme').show();
try { throw new Error("A pig ate everyone at 8:12"); }
finally { print("*munch*"); }
analogue.getWorld().session = session;
LucidaClock.initialize();
MapEngine('Testville.rmp', 60);
If I move the try/finally literally anywhere else but after var session = ...
, it works. In that exact spot the bug happens.
Btw, what happens if you add a dummy catch clause like:
try { throw new Error("A pig ate everyone at 8:12"); }
catch (e) { throw e; }
finally { print("*munch*"); }
What happens is that the bug goes away. :)
Another attempt to corner this problem: do you get the same problem with Duktape 1.2.3 (if that's feasible to test)?
It would also be useful to see what the longjmp type looks when the finally part begins execution (it's wrong at ENDFIN). If you could change the debug level here:
to:
DUK_D(DUK_DPRINT("writing catch registers: idx_base=%ld -> %!T, idx_base+1=%ld -> %!T",
(long) thr->catchstack[cat_idx].idx_base,
(duk_tval *) &thr->heap->lj.value1,
(long) (thr->catchstack[cat_idx].idx_base + 1),
(duk_tval *) &thr->heap->lj.value2));
(Sorry for pestering you with this - I'd test it myself if I could reproduce it :-)
No problem, it's probably some bizarre corner case that will never actually come up in the real world (I only found it by screwing around), but I can say from experience that those are usually the hardest bugs to fix. :(
Anyway, output:
[D] duk_js_executor.c:654 (unknown): writing catch registers: idx_base=6 -> {message/11:"A pig ate everyone at 8:12"/4}, idx_base+1=7 -> unused
Darn, that debug print didn't actually include the relevant value. Could you add this debug print right after the above one:
DUK_D(DUK_DPRINT("ljtype is: %d", (int) thr->heap->lj.type));
Yeah, the hardest bugs are those context sensitive bugs which are often not caught by even large test sets. For example, the try-finally bug reported earlier was not triggered by any existing test in the Duktape test set, test262, when running Emscripten generated code etc.
"ljtype is: 2"
Ok, now we just need to figure where it gets overwritten :-) ... Thanks, I'll do some code reading later tonight and hopefully pinpoint this :)
Oh btw, was it possible to reproduce this in 1.2.3 or do you have code that relies on newer features not present in that version?
I should be able to drop in Duktape 1.2.3 to test, yes. I haven't really made use of the 1.3 additions yet.
The bug is indeed reproducible in 1.2.3.
Just to be clear, does this happen every time when the try-finally is in that specific position, or just some of the time?
It seems unlikely this could be an application bug, but just in case, are there any signs of memory unsafe behavior e.g. in the native calls preceding the problem (any crashes etc)?
It's 100% reproducible with the try-finally in that exact location. If I move it elsewhere (either up or down), it works again.
And I haven't noticed any memory corruption or crashes, no.
I'd debug it myself, but MSVC apparently doesn't like the single-file Duktape source, it won't step into Duktape calls and breakpoints set inside Duktape aren't honored.
Would it help if I were to dump the bytecode? You'd have to tell me how to do that...
Not sure if the bug is bytecode-related but that would surely help.
If you can find a reference to the function somehow and you could use duk_dump_function()
on it, I can extract the bytecode from the dump.
Okay, here's the dump, obtained by calling duk_dump_function()
and writing out the returned buffer:
https://drive.google.com/open?id=0BxPKLRqQOUSNUEhQYUozWTU5aWc
Here's the relevant bit, doesn't seem much wrong with it.
00087 00000033 NOP
00088 000040b2 TRYCATCH 0x2, r1 ; have_finally
00089 00000073 INVALID 0
00090 8000016e JUMP 96 (+6)
00091 000800ce GETVAR r3, c32
00092 00084102 LDCONST r4, c33
00093 0080c009 NEW r3, 1
00094 0000c573 THROW r3
00095 000004b3 ENDTRY
00096 004880d2 CSVAR r3, c34
00097 0008c142 LDCONST r5, c35
00098 0080c030 CALL 0x0, r3, 1
00099 00004533 ENDFIN
Side note: there isn't a bytecode viewer as such but you can get the above by running:
$ ./duk --debugger bytecode.duk
and then using the debugger web UI: attach, show bytecode.
This seems pretty hard to pin down, I've read through the relevant code paths and can't figure what could be wrong here.
At this point it would be useful to know what R1 and R2 hold at PC 96 (start of finally handling). The debug dump in "handle catch or finally" indicates we had the correct longjmp type there, but it'd be nice to be sure it's still visible at that point (after potential call stack, catch stack, and value stack resizes). I'll suggest a debug print for that shortly.
Do you use threading and/or is there any risk of a "recursive" call into a different Duktape context (but same heap) in TitleScreen() or TitleScreen.show()? A non-allowed recursive Duktape call might have weird side effects, but that doesn't seem likely here.
If you insert this to the top of the bytecode executor before 'ins' is read (https://github.com/svaarala/duktape/blob/master/src/duk_js_executor.c#L2275) we might get useful info:
if (act->pc == 96 && fun->nregs >= 3) {
DUK_D(DUK_DPRINT("R1=%!T R2=%!T", DUK__REGP(1), DUK__REGP(2)));
}
I'll try the debug print. minisphere is single-threaded and I only use one context throughout (the one returned by duk_create_heap()
), so the risk of a recursive call causing clobbering seems low.
Okay, I ended up getting back quite a few lines from that, but I assume this is the correct one as it mentions the thrown Error object:
[D] duk_js_executor.c:2270 (unknown): R1={message/11:"A pig ate everyone at 8:12"/4} R2=0
Incidentally, I've never seen a %!T printf format specifier before, what exactly is it?
Duktape has a custom debug formatter so that you can easily format duk_tvals (%!T
), duk_hobjects (%!O
) etc.
Thanks for that debug print - so, R2 is zero on entry to finally so it's not related to what's done in the finally block. Do you get the same error if the finally part is empty (no print)?
Yes, empty finally has the same issue.
The rest of the bug huntery would be to figure out (1) does dukhandle_catch_or_finally() manage to write the longjmp type to thr->valstack + cat->idx_base + 1
and (2) at what point after that (probably inside dukhandle_catch_or_finally()) it gets assigned a zero.
I'll work up a branch with the necessary debug prints and provide a snapshot tomorrow.
I always uncover the weirdest bugs don't I? :stuck_out_tongue_closed_eyes:
So, I went ahead and did this:
DUK_DDD(DUK_DDDPRINT("writing catch registers: idx_base=%ld -> %!T, idx_base+1=%ld -> %!T",
(long) thr->catchstack[cat_idx].idx_base,
(duk_tval *) &thr->heap->lj.value1,
(long) (thr->catchstack[cat_idx].idx_base + 1),
(duk_tval *) &thr->heap->lj.value2));
DUK_D(DUK_DPRINT("ljtype %u", thr->heap->lj.type));
tv1 = thr->valstack + thr->catchstack[cat_idx].idx_base;
DUK_TVAL_SET_TVAL(&tv_tmp, tv1);
DUK_TVAL_SET_TVAL(tv1, &thr->heap->lj.value1);
DUK_TVAL_INCREF(thr, tv1);
DUK_TVAL_DECREF(thr, &tv_tmp); /* side effects */
DUK_D(DUK_DPRINT("ljtype %u", thr->heap->lj.type));
tv1 = thr->valstack + thr->catchstack[cat_idx].idx_base + 1;
DUK_TVAL_SET_TVAL(&tv_tmp, tv1);
DUK_TVAL_SET_NUMBER(tv1, (duk_double_t) thr->heap->lj.type); /* XXX: set int */
DUK_ASSERT(!DUK_TVAL_IS_HEAP_ALLOCATED(tv1)); /* no need to incref */
DUK_TVAL_DECREF(thr, &tv_tmp); /* side effects */
DUK_D(DUK_DPRINT("ljtype %u", thr->heap->lj.type));
Output was:
[D] duk_js_executor.c:656 (unknown): ljtype 2
[D] duk_js_executor.c:664 (unknown): ljtype 0
[D] duk_js_executor.c:672 (unknown): ljtype 0
So by the time the longjmp type is written to R2, it's already been reset to zero. I'm wondering if it's those "side effects" mentioned on the DECREF calls...
You know, in hindsight I'm really surprised that the invalid longjmp didn't trigger an assert...
I did it! Here's a minimal test case to reproduce the bug:
function Food() {
this.getEaten = function() { print("The pig ate all the food, and then..."); };
}
Duktape.fin(Food.prototype, function() {})
new Food().getEaten();
try { throw new Error("A pig ate everyone at 8:12"); }
finally { print("*munch*"); }
It appears to be an interaction between the try-catch handler and finalization. That TitleScreen class doesn't have a finalizer, but it creates native objects which do.
Awesome, now I'll be able to work on this much more easily :-)
The side effects won't write to value stack as such; what "side effects" mean there is that an arbitrary amount of other code may run, which may cause reallocation of value stack, catch stack, call stack, etc, so any pointers may become invalid.
But in this case I think what happens is that the longjmp type gets changed because a DECREF triggers finalization, and the call handling related to finalization will overwrite the (global) longjmp state. I'll check all the longjmp related DECREF sites; probably this isn't an issue elsewhere. If so, the simple fix is to make a local copy of the longjmp type before operating with side effects.
Here's the minimal fix branch ('ll go through the other call sites before merging):
And here's a snapshot if you want to try it out:
It seems to fix the regression test case anyway :)
The
finally
runs, but leads to "internal error in bytecode executor longjmp handler". What's odd is that I can't always reproduce it. Depending on the surrounding lines, it sometimes works correctly.