Closed svaarala closed 9 years ago
After #265 there seems to be a spurious compiler warning:
duk_js_call.c: In function ‘duk_handle_call’:
duk_js_call.c:835:15: warning: variable ‘entry_thr_curr_pc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
See if that can be fixed with this change too.
Quick test using a dispatch heavy test (LDCONST micro benchmark):
# Local curr_pc
$ sudo nice -20 python util/time_multi.py --count 10 --sleep 10 --mode all --verbose ./duk.O2.localpc tests/perf/test-const-load.js
Running: 5.430000 5.410000 5.780000 5.370000 5.480000 5.560000 5.450000 5.380000 5.390000 5.370000
min=5.37, max=5.78, avg=5.46, count=10: [5.43, 5.41, 5.78, 5.37, 5.48, 5.56, 5.45, 5.38, 5.39, 5.37]
# Master, including thr->curr_pc dispatch and local consts base pointer
$ sudo nice -20 python util/time_multi.py --count 10 --sleep 10 --mode all --verbose ./duk.O2.master tests/perf/test-const-load.js
Running: 6.430000 6.450000 6.440000 6.440000 6.420000 6.450000 6.410000 6.420000 7.190000 7.220000
min=6.41, max=7.22, avg=6.59, count=10: [6.43, 6.45, 6.44, 6.44, 6.42, 6.45, 6.41, 6.42, 7.19, 7.22]
Practical difference will be less, depending on how dispatch heavy the executed bytecode is.
By "dispatch heavy" I assume you mean, e.g. tight loops?
Anything that does a lot of opcode dispatch compared to other execution - if a tight loop involves a regexp match for instance, the regexp performance will of course dominate.
I'll polish and merge this branch tomorrow. Performance work for 1.3 will then be finished, yay :-)
C declarations are weird:
cdecl> explain int * volatile * volatile curr_pc
declare curr_pc as volatile pointer to volatile pointer to int
I guess that's what happens when you're about half a step away from assembly. ;-)
Quick performance test between master (thr->curr_pc and local curr_pc):
test-array-read.js : duk-O2 4.00 duk-O2-master 4.31
test-array-write.js : duk-O2 4.39 duk-O2-master 4.75
test-bitwise-ops.js : duk-O2 0.87 duk-O2-master 0.94
test-buffer-nodejs-read.js : duk-O2 4.41 duk-O2-master 4.84
test-buffer-nodejs-write.js : duk-O2 5.56 duk-O2-master 6.15
test-buffer-object-read.js : duk-O2 4.62 duk-O2-master 4.75
test-buffer-object-write.js : duk-O2 5.78 duk-O2-master 5.94
test-buffer-plain-read.js : duk-O2 3.97 duk-O2-master 4.50
test-buffer-plain-write.js : duk-O2 3.64 duk-O2-master 4.17
test-call-basic.js : duk-O2 15.08 duk-O2-master 16.04
test-compile-mandel-nofrac.js : duk-O2 13.90 duk-O2-master 13.59
test-compile-mandel.js : duk-O2 17.27 duk-O2-master 17.01
test-compile-short.js : duk-O2 10.42 duk-O2-master 10.84
test-const-load.js : duk-O2 6.06 duk-O2-master 7.40
test-empty-loop.js : duk-O2 2.18 duk-O2-master 2.33
test-fib.js : duk-O2 8.52 duk-O2-master 9.70
test-global-lookup.js : duk-O2 11.66 duk-O2-master 10.42
test-hello-world.js : duk-O2 0.00 duk-O2-master 0.00
test-hex-decode.js : duk-O2 5.32 duk-O2-master 5.33
test-hex-encode.js : duk-O2 26.94 duk-O2-master 27.30
test-json-integer-parse.js : duk-O2 0.40 duk-O2-master 0.40
test-json-number-parse.js : duk-O2 5.07 duk-O2-master 5.15
test-json-serialize.js : duk-O2 2.97 duk-O2-master 3.00
test-json-string-bench.js : duk-O2 5.59 duk-O2-master 5.50
test-json-string-parse.js : duk-O2 0.59 duk-O2-master 0.59
test-json-string-stringify.js : duk-O2 1.27 duk-O2-master 1.27
test-prop-read.js : duk-O2 8.01 duk-O2-master 7.96
test-prop-write.js : duk-O2 7.18 duk-O2-master 7.17
test-reg-load.js : duk-O2 2.76 duk-O2-master 3.12
test-reg-readwrite-object.js : duk-O2 3.73 duk-O2-master 3.82
test-reg-readwrite-plain.js : duk-O2 2.51 duk-O2-master 2.84
test-regexp-string-parse.js : duk-O2 10.55 duk-O2-master 10.99
test-string-array-concat.js : duk-O2 6.41 duk-O2-master 7.12
test-string-compare.js : duk-O2 4.16 duk-O2-master 4.12
test-string-decodeuri.js : duk-O2 3.67 duk-O2-master 3.63
test-string-encodeuri.js : duk-O2 4.30 duk-O2-master 4.53
test-string-intern-match.js : duk-O2 2.35 duk-O2-master 2.37
test-string-intern-miss.js : duk-O2 2.54 duk-O2-master 2.56
test-string-plain-concat.js : duk-O2 4.11 duk-O2-master 4.20
test-string-uppercase.js : duk-O2 2.60 duk-O2-master 2.73
This is now in master and is the final performance change for 1.3. There's potential for very weird fallout if something doesn't work right, so keep your eyes peeled :-)
I'll pull the latest master and try it out with minisphere. If there are any issues I'll let you know!
Just got around to looking at the recent changes. I must say this was quite clever:
duk_instr_t **ptr_curr_pc;
The duk_hthread
holds a backpointer to the executor's local PC to avoid an extra level of syncing (which is what I thought this change was going to entail).
Only one thing worries me about this setup, and that's how it will scale to recursive executor calls (as would happen in the case of, e.g. JS -> C -> JS.
Well, no weird issues showed up in my (admittedly not very thorough) testing. So it looks like this bit of black magic worked. ;-)
Each time the bytecode executor is called recursively the call handling will store and restore ptr_curr_pc
so it should not be an issue. That's not to say I'm not a bit uncomfortable with "syncing back", but opcode dispatch is so central to performance that it seems justified.
I may have found a regression with this change, I have to do a bit more testing to see if I can reproduce it.
What was the outward effect?
The one thing I did notice was, for my pause-on-error change, when stopped at an error GetCallStack would sometimes report the wrong line number and PC (the beginning of a function instead of the correct place).
Things like that will be incorrect when a "sync" is missing (copying the local curr_pc to act->pc
). I think I'll find this one simply by code reading.
But was this the regression you were referring to or was it something else?
I think there's at least a sync missing in the case where the bytecode executor triggers a side effect like a getter call. It won't cause any harm other than the act->pc
being incorrect so tracebacks etc will look wrong.
I'll confirm and fix that shortly.
The specific scenario where I noticed this was where a Duktape/C function throws an error and the debugger pauses before longjmp, the reported PC for the ecma function directly below is incorrect
I can replicate this behavior by using a breakpoint in e.g. a getter function and resuming until the breakpoint is hit. It can't be reproduced by stepping because that will continuously sync the PC. That's why I missed this in my basic debugger test before merge.
Fix and bug testcase coming shortly. Thanks!
What's weird is that the "enter debugger now" helper I added for #286 already includes a sync before the interrupt, so I'm not actually sure why this happens.
The missing sync was for a non-ecma-to-ecma call, so that the PC of the calling activation would be outdated. The sync you added is for the current activation?
Ah, yes, that makes sense. :)
Ok, fix (and bug test) is in master, here's a snapshot: http://duktape.org/snapshots/duktape-1.2.99-20150826172312-v1.2.0-401-g8b5bcca.tar.xz
I have to go to work right now, but I'll test out the snapshot tonight and let you know.
Sure, no problem - I think the fix should work for at least that GetCallStack issue, but it's of course best if you can verify it.
The snapshot fixed the issue. If you're curious, the specific issue I discovered turned out to have nothing to do with ecmascript-to-native calls, the bug was in new
handling:
// line 1
(function() {
print("The 8:12 pig was here!");
function f() { print(Duktape.act(-3).lineNumber); }
new f(); // prints 3, should print 5
})();
But whatever you did appears to have fixed this case too.
It was actually related: the fix was in handle_call()
and new
uses handle_call()
even when calling an Ecmascript constructor.
Just triggered an assert that might be related to this (without assertions, this ends in a segfault):
PANIC 54: assertion failed: curr_pc < DUK_HCOMPILEDFUNCTION_GET_CODE_END(thr->heap, fun) (duk_js_executor.c:2266) (calling abort)
The value of curr_pc at the time according to MSVC:
curr_pc = 0x000000dfab127cb8 {4261281277}
This value is consistent from run to run (which is odd, since it's a pointer).
Any idea what's going on when this happens?
Well, here's my native callstack for starters:
engine64.exe!issue_debug_notification(const wchar_t * const message) Line 125 C++
engine64.exe!__acrt_report_runtime_error(const wchar_t * message) Line 142 C++
engine64.exe!abort() Line 61 C++
engine64.exe!duk_default_panic_handler(int code, const char * msg) Line 127 C
> engine64.exe!duk_js_execute_bytecode(duk_hthread * exec_thr) Line 2266 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1496 C
engine64.exe!duk_call(duk_hthread * ctx, int nargs) Line 61 C
engine64.exe!run_script(script * script, bool allow_reentry) Line 189 C
engine64.exe!render_map() Line 1284 C
engine64.exe!js_RenderMap(duk_hthread * ctx) Line 2856 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1383 C
engine64.exe!duk_js_execute_bytecode(duk_hthread * exec_thr) Line 3408 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1496 C
engine64.exe!duk_call_method(duk_hthread * ctx, int nargs) Line 84 C
engine64.exe!duk_bi_function_prototype_apply(duk_hthread * ctx) Line 206 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1383 C
engine64.exe!duk_js_execute_bytecode(duk_hthread * exec_thr) Line 3408 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1496 C
engine64.exe!duk_call(duk_hthread * ctx, int nargs) Line 61 C
engine64.exe!run_script(script * script, bool allow_reentry) Line 189 C
engine64.exe!update_map_engine(bool is_main_loop) Line 1397 C
engine64.exe!js_MapEngine(duk_hthread * ctx) Line 1546 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1383 C
engine64.exe!duk_js_execute_bytecode(duk_hthread * exec_thr) Line 3408 C
engine64.exe!duk_handle_call(duk_hthread * thr, int num_stack_args, unsigned int call_flags) Line 1496 C
engine64.exe!duk_pcall(duk_hthread * ctx, int nargs) Line 138 C
engine64.exe!main(int argc, char * * argv) Line 337 C
[External Code]
Hm, I wonder if I could use Duktape's JS debugger to narrow things down? There are a few levels of native calls in there though that might make it more difficult, though... and that's assuming single-stepping doesn't mask the bug.
The most likely reason is that some PC sync operation uses an out-of-date thr->ptr_curr_pc somewhere. Just to narrow things down, do you use coroutines?
I think I'll make a small change where thr->ptr_curr_pc is set to NULL whenever any activation states change: otherwise a PC sync operation would copy the previous curr_pc value into another activation.
No coroutines, no. But some weird corruption is definitely going on here.
Specifically, this comes up when starting the Spectacles battle engine (that being the game I wrote minisphere for). The first battle I tested lead to the segfault and failed assert mentioned above. I tried a different battle, and I'm SO glad I implemented pause-on-throw now... Duktape threw a nonsensical "invalid index" error (valid script filename but line number of 0), allowing me to view the Ecmascript callstack (which is quite deep!) and local variables.
With that information, I found the function responsible for the segfault:
// .render() method
// Renders the BattleActor in its current state.
BattleActor.prototype.render = function()
{
if (!this.isVisible && this.damages.length == 0 && this.healings.length == 0) {
return;
}
this.sprite.blit(this.x, this.y, this.opacity * 255);
for (var i = 0; i < this.damages.length; ++i) {
var text = this.damages[i].text;
var x = this.x + 16 - this.messageFont.getStringWidth(text) / 2;
for (var i2 = 0; i2 < text.length; ++i2) {
var yName = 'y' + i2.toString();
var y = this.y + this.damages[i][yName];
var color = this.damages[i].color !== null ? this.damages[i].color
: CreateColor(255, 255, 255, 255);
DrawTextEx(this.messageFont, x, y, text[i2], color, 1);
x += this.messageFont.getStringWidth(text[i2]);
}
}
for (var i = 0; i < this.healings.length; ++i) {
var y = this.y + this.healings[i].y;
var color = this.healings[i].color !== null ? this.healings[i].color : CreateColor(64, 255, 128, 255);
var textColor = BlendColors(color, color);
textColor.alpha *= this.healings[i].alpha / 255;
DrawTextEx(this.messageFont, this.x + 16, y, this.healings[i].amount, textColor, 1, 'center');
}
};
And the line which called it:
this.actors[type][i].render();
Worth noting is that the local variables (including the loop counter) are all listed as being set to undefined
. Which means the render()
function probably didn't even start executing yet.
Forgive me if I seem a little overexcited here; I probably enjoy debugging more than would generally be considered healthy. :stuck_out_tongue_closed_eyes:
:-)
What I don't understand is why the value of the corrupt curr_pc
isn't changing from run to run. If it were using an outdated value (from a previous activation, say), the actual value should still change as memory allocation isn't exactly known for being deterministic, even with virtual addressing. It looks more to me like curr_pc
is being clobbered.
This debug log just before the assert error might help shed some light:
DUK_D(DUK_DPRINT("curr_pc=%p base=%p end=%p &curr_pc=%p thr->ptr_curr_pc=%p",
(void *) curr_pc,
(void *) DUK_HCOMPILEDFUNCTION_GET_CODE_BASE(thr->heap, fun),
(void *) DUK_HCOMPILEDFUNCTION_GET_CODE_END(thr->heap, fun),
(void *) &curr_pc,
(void *) thr->ptr_curr_pc));
Okay, I'll add that debug print now. By the way, the error is reproducible in a 32-bit x86 build as well, and will wonders never cease: curr_pc = 0x0671d87c {4261281277}
Still the same! Of course I realize now that the value in braces is the result of dereferencing the pointer, but still intriguing that it's the same value.
I'm trying to figure out what your game code does differently from the testcases: all testcases pass with assertions enabled. One difference is that your calling C side is obviously more complex than the "duk" command line tool, and that there are more Duktape/C functions involved.
I've added setting ptr_curr_pc to NULL whenever activations are changed (when doing call setup and when throwing errors). This is so that while the pointer itself may be valid, it's no longer related to the topmost activation's curr_pc pointer so syncing would be incorrect. I'm not sure if this is what's happening in your case; but here's a snapshot with the change: http://duktape.org/snapshots/duktape-1.2.99-20150827100951-v1.2.0-403-gf0535a3.tar.xz
That decimal value in hex is 0xfdfdfdfd
by the way.
Lovely, that debug print generates so much output that the game never gets started because the console is bogged down printing stuff.
Got it, had to be clever with a conditional:
[D] duk_js_executor.c:2271 (unknown): curr_pc=00000094B45DE5B8 base=00000094B5003298 end=00000094B50035BC &curr_pc=00000094B13E9F58 thr->ptr_curr_pc=00000094B13E9F58
As for that 0xfdfdfdfd value, I found this bit of info: "0xFD Fence Memory Also known as "no mans land." This is used to wrap the allocated memory (like surrounding it with fences) and is used to detect indexing arrays out of bounds."
Interesting.
Unfortunately it's 3:30am and I need some sleep, so further bug-hunting will have to wait. :)
Looking at that debug print though, it looks like the PC someone ended up just outside the bytecode. Which is consistent with the "fence memory" thing.
No worries, I'll try to figure this out. Thanks for the help!
I think I'll add a thr->idx_curr_act
to accompany thr->ptr_curr_pc
which identifies the activation the ptr_curr_pc is related to. This makes syncing always correct even when there are activation changes, so it'll be much easier to get the syncing correct.
With #265 bytecode dispatch uses a cached
thr->curr_pc
pointer which is copied back into the topmost activationact->curr_pc
whenever activation states may change or an up-to-date activation state is needed (e.g. for line numbers in the debugger).This can probably be improved a bit further:
curr_pc
. Make that variable volatile so that it is always stored back into the stack frame after a dispatch.curr_pc
(i.e. aduk_instr_t **
with&curr_pc
as its value) inthr->ptr_curr_pc
.thr->curr_pc
was copied back intoact->curr_pc
after #265, copy the value from*thr->ptr_curr_pc
instead.This is faster because
thr
is sometimes spilled but the stack frame pointer is not.Preliminary testing suggests this is roughly 5% faster than #265 (to be verified with more careful testing).
Changes:
thr->ptr_curr_pc
: bytecode executor can establish it, and restore it before exiting (it always has a setjmp catchpoint so it can do this reliably)