Lupus closed this issue 5 years ago
Array.unsafe_get should not be expensive.
Can you describe how you've built the JavaScript file ?
There are two compilation modes:
I've tried with the below dune file:
(executable
 (name node_sample)
 (public_name node_sample)
 (flags (:standard -g))
 (libraries core_kernel)
 (preprocess (pps lwt_ppx))
 (js_of_ocaml
  (javascript_files ../js_runtime/bigstring_node.js)
  (flags (:standard
          +gen_js_api/ojs_runtime.js
          --pretty
          --no-inline
          --debug-info
          --source-map
          --disable shortvar)))
 (modes byte))
Also I've tried changing all of the options to just --opt 3 but did not see much difference.
Looking more at the profile, it seems that dequeue_exn is being run in a large recursive chain of calls. Maybe that is the culprit behind the performance issues; AFAIK Node.js is not great at optimizing recursion.
Oh, and the magic bigstring_node.js overrides caml_ba_init_views so that Node's Buffer is used instead of Uint8Array. It also includes helpers to convert between Bigstring.t and Node's Buffer objects. In my understanding that should eliminate some memory copies when casting between Uint8Array and Buffer.
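To illustrate the copy-vs-view point, here is a minimal sketch in plain Node.js. It is not taken from bigstring_node.js; the variable names are made up for the example. It only shows that Buffer.from over an ArrayBuffer region shares memory, while Buffer.from over a typed array copies it.

```javascript
// Sketch only: plain Node.js, unrelated to the actual bigstring_node.js helpers.
const bytes = new Uint8Array([1, 2, 3, 4]);

// View: a Buffer over the same ArrayBuffer region, no bytes are copied.
const shared = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);

// Copy: passing the typed array itself allocates new memory and copies into it.
const copied = Buffer.from(bytes);

// Mutate the original to see which Buffer observes the change.
bytes[0] = 42;

console.log(shared[0]); // the view sees the write
console.log(copied[0]); // the copy keeps the old value
console.log(shared instanceof Uint8Array); // Buffer is a Uint8Array subclass
```

Since Buffer is a subclass of Uint8Array, code that only needs typed-array behaviour can take a Buffer directly, so the copy is only paid where a conversion goes through Buffer.from(typedArray).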
In my understanding that should eliminate some memory copies when casting between Uint8Array and Buffer.
FYI: according to a quick test, on this workload the Buffer version gives 4% better performance than the Uint8Array version.
I've tried with the below dune file:
That doesn't give enough information about whether or not you use separate compilation.
I think dune will use separate compilation when using the dev profile (which is the default)
I think dune will use separate compilation when using the dev profile (which is the default)
Dune's documentation agrees with this statement.
I've tried building with the following command:
dune build node/node_sample.bc.js --profile release -j 1 --verbose
Full build log is attached in case it can help reveal some details:
Building this way does not show any significant change in performance compared to --profile development (2309.61 rps in release vs 2267.17 rps in dev).
I'm also attaching the build log from the second build command I tried:
dune build node/node_sample.bc.js --profile development -j 1 --verbose
How can I further troubleshoot this?
Updating Node from 8 to 10 gave a 30% performance improvement, and going from 10 to 12 gave another 18%.
The profiler is still unhappy with the dequeue_exn function. In chart mode the whole "tower" of request processing ends up in multiple dequeue_exn calls.
In the source code view, newer Node now attributes all of this time to the whole dequeue_exn function:
Looking at the JS code for this function, there's nothing particularly interesting inside aside from the exception, which is the likely suspect for the low performance...
function dequeue_exn(t) {
  /*<<lib/faraday.ml 98 7>>*/
  if (is_empty(t)) throw Dequeue_empty;
  /*<<lib/faraday.ml 101 6>>*/
  var result = t[1][t[2] + 1];
  /*<<lib/faraday.ml 103 6>>*/
  t[1][t[2] + 1] = sentinel;
  t[2] = t[2] + 1 | 0;
  return result; /*<<lib/faraday.ml 104 12>>*/
}
/cc @seliopou
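To make the suspicion about the exception concrete, here is a rough sketch of two ways a caller could drain such a queue: one that uses the throw as the loop exit, and one that checks is_empty first so nothing is thrown. The record layout, make, is_empty, and both drain functions are assumptions made for illustration; this is not Faraday's actual code and not necessarily how the library's callers are written.

```javascript
// Hypothetical stand-ins mimicking the compiled layout: t = [tag, elements, front, back],
// where elements is itself a block with a tag at index 0 (hence the "+ 1" offsets).
const Dequeue_empty = new Error("Dequeue_empty"); // assumed exception value
const sentinel = 0;

function make(items) {
  return [0, [0].concat(items), 0, items.length];
}

function is_empty(t) {
  return t[2] === t[3];
}

function dequeue_exn(t) {
  if (is_empty(t)) throw Dequeue_empty;
  const result = t[1][t[2] + 1];
  t[1][t[2] + 1] = sentinel;
  t[2] = (t[2] + 1) | 0;
  return result;
}

// Variant 1: rely on the exception to leave the loop (one throw per drain).
function drain_with_exn(t) {
  const out = [];
  try {
    for (;;) out.push(dequeue_exn(t));
  } catch (e) {
    if (e !== Dequeue_empty) throw e; // re-raise anything unexpected
  }
  return out;
}

// Variant 2: test for emptiness up front, so the throw path is never taken.
function drain_with_check(t) {
  const out = [];
  while (!is_empty(t)) out.push(dequeue_exn(t));
  return out;
}

console.log(drain_with_exn(make([1, 2, 3])));
console.log(drain_with_check(make([1, 2, 3])));
```

Both variants return the same elements; timing them under the profiler would show whether the throw/catch path matters on a given Node version.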
The changes in the above pull requests to Httpaf and Faraday remove the functions in question from the top of the profile.
The performance increase in my test is not that impressive though: +5%.
I'll dig into this further; in the meantime this issue can be closed. Thanks @hhugo for your help!
Hey guys,
We're building an internal framework that's going to be based on Httpaf (/cc @seliopou), and I'm trying to make this framework run in a Node.js environment. So far my approach is to integrate vanilla Httpaf through its public API on top of a Node.js socket (using gen_js_api to write the Node bindings). I've got it working, js_of_ocaml is really awesome! But I ran into a performance issue: a simple "hello world" style HTTP handler can only do 5k requests per second on my Core i7 laptop. An analogous pure Node.js server does 12k rps. I understand that the js_of_ocaml version with pure-OCaml HTTP processing can't compete with the optimized Node parser, but getting less than 50% of the original performance looks like too big a gap to me.
I've tried profiling my app, results in this screenshot:
First item corresponds to this source location within Faraday library:
Third line has less time spent in it:
Is there something particular in that Array.unsafe_get that makes it so expensive?