neophob closed this issue 7 years ago
Maybe the time is spent in V8 compiling the JavaScript functions.
What are you comparing to? For example, an ARM device running off an SD card with limited memory isn't really comparable to a modern x86-64 processor with a traditional hard drive or SSD and a larger amount of memory.
Thanks @mscdex. Sure, the machine is a constrained device, but 4s just for the require of the mailgun-js package seems a bit too long. Looking at iostat, the machine does not appear to be waiting on any I/O, so this comment from @yjhjstz might be a good track.
As the require wait times depend on the package itself, I would like to find out why it is so slow. Currently my application needs more than 10s just for the require part.
I know that I can't compare those numbers with my MacBook Pro, but even there the require times differ noticeably between packages:
```
131ms express
  2ms file-type
180ms loggly
292ms mailgun-js
  0ms mustache
  0ms q
 36ms request
 29ms winston
```
So long story short: for some packages the require time is <3ms, while for other packages the require call needs up to 100 times longer. What exactly in the require call needs that much time?

Try profiling your application like this:
```
$ perf record -e cpu-clock -gi node --perf_basic_prof script.js
$ perf report
```
That should tell you where CPU time is spent. My guess is that it's going to be V8's parser and code generator.
Experiment with passing `-c <num>` to perf with a suitably high number if the default sample frequency is too coarse to capture meaningful data. You can also try substituting or extending `-e cpu-clock` with other events if your hardware has PMUs; see `perf evlist`.
Thanks @bnoordhuis - I did that; here are the top entries of the perf report run:
```
+  87.66%  0.14%  node  perf-16278.map  [.] Builtin:ArgumentsAdaptorTrampoline
+  84.84%  0.03%  node  perf-16278.map  [.] LazyCompile:Module._load module.js:271
+  84.36%  0.00%  node  perf-16278.map  [.] LazyCompile:~Module.load module.js:334
+  83.89%  0.05%  node  perf-16278.map  [.] LazyCompile:~Module._compile module.js:366
+  83.78%  0.04%  node  perf-16278.map  [.] Builtin:FunctionApply
+  81.71%  0.00%  node  perf-16278.map  [.] LazyCompile:~require internal/module.js:11
+  81.55%  0.03%  node  perf-16278.map  [.] LazyCompile:~Module._extensions..js module.js:414
+  77.11%  0.27%  node  perf-16278.map  [.] Stub:CEntryStub
+  73.30%  0.00%  node  perf-16278.map  [.] Function:~ node.js:10
+  73.18%  0.02%  node  perf-16278.map  [.] Builtin:JSEntryTrampoline
+  73.16%  0.00%  node  perf-16278.map  [.] LazyCompile:~startup node.js:13
+  72.77%  0.01%  node  perf-16278.map  [.] Stub:JSEntryStub
+  72.29%  0.02%  node  nodejs          [.] v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Ob
+  71.79%  0.00%  node  nodejs          [.] node::Start(int, char**)
+  71.46%  0.00%  node  nodejs          [.] v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*)
+  71.28%  0.00%  node  libc-2.19.so    [.] __libc_start_main
```
The problem is that packages with huge dependency trees like request (see http://npm.anvaka.com/#/view/2d/request) are responsible for this long loading time. What I find a bit fishy is that node should load each module only once. In my example I have request as a dependency of my project, but why does node need so much time reloading the request module again when it's used by another dependency? To illustrate my question:
```
# npm ls
├─┬ request@2.74.0        <- initial load of the request package, takes some time
│ ├── aws-sign2@0.6.0
│ ├── aws4@1.4.1
<snip>
│ │ │ ├─┬ request@2.73.0  <- here require should return the cached module
│ │ │ │ ├── aws-sign2@0.6.0
│ │ │ │ ├── aws4@1.4.1
```
Especially since npm3 flattens the modules. Or do I see something wrong here?
Can we safely say that the disk I/O wait time is not the only one to blame? Does somebody know how to reliably increase/decrease disk/CPU performance and observe the impact?
@molszanski yes, disk I/O is not an issue; I checked with iostat, no disk wait. So as @yjhjstz suspected, it's the JS compilation that takes the time.
@neophob maybe you can try custom startup snapshots, refer to http://v8project.blogspot.de/2015/09/custom-startup-snapshots.html
> but why does node need so much time reloading the request module again when it's used by another dependency?
Your example has two different versions of request, 2.73.0 and 2.74.0.
> Does somebody know how to reliably increase/decrease disk/CPU performance and observe the impact?
Do `echo 3 > /proc/sys/vm/drop_caches` as root before running the benchmark; that will remove the flakiness of maybe/maybe not having the files cached in memory.
Just a quick recap (after `echo 3 > /proc/sys/vm/drop_caches`): the request package shows the effect very clearly (@bnoordhuis you were right, each request version was loaded). I assume this is because of the huge dependency tree (69 dependent modules are loaded). @yjhjstz your comment looks interesting, but I assume this feature is not accessible from node, right?
As the perf output shows, most of the loading time is spent compiling the JS code. Are there any other options, like a "load unoptimized code now and optimize later" option? Or storing the optimized code (this goes into the "custom startup snapshots" option)?
> are there any other options like a "load unoptimized code now and optimise later" option?
That is unoptimized code you're looking at. :-)
The optimizing compiler doesn't start up until the unoptimized code has been executing for some time, and it does most of its work on a separate thread - not a useful option for the baseline compiler.
> or store the optimized code (this goes into the "custom startup snapshots" option)
That's not really an option for most JS code. Snapshots run in an environment that is very different from the normal runtime.
The vm module has a `produceCachedData` option that is less restrictive than the snapshotting mechanism (but fewer restrictions != no restrictions). YMMV.
I think there is nothing left to do or discuss here so I'll go ahead and close the issue. Reopen if anyone disagrees.
I used https://github.com/henvic/require-time to measure the startup time of my application. The application runs on armv7l hardware; there is no big difference whether I use Node v4.4.7 or the latest v6. npm version is 3.9.6.
Here are the results:
I'm not entirely sure if this is an npm or a node issue. What I know is that the issue is NOT related to node searching for the files, as I measure the same long startup time when starting my application using https://github.com/wix/fast-boot.
Here's the strace output from starting my application:
Any hints?