mapbox / node-cpp-skel

Skeleton for bindings to C++ libraries for Node.js using node-addon-api

[WIP] Add benchmark scaffolding #29

Closed · GretaCB closed this 6 years ago

GretaCB commented 8 years ago

Per https://github.com/mapbox/node-cpp-skel/issues/25

Per chat with @springmeyer:

cc @mapsam @springmeyer

codecov-io commented 8 years ago

Codecov Report

Merging #29 into master will decrease coverage by 0.43%. The diff coverage is 98.16%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
- Coverage   98.82%   98.38%   -0.44%     
==========================================
  Files           2        2              
  Lines          85      186     +101     
==========================================
+ Hits           84      183      +99     
- Misses          1        3       +2
Impacted Files        Coverage Δ
src/hello_world.hpp   0% <ø> (ø) :arrow_up:
src/hello_world.cpp   98.91% <98.16%> (-1.09%) :arrow_down:

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 59d0092...4970302.

springmeyer commented 8 years ago

@GretaCB https://github.com/mapbox/node-cpp-skel/pull/29/commits/cf913b8adde27d14531644ce4d9b8961b408ffc9 adds a very expensive usage of std::map (because it triggers lots of memory allocation inside the map, lots of map searches, and lots of string comparisons). Now with:

~/projects/node-cpp-skel[bench]$ time node test/bench/bench-batch.js  --iterations 10 --concurrency 10

real    0m3.491s
user    0m11.704s
sys 0m0.535s

I get my CPU usage spiking to > 500%. Running node test/bench/bench-batch.js --iterations 100 --concurrency 10, to keep it going long enough to easily attach in Activity Monitor, gives a callstack that is 98% idle in the main event loop (as expected if the threads are doing all the work), with 99.9-100% of the threads reporting busy doing work (:tada:):

[screenshot: Activity Monitor callstack, 2016-10-18]

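To make the scenario concrete for readers who don't open the commit, here is a minimal, hypothetical C++ sketch of the kind of std::map-heavy work being described; the function and key names are illustrative, not the actual code from cf913b8:

```cpp
// Hypothetical sketch: deliberately expensive std::map usage inside the
// threadpool work function. Each iteration allocates a std::string key,
// inserts or finds a node in the map (more allocation plus string
// comparisons on every lookup), and finally walks the whole tree.
#include <cstddef>
#include <map>
#include <string>

std::size_t expensive_map_work(std::size_t iterations) {
    std::map<std::string, std::size_t> counts; // node-based: every insert allocates
    for (std::size_t i = 0; i < iterations; ++i) {
        std::string key = "key-" + std::to_string(i % 100); // temporary string allocation
        counts[key] += i; // O(log n) search via string comparisons, possible insert
    }
    std::size_t total = 0;
    for (auto const& kv : counts) {
        total += kv.second; // pointer-chasing traversal of the tree
    }
    return total;
}
```
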
GretaCB commented 8 years ago

Currently working on adding a couple more benchmark scenarios before merging.

springmeyer commented 8 years ago

Per chat with @GretaCB: next I'm going to take a look at profiling and tuning a few things in the PR. In particular, I'll look at the implementation of the mutex lock and make sure there is enough work being done in the async function that locks the global, so that we properly demonstrate (i.e. provide a profilable example of) the kind of thread contention programmers should avoid.

springmeyer commented 8 years ago

> In particular, I'll look at the implementation of the mutex lock and make sure there is enough work being done in the async function that locks the global, so that we properly demonstrate (i.e. provide a profilable example of) the kind of thread contention programmers should avoid.

Done in https://github.com/mapbox/node-cpp-skel/commit/fb1fca8ee7ddfe6660fb6147dfa8f1bb955500a5. Now the contentiousThreads demo is properly awful. It can be tested like:

node test/bench/bench-batch.js --iterations 50 --concurrency 10 --mode contentiousThreads
Benchmark speed: 15 runs/s (runs:50 ms:3245 )
Benchmark iterations: 50 concurrency: 10 mode: contentiousThreads

If you bump up --iterations to 500 and profile in Activity Monitor.app, you'll see the main loop is idle. This is expected because it is only dispatching work to the threads. The threads, however, are all "majority busy" in psynch_mutexwait (waiting for a locked mutex): more time is spent waiting than doing the expensive work. This happens because one thread grabs the lock and does work while all the others wait, then another thread grabs the released lock and does work while the rest wait, and so on. This pattern is all too common and is the reason you don't want to use mutex locks. This is the profiling output of this non-ideal situation:

[screenshot: Activity Monitor callstack with threads mostly in psynch_mutexwait, 2016-11-03]

When locks are unavoidable in real-world applications, we would hope that the percentage of time spent in psynch_mutexwait is very small rather than very large. The real-world optimization is either to rewrite the code so it does not need locks at all, or at least to rewrite it so it holds a lock for less time (scope the lock more tightly).
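To illustrate the difference, here is a minimal, hypothetical C++ sketch (not the code from fb1fca8) contrasting the contended pattern described above with the "scope the lock more tightly" optimization:

```cpp
// Hypothetical sketch of the two locking patterns. The contentious version
// holds a global mutex for the entire expensive job, so worker threads
// serialize and spend most of their time waiting; the scoped version does
// the expensive work lock-free and takes the lock only for the brief
// shared-state update.
#include <mutex>
#include <vector>

std::mutex global_lock;          // shared by every worker thread
std::vector<double> shared_out;  // global state the threads append to

// Stand-in for the CPU-bound part of the job; it touches no shared state.
double expensive_work() {
    double x = 0.0;
    for (int i = 0; i < 5000000; ++i) x += static_cast<double>(i % 7) * 0.5;
    return x;
}

// Contended: the lock is held across the expensive work, so other threads
// mostly sit waiting on the mutex.
void contentious_worker() {
    std::lock_guard<std::mutex> lock(global_lock);
    shared_out.push_back(expensive_work());
}

// Better: the expensive work runs with no lock held; the lock guards only
// the short push_back, so contention is minimal.
void scoped_lock_worker() {
    double result = expensive_work();
    std::lock_guard<std::mutex> lock(global_lock);
    shared_out.push_back(result);
}
```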

springmeyer commented 7 years ago

Just looked back at this branch. It's got some great stuff that I think we should merge soon and keep iterating on. My one reservation before merging: I'm slightly uncomfortable with how we are mixing best-practice/simple/hello-world style code with new code demonstrating both advanced and non-ideal scenarios. I think we should split things apart before merging such that:

springmeyer commented 7 years ago

@GretaCB I just returned here to reflect on next steps. I feel like a good approach would be to split this work into 2 phases:

phase 1

Land a first PR that:

^^ This gets the key structure in place to make it easy and fast to start writing benchmarks for any module that uses the skel. This is a great first step.

phase 2

We could revisit adding examples of performance scenarios. However, I feel like this is a really advanced topic that is best suited to living outside of node-cpp-skel. The skel is complex enough already without diving deep into performance. But given that performance is critical, it would be great to cover it and benefit from the skel structure. So, here is an idea. Instead of building this into the skel directly, we could:

springmeyer commented 7 years ago

@GretaCB now that Phase 1 is done in #61, how about closing this ticket? Phase 2 remains, but I feel like my idea is not concrete enough to warrant a ticket. I'm feeling really good with what we have and don't see a major need to ticket more work here. Rather, we'll apply node-cpp-skel, learn about perf issues in practice, and then, at that point, have ideas of things to build back in or add to the docs.

springmeyer commented 6 years ago

Closing this. What we have are: