mapbox / node-cpp-skel

Skeleton for bindings to C++ libraries for Node.js using node-addon-api

[WIP] Add benchmark scaffolding #29

Closed · GretaCB closed this 6 years ago

GretaCB commented 8 years ago

Per https://github.com/mapbox/node-cpp-skel/issues/25

Per chat with @springmeyer:

cc @mapsam @springmeyer

codecov-io commented 8 years ago

Codecov Report

Merging #29 into master will decrease coverage by 0.43%. The diff coverage is 98.16%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
- Coverage   98.82%   98.38%   -0.44%     
==========================================
  Files           2        2              
  Lines          85      186     +101     
==========================================
+ Hits           84      183      +99     
- Misses          1        3       +2
Impacted Files        Coverage Δ
src/hello_world.hpp   0% <ø> (ø) :arrow_up:
src/hello_world.cpp   98.91% <98.16%> (-1.09%) :arrow_down:

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 59d0092...4970302.

springmeyer commented 8 years ago

@GretaCB https://github.com/mapbox/node-cpp-skel/pull/29/commits/cf913b8adde27d14531644ce4d9b8961b408ffc9 adds a very expensive usage of std::map (because it triggers lots of memory allocation inside the map, lots of map searches, and lots of string comparisons). Now with:

~/projects/node-cpp-skel[bench]$ time node test/bench/bench-batch.js  --iterations 10 --concurrency 10

real    0m3.491s
user    0m11.704s
sys 0m0.535s

I get my CPU usage spiking to > 500%. Running node test/bench/bench-batch.js --iterations 100 --concurrency 10, to keep it going long enough to easily attach in Activity Monitor, gives a callstack that is 98% idle in the main event loop (as expected if the threads are doing all the work), with 99.9-100% of the threads reporting busy doing work (:tada:):

[screenshot: Activity Monitor callstack, 2016-10-18]

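To make the scenario concrete for readers who don't open the commit, here is a minimal, hypothetical C++ sketch of the kind of std::map-heavy work being described; the function and key names are illustrative, not the actual code from cf913b8:

```cpp
// Hypothetical sketch: deliberately expensive std::map usage inside the
// threadpool work function. Each iteration allocates a std::string key,
// inserts or finds a node in the map (more allocation plus string
// comparisons on every lookup), and finally walks the whole tree.
#include <cstddef>
#include <map>
#include <string>

std::size_t expensive_map_work(std::size_t iterations) {
    std::map<std::string, std::size_t> counts; // node-based: every insert allocates
    for (std::size_t i = 0; i < iterations; ++i) {
        std::string key = "key-" + std::to_string(i % 100); // temporary string allocation
        counts[key] += i; // O(log n) search via string comparisons, possible insert
    }
    std::size_t total = 0;
    for (auto const& kv : counts) {
        total += kv.second; // pointer-chasing traversal of the tree
    }
    return total;
}
```
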
GretaCB commented 8 years ago

Currently working on adding a couple more benchmark scenarios before merging.

springmeyer commented 8 years ago

Per chat with @GretaCB: next I'm going to take a look at profiling and tuning a few things in the PR. In particular, I'll look at the implementation of the mutex lock and make sure there is enough work being done in the async function that locks the global, so that we properly demonstrate (i.e. provide a profilable example of) the kind of thread contention programmers should avoid.

springmeyer commented 8 years ago

> In particular, I'll look at the implementation of the mutex lock and make sure there is enough work being done in the async function that locks the global, so that we properly demonstrate (i.e. provide a profilable example of) the kind of thread contention programmers should avoid.

Done in https://github.com/mapbox/node-cpp-skel/commit/fb1fca8ee7ddfe6660fb6147dfa8f1bb955500a5. Now the contentiousThreads demo is properly awful. It can be tested like:

node test/bench/bench-batch.js --iterations 50 --concurrency 10 --mode contentiousThreads
Benchmark speed: 15 runs/s (runs:50 ms:3245 )
Benchmark iterations: 50 concurrency: 10 mode: contentiousThreads

If you bump up --iterations to 500 and profile in Activity Monitor.app, you'll see the main loop is idle. This is expected because it is only dispatching work to the threads. The threads, however, are all "majority busy" in psynch_mutexwait (waiting for a locked mutex): more time is spent waiting than doing the expensive work. This happens because one thread grabs the lock and does work while all the others wait, then another thread grabs the released lock and does work while the rest wait, and so on. This pattern is all too common and is the reason you don't want to use mutex locks. This is the profiling output of this non-ideal situation:

[screenshot: Activity Monitor callstack with threads mostly in psynch_mutexwait, 2016-11-03]

When locks are unavoidable in real-world applications, we would hope that the percentage of time spent in psynch_mutexwait is very small rather than very large. The real-world optimization is either to rewrite the code so it does not need locks at all, or at least to rewrite it so it holds a lock for less time (scope the lock more tightly).
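To illustrate the difference, here is a minimal, hypothetical C++ sketch (not the code from fb1fca8) contrasting the contended pattern described above with the "scope the lock more tightly" optimization:

```cpp
// Hypothetical sketch of the two locking patterns. The contentious version
// holds a global mutex for the entire expensive job, so worker threads
// serialize and spend most of their time waiting; the scoped version does
// the expensive work lock-free and takes the lock only for the brief
// shared-state update.
#include <mutex>
#include <vector>

std::mutex global_lock;          // shared by every worker thread
std::vector<double> shared_out;  // global state the threads append to

// Stand-in for the CPU-bound part of the job; it touches no shared state.
double expensive_work() {
    double x = 0.0;
    for (int i = 0; i < 5000000; ++i) x += static_cast<double>(i % 7) * 0.5;
    return x;
}

// Contended: the lock is held across the expensive work, so other threads
// mostly sit waiting on the mutex.
void contentious_worker() {
    std::lock_guard<std::mutex> lock(global_lock);
    shared_out.push_back(expensive_work());
}

// Better: the expensive work runs with no lock held; the lock guards only
// the short push_back, so contention is minimal.
void scoped_lock_worker() {
    double result = expensive_work();
    std::lock_guard<std::mutex> lock(global_lock);
    shared_out.push_back(result);
}
```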

springmeyer commented 7 years ago

Just looked back at this branch. It's got some great stuff that I think we should merge soon and keep iterating on. My one reservation before merging: I'm slightly uncomfortable with how we are mixing best-practice/simple/hello-world style code with new code demonstrating both advanced and non-ideal scenarios. I think we should split things apart before merging such that:

springmeyer commented 7 years ago

@GretaCB I just returned here to reflect on next steps. I feel like a good approach would be to split this work into 2 phases:

phase 1

Land a first PR that:

^^ This gets the key structure in place to make it easy and fast to start writing benchmarks for any module that uses the skel. This is a great first step.

phase 2

We could revisit adding examples of performance scenarios. However, I feel like this is a really advanced topic that is best suited to living outside of node-cpp-skel. The skel is complex enough already without diving deep into performance. But given that performance is critical, it would be great to cover it and benefit from the skel structure. So, here is an idea. Instead of building this into the skel directly, we could:

springmeyer commented 7 years ago

@GretaCB now that Phase 1 is done in #61, how about closing this ticket? Phase 2 remains, but I feel like my idea is not concrete enough to warrant a ticket. I'm feeling really good with what we have and don't see a major need to ticket more work here. Rather, we'll apply node-cpp-skel, learn about perf issues in practice, and then, at that point, have ideas of things to build back in or add to the docs.

springmeyer commented 6 years ago

Closing this. What we have are: