Closed GretaCB closed 6 years ago
Merging #29 into master will decrease coverage by 0.43%. The diff coverage is 98.16%.
```diff
@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
- Coverage   98.82%   98.38%   -0.44%
==========================================
  Files           2        2
  Lines          85      186     +101
==========================================
+ Hits           84      183      +99
- Misses          1        3       +2
==========================================
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/hello_world.hpp | `0% <ø> (ø)` | :arrow_up: |
| src/hello_world.cpp | `98.91% <98.16%> (-1.09%)` | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59d0092...4970302.
@GretaCB https://github.com/mapbox/node-cpp-skel/pull/29/commits/cf913b8adde27d14531644ce4d9b8961b408ffc9 adds a very expensive usage of `std::map` (it triggers lots of internal memory allocation in the map, map searches, and string comparisons). Now with:
```
~/projects/node-cpp-skel[bench]$ time node test/bench/bench-batch.js --iterations 10 --concurrency 10

real	0m3.491s
user	0m11.704s
sys	0m0.535s
```
I get my CPU usage spiking to > 500%. Running `node test/bench/bench-batch.js --iterations 100 --concurrency 10` to keep it going long enough to easily attach in Activity Monitor gives a callstack that is 98% idle in the main event loop (as expected if the threads are doing all the work) and with 99.9-100% of the threads reporting busy doing work (:tada:):
Currently working on adding a couple more benchmark scenarios before merging.
Per chat with @GretaCB, next I'm going to take a look at profiling and tuning a few things in the PR. In particular I'll look at the implementation of the mutex lock and make sure there is enough work being done in the async function that locks the global, such that we are properly demonstrating (i.e. providing an example you can profile) the kind of thread contention programmers should avoid.
Done in https://github.com/mapbox/node-cpp-skel/commit/fb1fca8ee7ddfe6660fb6147dfa8f1bb955500a5. Now the `contentiousThreads` demo is properly awful. It can be tested like:

```
node test/bench/bench-batch.js --iterations 50 --concurrency 10 --mode contentiousThreads
Benchmark speed: 15 runs/s (runs:50 ms:3245)
Benchmark iterations: 50 concurrency: 10 mode: contentiousThreads
```
If you bump up `--iterations` to 500 and profile in Activity Monitor.app, you'll see the main loop is idle. This is expected because it is only dispatching work to the threads. The threads, however, are all "majority busy" in `psynch_mutexwait` (waiting for a locked mutex): more time is spent waiting than doing the expensive work. One thread grabs the lock and does work while all the others wait; then another thread grabs the released lock and does work while the rest wait again. This pattern is all too common and is the reason you don't want to use mutex locks. This is the profiling output of this non-ideal situation:
When locks are unavoidable in real-world applications, we would hope that the % of time spent in `psynch_mutexwait` is very small rather than very big. The real-world optimization is to either rewrite the code to avoid needing locks, or at least rewrite it to hold a lock for less time (scope the lock more tightly).
Just looked back at this branch. It's got some great stuff that I think we should merge soon and keep iterating on. My one reservation before merging: I'm slightly uncomfortable with how we are mixing best-practice/simple/hello-world style code with new code demonstrating both advanced and non-ideal scenarios. I think we should split things apart before merging such that:
- `src/hello_world.hpp` and `src/hello_world.cpp` contain only simple hello-world style code: only binding a single standalone function (this is what you would copy if you just want to start writing a module fast and use the skel as a base)
- `src/async_<name>.hpp`, `src/async_<name>.cpp`, and `src/async_<name>.readme.md` demonstrate common async scenarios in code and have a readme talking about what to care about in them (good and bad). (This is what you'd profile, and poke at within skel, to learn about the nuances of node async performance.)

@GretaCB I just returned here to reflect on next steps. I feel like a good approach would be to split this work into 2 phases:
Land a first PR that:

- `./bench` folder
- `./bench/hello_async.bench.js`
- `./bench/hello_object_async.test.js`
- `test/bench/bench-batch.js` currently in this PR

^^ this gets the key structure in place to make it easy and fast to start benchmarks for any module that uses skel. This is a great first step.
We could revisit adding examples of performance scenarios. However I feel like this is a really advanced topic most suitable outside of node-cpp-skel. The skel is complex enough currently without diving deep on performance. But given performance is critical it would be great to cover it and benefit from the skel structure. So, here is an idea. Instead of building into the skel directly, we could:
@GretaCB now that Phase 1 is done in #61 - how about closing this ticket? Phase 2 remains but I feel like my idea is not concrete enough to warrant a ticket. I'm feeling really good with what we have and don't see a major need to ticket more work here. Rather we'll apply node-cpp-skel, learn perf issues, and then - at that time - have ideas of things to build back, or add to the docs.
Closing this. What we have are:

- Per https://github.com/mapbox/node-cpp-skel/issues/25: `process.memoryUsage()`. Thinking about what info is useful here
- `sleep` option for mocking when threads are busy, but aren't doing much work

Per chat with @springmeyer:
cc @mapsam @springmeyer