mhdawson opened 9 years ago
Concerning benchmarking "real" apps, it might be hard to get meaningful results that can be compared. When you benchmark an app, there are many modules involved, plus external resources like databases.
Imagine benchmarking acmeair v0.1.1 on io.js 2.2.1 on day 1. A week later we benchmark acmeair v0.1.1 on io.js 2.2.2 and the benchmark shows that it's way faster.
But we don't actually know why: it might be an updated module, a database upgrade, or io.js itself being faster.
I think benchmarking a real application is a good thing, but there should be a clear strategy for getting reproducible results. Maybe shrinkwrapping module versions and mocking databases could do the trick.
What do you think?
Agreed - for any benchmark it will be important to only change one "component" (be that node.js/io.js or used module) at a time, so we know where the change in performance is coming from. Shrinkwrapping is a good approach to making sure that happens.
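As a concrete sketch of the shrinkwrap approach (the `acmeair` directory name is just an illustration of the app under test):

```shell
# Pin the exact dependency tree of the app under test before the first run.
cd acmeair                 # hypothetical checkout of the app being benchmarked
npm install                # resolve and install the dependency tree once
npm shrinkwrap             # write npm-shrinkwrap.json, locking exact versions

# Commit npm-shrinkwrap.json; every later run then installs the same tree,
# so only the Node.js/io.js version under test changes between runs.
```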
+1 for seabaylea's comment. For comparison purposes we'll want to limit the changes so that we can isolate what may have affected results
Do you think database latency might be an issue? If we stick to the same machine(s) and database version it might not be a problem.
Like other variables, we'd need to make sure we keep the database version/system consistent between runs, and when we do change the database, not change anything else at the same time.
Since WebSockets are often used with Node, here are some benchmarks we might consider including:
- https://www.npmjs.com/package/thor
- https://www.npmjs.com/package/websocket-benchmark
- https://www.npmjs.com/package/websocket-bench
Here are a couple gulp scripts that would put decent pressure on a macro-benchmark: https://github.com/gulpjs/gulp/issues/1118
docs/case_coverage.md is somewhat sparse. I am interested in helping to populate it.
I'd like to use this comment to track recommended benchmarks for the various use cases. If you have a suggestion, please reply on this issue and I'll update this comment. The current use cases below are taken from #243.
| Use case | Suggested benchmark(s) |
|---|---|
| Back-end API services | - ezPAARSE? (but see #76)<br>- ? |
| Service oriented architectures (SOA) | ? |
| Microservice-based applications | - Node-DC-EIS in u-service mode (but see #78)<br>- jasnell and mcollina suggested workloads that are (a) JSON parse/stringify heavy, or (b) use FS and DNS heavily |
| Generating/serving dynamic web page content | - Acme Air<br>- Node-DC-EIS (monolithic mode)<br>- Node-DC-SSR (electrode)<br>- ghost |
| Single page applications (SPA) | etherpad-lite |
| Scripting and automation | - Micro-benchmark for `require`<br>- Micro-benchmark for node start/stop time |
| Agents and data collectors | Something based on Telegraf? |
| Developer tooling: web | Web Tooling Benchmark |
| Developer tooling: Node.js | Run npm commands like `npm install` and `npm audit`. Ideally we configure npm to use a local registry to eliminate network interference. |
| Desktop applications | Electron. Atom. |
| Systems software | Synthetic workload provided by jorangreef |
| Embedded software | ? |
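As a rough sketch of the JSON parse/stringify-heavy workload suggested for the microservice row above (the payload shape and iteration count are invented for illustration):

```javascript
// Minimal JSON parse/stringify micro-benchmark sketch.
// The payload is a made-up microservice-style message; sizes are arbitrary.
const payload = {
  id: 'req-12345',
  headers: { 'content-type': 'application/json' },
  items: Array.from({ length: 100 }, (_, i) => ({ sku: i, qty: i % 7 })),
};

const iterations = 10000;
const start = process.hrtime.bigint();
let bytes = 0;
for (let i = 0; i < iterations; i++) {
  const text = JSON.stringify(payload);     // serialize
  const back = JSON.parse(text);            // and parse it back
  bytes += text.length + back.items.length; // keep the work observable
}
const ms = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`${iterations} stringify+parse round trips in ${ms.toFixed(1)} ms`);
```

A real benchmark would vary payload size and nesting depth, since both strongly affect `JSON.parse` cost.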
https://github.com/nodejs/benchmarking/blob/master/docs/case_coverage.md is not completely empty, but it would be happier if there were fewer blank spaces (which I'm guessing is what you meant).
Yes, I realize that was not clear. I've edited my post to read "somewhat sparse".
From https://github.com/nodejs/benchmarking/pull/243:
@jorangreef I think you might have some comments on the "systems software" use case and perhaps others?
Firstly, thanks @davisjam and everyone here for your efforts expanding the Node.js benchmarking use cases.
Ronomon is an email startup in private beta. It falls into the "systems software" use case. Our new storage stack is being written in Node.js to drive 16x 10TB disks per server.
Things that are important for this use case:
The system encrypts and authenticates large 64KB+ fixed-size disk sectors, and needs to saturate the sequential write throughput of 16 disks. This requires HMAC and AES-256-CTR throughput > 1.6 GB/s. That rules out Node's synchronous crypto from the start, and makes asynchronous crypto essential (https://github.com/ronomon/crypto-async) to avoid blocking and to achieve multi-core throughput. If a disk or storage node fails and we need to rebuild, we can't afford to have the system bottlenecked on the throughput of a single CPU core doing crypto. The alternative of a cluster or multi-process solution would introduce needless complexity and overhead, and defeat the point of using Node.js in the first place, i.e. single-threaded non-blocking control plane with an asynchronous data plane.
Of course, the storage stack is not just doing crypto, it's also doing fs operations, using the same threadpool. At present, this is causing massive head-of-line blocking in the threadpool, with the much faster crypto tasks getting stuck behind the much slower fs tasks. You can imagine what happens when you race the Dakar Rally and the Monaco Grand Prix on the same track. For benchmarking, this means we need to benchmark the threadpool not just for DNS or FS tasks, but also for CPU-intensive tasks.
In addition to crypto and fs tasks, the storage stack also does erasure coding (https://github.com/ronomon/reed-solomon) and deduplication (https://github.com/ronomon/deduplication) using the threadpool. These are too CPU-intensive to be run synchronously, on the order of tens of milliseconds per task, and again we need multi-core throughput to saturate the disks' write bandwidth.
We use direct IO to raw block devices, for more control over a few things, not least to avoid spiking write commit latency due to large write buffer stalls. From a benchmarking point of view, this means that fs benchmarks should reflect realistic disk performance, instead of measuring only the filesystem cache. This becomes especially important when benchmarking the interaction between fs tasks and CPU tasks.
A single Node.js process for one of the storage servers manages 48-64 GB RAM. As a result, most of Ronomon's data structures are already large flat buffers, e.g. https://github.com/ronomon/hash-table, to reduce GC pause times, but reducing GC pause times under load remains critical to avoid blocking the event loop.
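A toy sketch of the flat-buffer idea (record layout and sizes invented for illustration): fixed-width records live in one Buffer, so the GC tracks a single large allocation instead of millions of small objects.

```javascript
// Hypothetical flat-buffer record store: one Buffer instead of many objects.
const RECORD_SIZE = 16;      // assumed layout: 8-byte key + 8-byte value
const COUNT = 100000;        // number of fixed-width record slots
const table = Buffer.alloc(RECORD_SIZE * COUNT);

function setRecord(i, key, value) {
  const offset = i * RECORD_SIZE;
  table.writeBigUInt64BE(key, offset);
  table.writeBigUInt64BE(value, offset + 8);
}

function getValue(i) {
  return table.readBigUInt64BE(i * RECORD_SIZE + 8);
}

setRecord(42, 7n, 99n);
console.log(getValue(42)); // → 99n
```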
Because of the large memory footprint, simple things like spawning a child process asynchronously using Node.js turned out to be synchronous instead, and led to the event loop blocking for 1-2 seconds per async spawn(). We eventually had to stop using spawn() and switched to a unix socket. More benchmarks for the Node.js API for large memory footprints would be brilliant.
I hope this helps, Node.js has been great so far, making it easy to dip into C when needed, and with Javascript as a fast control plane language. It's fantastic to have a whole benchmarking team, and I'm looking forward to seeing CPU-intensive tasks becoming first-class asynchronous citizens.
This issue is to discuss/identify candidate benchmarks. So far, what we have on the list is:
We expect that we'll want multiple, with at least one to cover each use case identified in https://github.com/nodejs/benchmarking/issues/5