choleraehyq closed this issue 8 years ago
It's pretty fast, but could be a lot faster:
It's more of a toy framework that focuses on features and correctness for now.
Of course, all the CPU-intensive work (e.g. video decoding with GStreamer) can be seamlessly integrated within a Web application in Vala. There's serious interest there, especially if you need to work with existing C libraries.
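To make that concrete, here's a tiny sketch of my own (not from the project) of how an existing C function gets exposed to Vala with a [CCode] extern declaration; the same mechanism works for any C library you'd want to call from a handler:

// Illustration only: bind libm's cbrt() and call it from Vala.
// Any C library can be bound the same way and used in a route handler.
// build: valac -X -lm cbrt-example.vala
[CCode (cname = "cbrt", cheader_filename = "math.h")]
extern double cbrt (double x);

public static int main () {
    stdout.printf ("%f\n", cbrt (27.0)); // prints 3.000000
    return 0;
}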
There's an in-progress implementation of the TechEmpower Framework Benchmark in the organization. I (or anyone else) could finish and submit it once 0.3.0 is out.
Here are some samples! The following command was run:
wrk --latency --threads 12 --connections 128 http://localhost:<port>/
I have an Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz.
Valum
Running 10s test @ http://localhost:3003/
12 threads and 128 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 31.33ms 139.19ms 1.71s 97.42%
Req/Sec 669.07 383.08 3.16k 78.42%
Latency Distribution
50% 12.65ms
75% 14.57ms
90% 15.79ms
99% 824.30ms
74132 requests in 10.10s, 9.12MB read
Socket errors: connect 0, read 0, write 0, timeout 22
Requests/sec: 7339.99
Transfer/sec: 0.90MB
Iron (Rust)
Running 10s test @ http://localhost:3000/
12 threads and 128 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.82ms 4.17ms 89.67ms 69.35%
Req/Sec 1.67k 707.05 2.81k 67.25%
Latency Distribution
50% 4.77ms
75% 5.99ms
90% 8.87ms
99% 21.40ms
67005 requests in 10.10s, 7.28MB read
Requests/sec: 6635.12
Transfer/sec: 738.68KB
Node.js
Running 10s test @ http://localhost:8000/
12 threads and 128 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.29ms 2.84ms 65.48ms 94.70%
Req/Sec 1.09k 261.40 6.29k 96.84%
Latency Distribution
50% 8.58ms
75% 9.93ms
90% 11.28ms
99% 15.28ms
130927 requests in 10.10s, 19.48MB read
Requests/sec: 12963.08
Transfer/sec: 1.93MB
You can see that scalable polling makes a massive difference in the case of Node.js.
Iron has better latency because it dispatches requests in lightweight threads, but the GLib main loop provides good concurrency even with a single-threaded model.
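To illustrate what that single-threaded model looks like in practice, here's a generic GIO sketch (not Valum's actual code): one main loop serving many clients, because each write yields back to the loop instead of blocking the thread:

// Generic GIO illustration (not Valum's implementation): a single thread and a
// single main loop handle many concurrent clients, because the I/O is async.
// build: valac --pkg gio-2.0 async-server.vala
async void handle (SocketConnection conn) {
    try {
        var reply = "HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nHello\n";
        yield conn.output_stream.write_async (reply.data);
    } catch (Error err) {
        warning (err.message);
    }
}

public static int main () {
    try {
        var service = new SocketService ();
        service.add_inet_port (8080, null);
        service.incoming.connect ((conn) => {
            handle.begin (conn); // schedule on the main loop, return immediately
            return false;
        });
        service.start ();
        new MainLoop ().run ();
    } catch (Error err) {
        stderr.printf ("%s\n", err.message);
        return 1;
    }
    return 0;
}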
Moreover, it's not like libsoup-2.4 is heavily optimized. In the real world, you would probably use something like SCGI, which I'm thinking of rewriting on top of GThreadedSocketService.
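For the curious, here's roughly the shape such a rewrite could take (an assumed sketch, not the actual SCGI implementation):

// Assumed sketch of an SCGI-style backend on GLib.ThreadedSocketService: every
// accepted connection is dispatched to a worker thread through the 'run'
// signal, so a slow handler doesn't stall the accept loop.
// build: valac --pkg gio-2.0 scgi-sketch.vala
public static int main () {
    try {
        var service = new ThreadedSocketService (64); // at most 64 worker threads
        service.add_inet_port (9000, null);
        service.run.connect ((conn) => {
            try {
                // a real SCGI backend would parse the netstring-framed headers
                // from conn.input_stream here before producing the response
                conn.output_stream.write ("Status: 200 OK\r\n\r\nHello world!".data);
            } catch (Error err) {
                warning (err.message);
            }
            return true; // handled; the connection is released when we return
        });
        service.start ();
        new MainLoop ().run ();
    } catch (Error err) {
        stderr.printf ("%s\n", err.message);
        return 1;
    }
    return 0;
}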
Hope that answers your questions!
Oh, by the way, you can also fork the application to distribute the load across multiple cores. In this case we have --forks=4 (a rough sketch of the pre-fork idea follows the numbers):
Running 10s test @ http://localhost:3003/
12 threads and 128 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 22.84ms 120.39ms 1.68s 97.84%
Req/Sec 1.11k 524.27 5.75k 80.12%
Latency Distribution
50% 8.08ms
75% 8.81ms
90% 10.16ms
99% 656.09ms
129145 requests in 10.10s, 15.89MB read
Socket errors: connect 0, read 0, write 0, timeout 22
Requests/sec: 12787.12
Transfer/sec: 1.57MB
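The pre-fork trick is nothing exotic, by the way. Here's a rough, assumed sketch (not Valum's actual --forks code) of the underlying idea: bind the listening socket once, fork, and let every process accept from it on its own main loop:

// Assumed sketch of the pre-fork model behind --forks: the socket is bound
// before forking, so parent and children all accept from it and the kernel
// spreads incoming connections across the processes (and thus the cores).
// build: valac --pkg gio-2.0 --pkg posix prefork-sketch.vala
public static int main () {
    try {
        var service = new SocketService ();
        service.add_inet_port (3003, null); // bind once, before forking

        for (var i = 0; i < 3; i++) {       // 3 children + the parent = 4 workers
            if (Posix.fork () == 0) {
                break;                       // children must not fork further
            }
        }

        service.incoming.connect ((conn) => {
            // serve the request on this worker's own main loop
            return false;
        });
        service.start ();
        new MainLoop ().run ();
    } catch (Error err) {
        stderr.printf ("%s\n", err.message);
        return 1;
    }
    return 0;
}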
I managed to get around 100k req/sec with 64 forks on the cluster at work ;)
That helps me a lot! Thanks.
I'm curious about the performance of Valum.