mschwartz / SilkJS

V8 Based JavaScript Swiss Army Knife (and HTTP Server!)
https://github.com/decafjs/decaf

Benchmarks #4

Open X4 opened 12 years ago

X4 commented 12 years ago

Hi Mr. Schwartz

I came across your announcement by chance (http://www.sencha.com/forum/showthread.php?160128-Announcing-SilkJS) and found your description of your benchmark results a little misleading, so I was curious and tested it myself.

It would make sense to share your:

- Machine specs (+ cores)
- Kernel parameters (if any)
- NIC bandwidth
- File size of the test file (100 Byte, 1 KB, 512 KB, 1 MB)

so that comparing becomes easier, in case someone has the same machine/setup. It also helps with optimizing your server.

I can recommend weighttp. ab is single-threaded and utilizes only one core/CPU. Your server doesn't scale linearly, so varying req/s depending on request count and concurrency level is normal. Enabling keep-alive further improves results.

I get about 4.8k to 5k req/s on a 1.3 GHz Core2Duo :) I know it's weak, but hey, I wanted to share my results. With weighttp and the same parameters I get 27k req/s on a heavily optimized nginx, 23k req/s on a heavily optimized lighttpd, and 56k req/s on G-WAN without optimization. I'm sorry, I haven't had the chance to test NodeJS yet.

$: ab -t 30 -c 50 -k http://localhost:9090/anchor.png
...
Server Software:        SILK
Server Hostname:        localhost
Server Port:            9090

Document Path:          /anchor.png
Document Length:        523 bytes

Concurrency Level:      50
Time taken for tests:   10.402 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    50000
Total transferred:      37700000 bytes
HTML transferred:       26150000 bytes
Requests per second:    4806.58 [#/sec] (mean)
Time per request:       10.402 [ms] (mean)
Time per request:       0.208 [ms] (mean, across all concurrent requests)
Transfer rate:          3539.22 [Kbytes/sec] received

Connection Times (ms)
          min  mean[+/-sd] median   max
Connect:        0    1  50.3      0    3005
Processing:     0   10   9.5      8     176
Waiting:        0   10   9.5      8     176
Total:          0   10  52.0      8    3094

Percentage of the requests served within a certain time (ms)
  50%      8
  66%     12
  75%     15
  80%     17
  90%     21
  95%     25
  98%     30
  99%     32
 100%   3094 (longest request)

$: weighttp -n 100000 -c 100 -t 2 -k "http://localhost:9090/anchor.png"
...
finished in 19 sec, 787 millisec and 667 microsec, 5053 req/s, 3721 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
traffic: 75400000 bytes total, 23100000 bytes http, 52300000 bytes data

Btw, apache bench ignores the -t flag ;)

I think using 250 workers is a little naive, because the time lost to context switches is enormous; it's better to map workers to CPUs. But that's my humble opinion, tell me if I'm wrong :) On a 6-core Xeon processor, for example, you can use up to 10 pthreads; beyond that you won't notice an improvement, but rather a slow decrease in performance.

Cheers!

mschwartz commented 12 years ago

Your email is somewhat confusing to me.

4806 requests/second seems very good to me for the limited processing power of your machine. The 0.208 ms mean is sub-millisecond, which is quite good.

You may edit the config.js file in the httpd/ directory and change the numChildren value to something less than 250 if you like.
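For example, a sketch of that setting (only the numChildren key in httpd/config.js is confirmed by this thread; the surrounding file structure is an assumption and is not shown):

```javascript
// Fragment of httpd/config.js (sketch). Only the numChildren setting is
// confirmed by this discussion; pick a value of at least your benchmark's
// concurrency level (-c).
numChildren: 50,
```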

However, if you are going to test with 50 concurrent connections, you will need at least 50 children to serve them.

Enabling keep-alive will always improve results, because your client program doesn't have to set up and tear down a socket for each request. That is a fairly expensive operation.

I'd be interested in seeing similar benchmarks on your hardware against Apache, NodeJS, lighttpd, nginx, and/or whatever other servers you have available.

I think you are wrong about 250 children being naive and about the context switches. The vast majority of the time is spent sending the data, and the processes block while doing so. There is no penalty for context switching, since the OS won't schedule your blocked processes.

I'd also point out that there are no pthreads in SilkJS, just pure OS processes. Each process is fully isolated from the others via the MMU.

Regards,


X4 commented 12 years ago

Thank you for giving a quick response :)

> I'd also point out that there are no pthreads in SilkJS, just pure OS processes.

Oh yes, I know. I saw in gdb that G-WAN uses pthreads, for example, and I know that pthreads have become very lightweight compared to earlier.

Ok, sorry, I didn't know you can configure the number of children.

Alright, I can benchmark Apache, NodeJS etc. soon and release the results in a paste. It'll be an apples-vs-oranges benchmark though, because G-WAN, NodeJS and SilkJS are application servers, while nginx, lighttpd and Apache are pure web servers. I was just noting that you can further optimize your server :) Check out https://github.com/vendu/OS-Zero/; the zmallock implementation there is pretty efficient, and I've been told it's even faster than jemalloc.

mschwartz commented 12 years ago

Thanks for the OS-Zero tip. I'll definitely look at it.

V8 doesn't support threading, or SilkJS would be pthreaded instead of pre-forked...

Cheers


nathanaschbacher commented 12 years ago

You could run V8 Isolates in a pthread like threads_a_gogo does in Node. No?

mschwartz commented 12 years ago

I saw this about NodeJS:

https://groups.google.com/forum/?fromgroups#!topic/nodejs/zLzuo292hX0

Seems they wanted to implement V8 Isolates, then backed all that code out of the main code base.

From what I've read about Isolates, you still need a Locker around entering a JavaScript context, so you end up with big contention for the lock.

SilkJS was originally entirely pthread-based, but for C++ pages (not JavaScript). I truly wish V8 had the ability to run multiple threads concurrently in the same context. There would be no pre-forking in that case, just pre-threading.

coderbuzz commented 11 years ago

Here are my quick benchmarks.

HP ProBook 4420s - Intel i5 CPU 2.67GHz, 4.00 GB RAM, Debian Crunchbang Linux x32

$ ab -t 30 -c 50 -k http://127.0.0.1/anchor.png
Apache/2.2.22 (Debian) Server at 127.0.0.1 Port 80

$ ab -t 30 -c 50 -k http://127.0.0.1:9090/anchor.png
SilkJS Server at 127.0.0.1 Port 9090

$ ab -t 30 -c 50 -k http://127.0.0.1:8000/anchor.png
NodeJS Server at 127.0.0.1 Port 8000

$ ab -t 30 -c 50 -k http://127.0.0.1:8000/anchor.png
NodeJS Server at 127.0.0.1 Port 8000 (cluster, 4-core CPU)

*UPDATE:

$ ab -t 30 -c 50 -k http://127.0.0.1:8080/anchor.png
G-WAN Server at 127.0.0.1 Port 8080
Requests per second:    84900.89 [#/sec]

HP ProBook 4420s - Intel i5 CPU 2.67GHz, 4.00 GB RAM, Windows 8 x64

$ ab -t 30 -c 50 -k http://127.0.0.1:9000/anchor.png
Pashero 32bit Server at 127.0.0.1 Port 9000