nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0

node seems to crash under concurrency 4/5 - osx/usermode #465

Open eyberg opened 5 years ago

eyberg commented 5 years ago

I didn't test under KVM/Linux, so I could easily be running into a usermode- or OSX-specific issue.

It's sporadic at concurrency level 4 but always crashes at level 5.

➜  mcp git:(master) ab -n 100 -c 4 http://127.0.0.1:8083/
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done

Server Software:
Server Hostname:        127.0.0.1
Server Port:            8083

Document Path:          /
Document Length:        12 bytes

Concurrency Level:      4
Time taken for tests:   0.385 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      11300 bytes
HTML transferred:       1200 bytes
Requests per second:    259.58 [#/sec] (mean)
Time per request:       15.409 [ms] (mean)
Time per request:       3.852 [ms] (mean, across all concurrent requests)
Transfer rate:          28.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1  11.0      0     110
Processing:     5   14  10.5     10      60
Waiting:        4   11  10.4      7      57
Total:          5   15  17.2     10     153

Percentage of the requests served within a certain time (ms)
  50%     10
  66%     10
  75%     12
  80%     17
  90%     30
  95%     31
  98%     60
  99%    153
 100%    153 (longest request)
➜  mcp git:(master) ab -n 100 -c 4 http://127.0.0.1:8083/
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)...Send request failed!
Send request failed!
apr_socket_recv: Connection reset by peer (54)
Total of 93 requests completed

Just for comparison, on localhost we get around 7k reqs/sec without crashing.

Going to just tag this as performance for now; we can come back to it later once we have more stuff fleshed out.

francescolavra commented 1 year ago

As far as I can see, this is not node-specific: if I run a Go web server I see the same behavior. When I run ab with a concurrency level above 2, I get "Connection reset by peer" errors. With logging enabled at the lwIP level in the kernel, I can see that TCP connection request packets are missing when these errors occur, so the VM never even receives the connection requests from the ab client.

I believe this is a limitation of QEMU user mode networking, which is implemented with libslirp. The tcpx_listen() function at https://gitlab.freedesktop.org/slirp/libslirp/-/blob/master/src/socket.c?ref_type=heads#L848 calls listen() with the backlog argument set to 1, so concurrent TCP connection requests are not guaranteed to succeed.

As to why this behavior differs on macOS compared to Linux, I believe it's due to a different implementation of the listen() syscall. The listen(2) man page on Linux says "The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests", while the man page on macOS doesn't say anything in this regard.
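
For anyone who wants to poke at the backlog effect without QEMU in the loop, here is a rough local sketch in Python. It is not libslirp's code and only approximates the situation: a listener is opened with a small backlog and never calls accept(), then several clients connect at once. The helper name `probe_backlog` and the 0.5 s timeout are made up for illustration, and the split between connected and failed clients depends on the host OS, which is the point of the comparison.

```python
import socket
import threading


def probe_backlog(backlog: int, clients: int, timeout: float = 0.5) -> dict:
    """Hypothetical helper: connect `clients` sockets at once to a listener
    that never calls accept(), and count how many connects succeed."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(backlog)         # libslirp's tcpx_listen() uses a backlog of 1
    port = server.getsockname()[1]

    results = {"connected": 0, "failed": 0}
    lock = threading.Lock()
    client_socks = []

    def connect_once() -> None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(("127.0.0.1", port))
            outcome = "connected"
        except OSError:  # covers timeouts, resets and refusals
            outcome = "failed"
        with lock:
            results[outcome] += 1
            client_socks.append(sock)  # keep open so queued slots stay occupied

    threads = [threading.Thread(target=connect_once) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for sock in client_socks:
        sock.close()
    server.close()
    return results


if __name__ == "__main__":
    # Compare the counts on Linux and macOS; they are expected to differ.
    print(probe_backlog(backlog=1, clients=5))
```

Running this on both Linux and macOS with the same arguments should give a feel for how differently the two kernels treat a backlog of 1, though it says nothing about how libslirp forwards the connections once they are accepted.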