nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Random hangs during repeated loadbalancing #2845

Closed ghost closed 8 years ago

ghost commented 9 years ago

Issue tested on both v4.0.0 and v0.12.7.

Using the hello world script on the node webpage:

var http = require('http');

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(1337, "127.0.0.1");

console.log('Server running at http://127.0.0.1:1337/');

Running ApacheBench with 10k requests at a concurrency of 10 (ab -n 10000 -c 10 http://127.0.0.1:1337/) multiple times results in random request hangs of 10-20 seconds (this happens to me once or twice every 10k requests), sometimes ending in an apr_socket_recv: Operation timed out (60) error. The log output when it hangs is here: https://gist.github.com/anonymous/891817ffc90700866973.

Memwatch-next never reports any leaks, and looking at an activity monitor it appears GC is working as expected. All tests run on hardware with flash memory or SSDs.

I've checked this on various versions of node, and on different hardware, with identical results.

Any ideas?

gajus commented 9 years ago

Have replicated the same issue on multiple machines.

gajus commented 9 years ago

I thought this had to do with keep-alive (node keeping the connection alive). However, I have made sure that the connections do not stay alive and I am still running into the same issue:

var http = require('http'),
    server;

server = http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain', Connection: 'close'});
    res.end('Hello World\n');
}).listen(1337, "127.0.0.1");

server.addListener("connection",function(stream) {
    stream.setTimeout(10);
});

console.log('Server running at http://127.0.0.1:1337/');

ghost commented 9 years ago

I believe it might have something to do with kernel maxfiles limits, as per this Stack Overflow question: http://stackoverflow.com/questions/760819/is-there-a-limit-on-number-of-tcp-ip-connections-between-machines-on-linux.

It would explain the delay that only occurs after running ApacheBench multiple times.

gajus commented 9 years ago

This should not be the case if connections are not kept alive.

ghost commented 9 years ago

The TCP TIME_WAIT state keeps a closed connection's port unavailable for a while after the connection ends. That would explain why, after approximately a minute, it becomes possible to run most of the requests again.
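
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the client picks its source ports from the IANA dynamic range (49152-65535) and that ab opens one connection per request when keep-alive is off. The range actually in use depends on the operating system configuration, and the helper names are made up for illustration.

```javascript
// Rough estimate of how many back-to-back benchmark runs fit in the
// ephemeral port range before new connections must wait for ports
// held in TIME_WAIT. All numbers are illustrative assumptions.

// Number of ports in an inclusive range.
function portsAvailable(firstPort, lastPort) {
  return lastPort - firstPort + 1;
}

// How many complete runs fit if every connection consumes one port
// and none of the ports is released between runs.
function runsBeforeExhaustion(ports, connectionsPerRun) {
  return Math.floor(ports / connectionsPerRun);
}

const ports = portsAvailable(49152, 65535); // assumed: IANA dynamic port range
const perRun = 10000; // ab -n 10000 without keep-alive: one connection per request

console.log('ephemeral ports:', ports);
console.log('full runs before exhaustion:', runsBeforeExhaustion(ports, perRun));
```

Under these assumptions only one full run fits, so a second run started before the TIME_WAIT entries expire would stall partway through, which matches the pattern described above.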

mscdex commented 9 years ago

Yeah, this is something you have to watch for when doing any network benchmarking all within the same machine.
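
One way to sidestep this in a same-machine benchmark is to reuse connections instead of opening a new one per request (ab has a -k flag for keep-alive). The following is a minimal sketch of the same idea with a Node client, assuming a throwaway local server on an OS-assigned port; countConnections is a hypothetical helper, not part of any library.

```javascript
const http = require('http');

// Sends `total` sequential requests through one keep-alive agent and
// resolves with how many TCP connections the server accepted.
function countConnections(total) {
  return new Promise((resolve, reject) => {
    let connections = 0;
    let completed = 0;

    const server = http.createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('Hello World\n');
    });
    // Fires once per accepted TCP connection, not once per request.
    server.on('connection', () => { connections += 1; });

    // keepAlive: true lets the agent reuse a socket; maxSockets: 1
    // keeps at most one connection open at a time.
    const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

    function finish(err) {
      agent.destroy();
      server.close();
      if (err) reject(err);
      else resolve({ requests: completed, connections: connections });
    }

    function next() {
      if (completed === total) return finish();
      const port = server.address().port;
      http.get({ host: '127.0.0.1', port: port, agent: agent }, (res) => {
        res.resume(); // drain the body so the socket can be reused
        res.on('end', () => { completed += 1; next(); });
      }).on('error', finish);
    }

    // Port 0 asks the OS for any free port.
    server.listen(0, '127.0.0.1', next);
  });
}

countConnections(50).then((result) => console.log(result));
```

With connection reuse the server should see far fewer connections than requests, so far fewer ports end up in TIME_WAIT between runs.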

Trott commented 8 years ago

Closing due to lack of activity and a seemingly-probable non-Node.js explanation. Please feel free to re-open and/or comment if you feel that it should not be closed.