nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.44k stars 29.54k forks source link

http.ClientRequest with Upgrade header sometimes hangs on first request #29701

Open davidje13 opened 5 years ago

davidje13 commented 5 years ago

Connecting to a newly started http.Server with a http.ClientRequest using Connection: Upgrade occasionally (~0.2% probability) hangs indefinitely. The request never reaches the server (no connection event is fired), but strangely the same issue does not appear when using a raw net.Socket on the client and sending equivalent data. The issue also does not appear if the server is bound to 127.0.0.1 rather than the default 0.0.0.0.

The following code replicates the issue reliably; typically it will freeze after ~500 iterations:

const http = require('http');

async function test() {
  const server = http.createServer();
  server.on('connection', () => {
    process.stdout.write(`${Date.now()} | - connection\n`);
  });
  server.on('upgrade', (req, socket) => {
    socket.end('HTTP/1.1 404 Not Found\r\n\r\n');
  });

  await new Promise((resolve) => server.listen(0, resolve));
  const { port } = server.address();

  await new Promise((resolve) => {
    const req = new http.ClientRequest({
      port,
      headers: {
        Connection: 'Upgrade',
        Upgrade: 'x'
      }
    });
    req.once('close', resolve);
    req.end();
  });

  await new Promise((resolve) => server.close(resolve));
}

(async () => {
  for (let i = 0; ; i++) {
    process.stdout.write(`${Date.now()} | Attempt ${i}\n`);
    await test();
  }
})();

Changes which make the issue disappear:


Changes which do not make the issue disappear:


When the issue appears, even the server-side connection event does not fire.

This was found while debugging a suite of tests with occasional failures. Initially reported to https://github.com/websockets/ws/issues/1635

lpinca commented 5 years ago

See discussion in https://github.com/websockets/ws/issues/1635. I think it's an ephemeral port exhaustion / connections kept in CLOSE_WAIT state issue on macOS.

Not sure if there is anything actionable to do.

davidje13 commented 5 years ago

@lpinca I disagree that it's connected to your other issue for several reasons:

Interestingly, since I installed the "macOS Mojave 10.14.6 Supplemental Update 2" after posting this issue, I am unable to replicate it (though I can still replicate lpinca's issue). That might be a coincidence, but since it was trivial to replicate before I'm inclined to believe it may have been patched at an OS level. I will post again if it reappears.

lpinca commented 5 years ago

this only manifests if the request is the first to reach the server (or perhaps is the first to be sent to a particular port)

How did you verify this? In the tests you shared, a lot of servers and clients are started until the issue happens or did I miss something?

this only manifests when using Connection: Upgrade

Are you 100% sure about this? If there is a bug in Node.js upgrade mechanism why the issue disappeared with the latest macOS update and does not manifest on Linux or Windows?

davidje13 commented 5 years ago

@lpinca I verified that it only applies to the first request using a modified (not posted) script, which creates a single server then fires requests repeatedly to the same server. This was part of my original investigation, before I had narrowed down the issue. I observed your issue in that situation, but not mine.

And I'm not 100% sure about Connection: Upgrade being involved, but I am 98% sure. From some quick testing before I was always able to reliably reproduce the issue with the Connection header set, but if I removed all headers the issue did not appear, at least in the few quick runs I performed (given how frequently the issue appeared, a few clean runs is enough to give me confidence).

I have no idea whether this issue manifests on Linux or Windows, and I can't be sure that it's fixed after the update (it just appears to be gone for now, but maybe that's due to some other state which I'm unaware of; another change which has happened is that I recently installed Redis, for example). It could also be related to how long the machine is switched on.

I tried to investigate the node upgrade mechanism but quickly got lost in the ClientRequest code, so I can't currently offer any insight into why it might trigger issues in particular cases. It seems likely that it is due to an OS bug which is being triggered by a particular request from node, but it could also be something not being properly cleaned up by node, or a listener race condition. All speculation though.

thebearingedge commented 3 years ago

On an M1 Mac Mini running macOS 11.2.3 and Node.js 16.1.0.

I'm not sure it's Upgrade requests, is it an IPv6/IPv4 problem?

I've run into this same issue it seems. I was getting intermittent timeouts in a test suite and wasn't able to spot any obvious race conditions. My project is a simple web tunnel server, like localtunnel for piping public HTTP/WebSocket traffic on a remote host through open TCP connections to a local host.

I tried out setting my server's host to '127.0.0.1' as suggested in the first comment. That works. Fast connections and no failures during 8000 iterations of start server -> request Upgrade -> assert on response -> stop server, rerun a couple (2 or 3) times.

I also tried explicitly setting the server's host to 0.0.0.0 IPv4 all interfaces and that worked too.

By default, my http.createServer().listen(cb) uses :: IPv6 all interfaces as the server host.

I suspected it was an IPv6 issue. Configuring http.request() use family: 6 also made the issue go away. 💡

I tested Upgrade requests with bare TCP sockets as suggested above. The problem was much less frequent, but still happened. net.connect() uses IPv4 by default on my system. Switching them to family: 6 seemed to eliminate the problem again.

Last, I tried sending normal GET requests without the Upgrade header. I left the IP families mismatched. Much less frequent issues again. But still failed. Again, configured family: 6 for http.request(). Worked fine.

I have no idea why translating from IPv4 to IPv6 would cause requests to fail or Upgrades to be ultra-shitty, but if on Windows/Linux all default IP families match by default, that may explain the lack of reproducibility on those systems. 🤷‍♀️

@davidje13, is it possible that your issue was coming up due to a mismatch in IP family? Did you confirm that your server was defaulting to 0.0.0.0 and not ::?

davidje13 commented 3 years ago

@thebearingedge it's possible / likely that it was using :: rather than 0.0.0.0; I didn't think to check IPv4 vs IPv6 and I see no reason why it wouldn't use IPv6 on the machine I was using.

btw I don't know if this necessarily fails to reproduce on Linux/Windows, just that nobody's reported seeing it on those systems yet.