nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.93k stars 29.18k forks source link

http: keepalive race condition near timeout #52649

Closed mweberxyz closed 2 weeks ago

mweberxyz commented 5 months ago

Version

v21.7.3

Platform

Darwin [me] 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:11:08 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T8122 arm64

Subsystem

http

What steps will reproduce the bug?

Run server.js:

const { createServer } = require("node:http");

const server = createServer(
  { keepAlive: true, keepAliveTimeout: 1000 },
  (req, res) => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(
      JSON.stringify({
        data: "Hello World!",
      })
    );
  }
);

server.listen(3000, async () => {
  const endpoint = `http://localhost:${server.address().port}`;
  console.log(endpoint);
});

Run client.js:

const http = require("http");

const timeouts = [
  950, 960, 970, 980, 990, 995, 995, 996, 996, 997, 997, 998, 998, 999, 999,
];
let prevTimeout = 0;

const sendRequest = () => {
  http.get("http://127.0.0.1:3000", (res) => {
    res.on("data", () => {});
    res.on("end", () => {
      console.log(`success: ${prevTimeout}`);
      if (timeouts.length) {
        prevTimeout = timeouts.shift();
        setTimeout(sendRequest, prevTimeout);
      }
    });
  });
};

sendRequest();

How often does it reproduce? Is there a required condition?

Fairly reliably, might take a few runs to see it, or adjust the timeouts to account for device differences.

What is the expected behavior? Why is that the expected behavior?

% node client-http.js
success: 0
success: 950
success: 960
success: 970
success: 980
success: 990
success: 995
success: 995
success: 996
success: 996
success: 997
success: 997
success: 998
success: 998
success: 999
success: 999

It is expected that (generally) every request should succeed.

What do you see instead?

% node client-http.js
success: 0
success: 950
success: 960
success: 970
success: 980
success: 990
success: 995
success: 995
success: 996
node:events:497
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
Emitted 'error' event on ClientRequest instance at:
    at Socket.socketErrorListener (node:_http_client:500:9)
    at Socket.emit (node:events:519:28)
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  errno: -54,
  code: 'ECONNRESET',
  syscall: 'read'
}

Node.js v21.7.3

In the full debug logs - http.log - the server connection close is apparent, followed by the attempt to use the closed socket for the next request:

...
NET 92544: destroy
NET 92544: close
NET 92544: close handle
HTTP 92544: SOCKET ERROR: read ECONNRESET Error: read ECONNRESET
...

Additional information

The server is closing the connection at it's own 1000ms timeout, but the keepAlive timers in the client either:

The issue existed prior to Node 20, but it's incidence is more pronounced now that keepAlive is defaulted to true in http.globalAgent.

A similar issue was identified in https://github.com/nodejs/undici/issues/3141. Undici already had logic in place to close connections on the client side 1000ms prior to the keep-alive value provided by the server (to account for timer/transport latency) but said logic was failing to run due to an unrelated issue.

RedYetiDev commented 5 months ago

@nodejs/http

mokimolo commented 5 months ago

1

EthanTuning commented 5 months ago

I'm experiencing this exact issue when trying to run ionic start to start a new ionic project.

image

mweberxyz commented 5 months ago

@EthanTuning it's possible though unlikely related to this issue. ECONNRESET is a fairly general error code for when the server closes the socket unexpectedly. This issue is concerned with when an ECONNRESET occurs in the exact same tick of the event loop as a http request resuming on an previously used socket - which seems unlikely to be reproduced repeatedly except in the kind of narrow circumstances proposed in my post.

mweberxyz commented 5 months ago

Is Node ignoring server sent Keep Alive hint headers on purpose? This method in _http_agent.js is suspect:

Agent.prototype.keepSocketAlive = function keepSocketAlive(socket) {
  socket.setKeepAlive(true, this.keepAliveMsecs);
  socket.unref();

  let agentTimeout = this.options.timeout || 0;

  if (socket._httpMessage?.res) {
    const keepAliveHint = socket._httpMessage.res.headers['keep-alive'];

    if (keepAliveHint) {
      const hint = RegExpPrototypeExec(/^timeout=(\d+)/, keepAliveHint)?.[1];

      if (hint) {
        const serverHintTimeout = NumberParseInt(hint) * 1000;

        if (serverHintTimeout < agentTimeout) {
          agentTimeout = serverHintTimeout;
        }
      }
    }
  }

  if (socket.timeout !== agentTimeout) {
    socket.setTimeout(agentTimeout);
  }

  return true;
};

The serverHintTimeout < agentTimeout conditional always evaluates to false (in the default conditions) because the default agentTimeout is 0 -- which means Node will never opportunistically close a connection and instead always wait for the server to close it first.

mcollina commented 5 months ago

https://github.com/nodejs/node/pull/52653 seems relevant.

mcollina commented 5 months ago

The serverHintTimeout < agentTimeout conditional always evaluates to false (in the default conditions) because the default agentTimeout is 0 -- which means Node will never opportunistically close a connection and instead always wait for the server to close it first.

I think we should close them actually. PR?

EthanTuning commented 5 months ago

@EthanTuning it's possible though unlikely related to this issue. ECONNRESET is a fairly general error code for when the server closes the socket unexpectedly. This issue is concerned with when an ECONNRESET occurs in the exact same tick of the event loop as a http request resuming on an previously used socket - which seems unlikely to be reproduced repeatedly except in the kind of narrow circumstances proposed in my post.

Oh thanks, yeah I was very unsure, but googling got me here haha!

mweberxyz commented 5 months ago

The serverHintTimeout < agentTimeout conditional always evaluates to false (in the default conditions) because the default agentTimeout is 0 -- which means Node will never opportunistically close a connection and instead always wait for the server to close it first.

I realize default in this context meant two different things.

Using the default http.globalAgent -- the default timeout is 5000ms. Using a non-default agent, with keepAlive on, but otherwise default parameters, (ie- new http.Agent({ keepAlive: true }); -- the issue arises.