nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Socket timeouts are checked before pending incoming requests are assigned sockets, causing `ECONNRESET`s #43456

Open zbjornson opened 2 years ago

zbjornson commented 2 years ago

Version

14.19.1 and others

Platform

Linux qa01-api-1 5.13.0-1024-gcp #29~20.04.1-Ubuntu SMP Thu Apr 14 23:15:00 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

net

What steps will reproduce the bug?

Process 1:
const crypto = require("crypto");
const http = require('http');

const a = Buffer.alloc(1e8);
function doWork(ms) {
    const start = Date.now();
    while (Date.now() - start < ms) {
        crypto.randomFillSync(a);
    }
}

let socketId = 0;
const server = http.createServer((req, res) => {
    if (req.socket.__socketId === undefined) {
        req.socket.__socketId = socketId++;
        const st = req.socket.setTimeout;
        req.socket.setTimeout = (a, b) => {
            console.log(new Date(), "Setting timeout on socket ", req.socket.__socketId);
            return st.call(req.socket, a, b);
        };
        req.socket.once("close", () => console.log(new Date(), `Closing ${req.socket.__socketId}`));
    }

    const reqId = req.headers["x-request-id"];
    console.log(new Date(), `[${reqId}] got request on socket ${req.socket.__socketId}`);
    doWork(1500);
    console.log(new Date(), `[${reqId}] sending response`);
    res.end('hello world\r\n');
});

server.keepAliveTimeout = 1000; // shortened from default 5000 to make test faster

server.listen(8080, '127.0.0.1');

Process 2:
const http = require('http');

const keepAliveAgent = new http.Agent({
    keepAlive: true,
    maxSockets: 2
});

let socketId = 0;

for (let i = 0; i < 3; i++) {
    const req = http.request({
        agent: keepAliveAgent,
        host: '127.0.0.1',
        port: 8080,
        method: 'GET',
        headers: {
            "X-Request-Id": `${i}`
        }
    });

    req.end();

    req.once("socket", socket => {
        if (socket.__id === undefined) socket.__id = socketId++;
        console.log(new Date(), `[${i}] assigned socket ${socket.__id}`);
    });

    req.on('response', function _onResponse(res) {
        res.on('data', function _onData(chunk) {});

        res.on('end', function _doneResponse() {
            console.log(new Date(), `[${i}] res.end`);
        });
    });

    req.on('error', function _onError(err) {
        console.log(new Date(), `[${i}] req.error`, err);
    });
}

How often does it reproduce? Is there a required condition?

100%

What is the expected behavior?

All three requests should succeed.

What do you see instead?

The server logs will look like this:

2022-06-16T22:37:50.307Z [0] got request on socket 0
2022-06-16T22:37:51.809Z [0] sending response
2022-06-16T22:37:51.813Z Setting timeout on socket  0
2022-06-16T22:37:51.814Z [1] got request on socket 1
2022-06-16T22:37:53.317Z [1] sending response
2022-06-16T22:37:53.318Z Setting timeout on socket  1
2022-06-16T22:37:53.320Z Closing 0
2022-06-16T22:37:53.323Z Closing 1

and the client logs like this:

2022-06-16T22:37:50.303Z [0] assigned socket 0
2022-06-16T22:37:50.305Z [1] assigned socket 1
2022-06-16T22:37:51.814Z [0] res.end
2022-06-16T22:37:51.815Z [2] assigned socket 0
2022-06-16T22:37:53.318Z [1] res.end
2022-06-16T22:37:53.321Z [2] req.error Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:209:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

Notice: req[2] is assigned a socket (on the client) at 51.815, just after req[0]'s response ends and req[1] is received by the server. Server socket[0] is closed at 53.320, just after the blocking work for req[1] finishes, even though req[2] is pending and could be assigned to that socket. I think what's happening is that timers are checked before pending incoming messages are assigned sockets.
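
As a standalone illustration of that hypothesis (a minimal sketch, not part of the repro): a timer armed before a long synchronous block has already expired by the time the block ends, so its callback runs as soon as the event loop resumes; the hypothesis above is that this happens before the pending incoming data is read and assigned to the socket, and in the repro that callback is the one that destroys the keep-alive socket.

const start = Date.now();

// Armed for 1 s, but the event loop is about to be blocked for 1.5 s.
setTimeout(() => {
    // Logs ~1500, not 1000: the expired timer fires as soon as the loop resumes.
    console.log('timer fired after', Date.now() - start, 'ms');
}, 1000);

// Simulate the 1500 ms of synchronous work done in the request handler.
while (Date.now() - start < 1500) {}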

Additional information

ECONNRESETs due to Keep-Alive races need to be handled by clients (see #20256, #38890 and several others), but I think this particular scenario can be fixed to avoid a subset of them.
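
For illustration, a minimal sketch of the kind of client-side handling mentioned above (requestWithRetry is a hypothetical helper, not a Node.js API, and retrying like this is only appropriate for idempotent requests): retry once when the connection is reset before any part of the response has arrived.

const http = require('http');

const agent = new http.Agent({ keepAlive: true, maxSockets: 2 });

function requestWithRetry(options, retriesLeft = 1) {
    return new Promise((resolve, reject) => {
        const req = http.request({ agent, ...options }, res => {
            const chunks = [];
            res.on('data', chunk => chunks.push(chunk));
            res.on('end', () => resolve(Buffer.concat(chunks)));
        });
        req.on('error', err => {
            // A reset before any response data is the keep-alive race;
            // retry once with a fresh request (and, typically, a fresh socket).
            if (err.code === 'ECONNRESET' && retriesLeft > 0) {
                resolve(requestWithRetry(options, retriesLeft - 1));
            } else {
                reject(err);
            }
        });
        req.end();
    });
}

requestWithRetry({ host: '127.0.0.1', port: 8080, method: 'GET' })
    .then(body => console.log(body.toString()))
    .catch(err => console.error('request failed after retry', err));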

theanarkh commented 2 years ago

I think it's because when a socket times out, Node.js can't tell these two cases apart:

  1. there is really no more data coming.
  2. the code is just taking too long to execute.

This is the packet I captured: [packet capture screenshot]
zbjornson commented 2 years ago

Hm, that's not how I interpret these logs. In this capture, the highlighted row (13) is the 3rd request:

[packet capture screenshot]

And the corresponding Node.js logs:

11:14:40.382 [0] assigned socket 0
11:14:40.384 [1] assigned socket 1
11:14:41.901 [0] res.end
11:14:41.902 [2] assigned socket 0
11:14:43.407 [1] res.end
11:14:43.409 [2] req.error Error: read ECONNRESET
11:14:40.386 [0] got request on socket 0
11:14:41.897 [0] sending response
11:14:41.900 Setting timeout on socket  0
11:14:41.901 [1] got request on socket 1
11:14:43.406 [1] sending response
11:14:43.407 Setting timeout on socket  1
11:14:43.408 Closing 0
11:14:43.412 Closing 1

Key points:

theanarkh commented 2 years ago

Sorry, I can't follow that point. The reason socket 1 is closed (by the client) early, after only ~5 ms, is that the client process has exited. If you add setInterval(() => {}, 10000) to client.js (a sketch of this change follows the list below), socket 1 is instead closed by the server after 1 s. The process is as follows:

  1. The client sends two requests concurrently.
  2. The server receives req 0 on socket 0, handles it, and sets a 1 s timeout on the socket.
  3. The client sends the third request (req 2) on socket 0.
  4. The server receives req 1 on socket 1, handles it, and sets a 1 s timeout on that socket. After req 1 is handled, socket 0's timeout has already expired before req 2 can be read, so the timeout callback destroys socket 0, which makes the client exit and close socket 1.
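
A minimal sketch of the modification described above for Process 2 (client.js); keepProcessAlive is just an illustrative name. An active interval keeps the client's event loop alive, so the process cannot exit on its own; with this change, socket 1 is closed by the server after 1 s as described above:

// Keep the client's event loop (and therefore the process) alive while observing.
const keepProcessAlive = setInterval(() => {}, 10000);

// ...the three http.request() calls from Process 2 above go here, unchanged...

// Allow the process to exit once you are done observing:
// clearInterval(keepProcessAlive);
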
codinbox commented 2 years ago

The client assigned req 2 at 11:14:41.902 ([2] assigned socket 0), and the server closed socket 0 at 11:14:43.408 (Closing 0).

Why didn't socket 0 handle req 2? It arrived before the server closed the socket and before the timeout (1 s after sending the response).