Open OliverJAsh opened 3 months ago
From what I see, I believe you're dealing with a race condition. If it happens more frequently while the CPU is under load, it must be a race condition. What are you trying to achieve? I can't follow your code. Why are you closing the port to request it again?
On Fri, 5 Jul 2024 at 15:49 Oliver Joseph Ash @.***> wrote:
Version
20.12.2
Platform
Darwin olivers-mbp.lan 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64 arm Darwin
Subsystem
No response
What steps will reproduce the bug?
The following script creates 2 cluster workers and each cluster worker does the following:
- Start server (A) on port 0 (random port).
- Close server A, extracting the random port.
- Start another server (B) on the same port as the previous server (A).
test.js:
import cluster from 'node:cluster';
import express from 'express';
if (cluster.isPrimary) {
  const numCPUs = 2;
  console.log(`Master process ${process.pid} is running`);
  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  a.get('/', (_req, res) => {
    res.send('A');
  });
  const b = express();
  b.get('/', (_req, res) => {
    res.send('B');
  });
  console.log(`[${process.pid}] [A] call listen on port`, 0);
  const serverA = a.listen(0, () => {
    const randomPort = serverA.address().port;
    console.log(`[${process.pid}] [A] listening on port`, randomPort);
    serverA.close((error) => {
      console.log(`[${process.pid}] [A] close`, error);
      console.log(`[${process.pid}] [B] call listen on port`, randomPort);
      const serverB = b.listen(randomPort, () => {
        console.log(
          `[${process.pid}] [B] listening on port`,
          serverB.address().port,
        );
      });
      serverB.on('error', (error) => {
        console.log(`[${process.pid}] [B] error`, error);
      });
    });
  });
}
How often does it reproduce? Is there a required condition?
No response
What is the expected behavior? Why is that the expected behavior?
No error.
What do you see instead?
Sometimes, but not always, we see an EADDRINUSE error. For example:
$ node test
Master process 16437 is running
[16438] [A] call listen on port 0
[16439] [A] call listen on port 0
[16438] [A] listening on port 58256
[16438] [A] close undefined
[16438] [B] call listen on port 58256
[16439] [A] listening on port 58256
[16439] [A] close undefined
[16439] [B] call listen on port 58256
[16439] [B] listening on port 58256
[16438] [B] error Error: bind EADDRINUSE null:58256
    at listenOnPrimaryHandle (node:net:1969:18)
    at rr (node:internal/cluster/child:163:12)
    at Worker.<anonymous> (node:internal/cluster/child:113:7)
    at process.onInternalMessage (node:internal/cluster/utils:49:5)
    at process.emit (node:events:530:35)
    at emit (node:internal/child_process:951:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -48,
  code: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 58256
}
It seems to happen more frequently when the CPU is under pressure.
This is not expected because, as far as I understand:
- It should be possible to bind to the same port across cluster workers.
- Server A has been closed by the time we try to bind server B. (According to the documentation the close callback is only called once the server has closed (i.e. the port has been released?).)
Additional information
I have been unable to reproduce the problem with a single cluster worker which suggests the problem only occurs when there's contention between cluster workers.
View it on GitHub: https://github.com/nodejs/node/issues/53738
I can’t follow your code.
I think this summarizes what the code is trying to do:
The following script creates 2 cluster workers and each cluster worker does the following:
- Start server (A) on port 0 (random port).
- Close server A.
- Once server A has closed, start another server (B) on the same port as the previous server (A).
What are you trying to achieve? Why are you closing the port to request it again?
I am trying to generate a random port and then, at some point later on, use that random port.
I understand I could restructure the code to avoid the need for two servers within the same cluster worker, but I would like to understand why this error is occurring because it doesn't match the behaviour specified in the Node documentation.
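For reference, the restructuring mentioned above could look roughly like the sketch below: keep a single listening server per worker and swap the request handler, instead of closing server A and rebinding server B. `createSwappableHandler` is a hypothetical helper, not part of Node or Express, and the two plain functions are stand-ins for the Express apps (an Express app is itself a `(req, res)` function, so it could be passed in the same way). This sidesteps the error rather than explaining it.

```javascript
// Hypothetical helper: wraps a request handler so that it can be replaced
// later without closing the listening socket.
function createSwappableHandler(initialHandler) {
  let current = initialHandler;
  const handler = (req, res) => current(req, res);
  handler.swap = (nextHandler) => {
    current = nextHandler;
  };
  return handler;
}

// Stand-ins for the Express apps `a` and `b` from the test case.
const a = (_req, res) => res.end('A');
const b = (_req, res) => res.end('B');

const handler = createSwappableHandler(a);

// In a worker this would be used roughly as follows (assumption, not run here):
//   const server = http.createServer(handler).listen(0, () => {
//     const randomPort = server.address().port;
//     // ...at some point later on, without releasing the port:
//     handler.swap(b);
//   });
```

The port is only bound once per worker, so there is no window in which it is released and requested again.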
AFAIK you can't bind the same port multiple times at the system level, as you can only have one in use at a time, so I'm not seeing the issue here.
If I understand correctly, clustering allows each worker to bind to the same port. For example, this does not reproduce the EADDRINUSE error:
import cluster from 'node:cluster';
import express from 'express';
if (cluster.isPrimary) {
  const numCPUs = 2;
  console.log(`Master process ${process.pid} is running`);
  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const port = 1234;
  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    console.log(`[${process.pid}] [A] listening on port`, port);
  });
}
However, you can't bind to the same port multiple times within the same cluster worker. For example, this consistently reproduces the EADDRINUSE error:
import cluster from 'node:cluster';
import express from 'express';
if (cluster.isPrimary) {
  const numCPUs = 2;
  console.log(`Master process ${process.pid} is running`);
  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const b = express();
  const port = 1234;
  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    console.log(`[${process.pid}] [A] listening on port`, port);
  });
  console.log(`[${process.pid}] [B] call listen on port`, port);
  const serverB = b.listen(port, () => {
    console.log(`[${process.pid}] [B] listening on port`, port);
  });
}
But this isn't what my original reduced test case is doing, because it closes one server before opening another:
import cluster from 'node:cluster';
import express from 'express';
if (cluster.isPrimary) {
  const numCPUs = 2;
  console.log(`Master process ${process.pid} is running`);
  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const b = express();
  const port = 0;
  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    const randomPort = serverA.address().port;
    console.log(`[${process.pid}] [A] listening on port`, randomPort);
    serverA.close((error) => {
      console.log(`[${process.pid}] [A] close`, error);
      console.log(`[${process.pid}] [B] call listen on port`, randomPort);
      const serverB = b.listen(randomPort, () => {
        console.log(`[${process.pid}] [B] listening on port`, randomPort);
      });
      serverB.on('error', (error) => {
        console.log(`[${process.pid}] [B] error`, error);
      });
    });
  });
}
Most of the time it works, for example:
Master process 10743 is running
[10745] [A] call listen on port 0
[10744] [A] call listen on port 0
[10745] [A] listening on port 50484
[10744] [A] listening on port 50484
[10745] [A] close undefined
[10745] [B] call listen on port 50484
[10744] [A] close undefined
[10744] [B] call listen on port 50484
[10745] [B] listening on port 50484
[10744] [B] listening on port 50484
It's using the same port in each cluster worker and there's no problem. But occasionally we get an EADDRINUSE error:
Master process 10936 is running
[10937] [A] call listen on port 0
[10938] [A] call listen on port 0
[10938] [A] listening on port 50503
[10938] [A] close undefined
[10938] [B] call listen on port 50503
[10937] [A] listening on port 50503
[10937] [A] close undefined
[10937] [B] call listen on port 50503
[10938] [B] error Error: bind EADDRINUSE null:50503
at listenOnPrimaryHandle (node:net:1969:18)
at rr (node:internal/cluster/child:163:12)
at Worker.<anonymous> (node:internal/cluster/child:113:7)
at process.onInternalMessage (node:internal/cluster/utils:49:5)
at process.emit (node:events:530:35)
at emit (node:internal/child_process:951:14)
at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
errno: -48,
code: 'EADDRINUSE',
syscall: 'bind',
address: null,
port: 50503
}
[10937] [B] listening on port 50503
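As a stopgap for the failing run above, server B's listen call could be retried when it fails with EADDRINUSE, along the lines of the retry pattern shown in the Node `net` documentation for `server.listen()`. The sketch below is only a workaround under the assumption that the port becomes available shortly afterwards; it does not explain the error. `listenWithRetry`, `retries` and `delayMs` are hypothetical names, and `server` is expected to be an `http.Server` or `net.Server` that is not yet listening.

```javascript
// Hypothetical helper: calls server.listen(port) and retries a limited number
// of times if the attempt fails with EADDRINUSE. Resolves with the bound port.
function listenWithRetry(server, port, { retries = 5, delayMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    let attemptsLeft = retries;

    const onListening = () => {
      server.off('error', onError);
      resolve(server.address().port);
    };

    const onError = (error) => {
      // Give up on unrelated errors, or once the retry budget is spent.
      if (error.code !== 'EADDRINUSE' || attemptsLeft === 0) {
        server.off('listening', onListening);
        reject(error);
        return;
      }
      attemptsLeft -= 1;
      setTimeout(tryListen, delayMs);
    };

    const tryListen = () => {
      server.once('error', onError);
      server.listen(port);
    };

    server.once('listening', onListening);
    tryListen();
  });
}

// Possible use inside the close callback of server A (assumption, not run here):
//   const serverB = http.createServer(b);
//   listenWithRetry(serverB, randomPort).then((port) => {
//     console.log(`[${process.pid}] [B] listening on port`, port);
//   });
```

Note that `b.listen(...)` in Express creates a new server on every call, which is why the sketch takes an explicit server object that can be reused across attempts.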
According to the documentation, the callback provided to server.close is only called when the server has closed:
the server is finally closed when all connections are ended and the server emits a 'close' event. The optional callback will be called once the 'close' event occurs
https://nodejs.org/api/net.html#serverclosecallback
So I don't understand why the port is still not available within the same cluster worker after the previous server (A) has been closed.
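For comparison, here is a single-process sketch of the same listen, close, rebind sequence using `node:net` directly, with no cluster and no Express. `listenCloseRebind` is a hypothetical name. It is meant as a baseline for the behaviour I would expect from the documentation (I could not reproduce the problem with a single worker either); it does not reproduce or explain what happens under cluster.

```javascript
const net = require('node:net');

// Hypothetical helper: binds server A to a random port, closes it, then binds
// server B to the same port from inside A's close callback.
function listenCloseRebind() {
  return new Promise((resolve, reject) => {
    const serverA = net.createServer();
    serverA.once('error', reject);
    serverA.listen(0, () => {
      const { port } = serverA.address();
      serverA.close((error) => {
        if (error) {
          reject(error);
          return;
        }
        // By the time the close callback runs, A should report not listening.
        const listeningAfterClose = serverA.listening;
        const serverB = net.createServer();
        serverB.once('error', reject);
        serverB.listen(port, () => {
          const reboundPort = serverB.address().port;
          serverB.close(() => resolve({ port, reboundPort, listeningAfterClose }));
        });
      });
    });
  });
}

listenCloseRebind()
  .then((result) => console.log(result))
  .catch((error) => console.error(error));
```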
It's also worth noting that I haven't been able to reproduce this problem when I don't use port 0 for the first server (A):
- const port = 0;
+ const port = 1234;
Another interesting discovery: I can't reproduce this problem when I specify the host as 127.0.0.1 to override the default:
- const serverA = a.listen(port, () => {
+ const serverA = a.listen(port, '127.0.0.1', () => {
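To see what "the default" means here, the sketch below prints `server.address()` for a server bound with and without an explicit host. Per the Node docs, when the host is omitted the server listens on the unspecified IPv6 address (`::`) if IPv6 is available, or on `0.0.0.0` otherwise. Whether that difference is related to the EADDRINUSE error is only a guess on my part. `probeAddress` is a hypothetical name.

```javascript
const net = require('node:net');

// Hypothetical helper: binds a throwaway server to a random port on the given
// host (or the default host when omitted), resolves with server.address(),
// and closes the server again.
function probeAddress(host) {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    server.listen({ port: 0, host }, () => {
      const address = server.address();
      server.close(() => resolve(address));
    });
  });
}

probeAddress()
  .then((address) => console.log('default host:', address))
  .catch((error) => console.error(error));

probeAddress('127.0.0.1')
  .then((address) => console.log('explicit host:', address))
  .catch((error) => console.error(error));
```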