nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

EADDRINUSE error when trying to bind to port after it was closed #53738

Open OliverJAsh opened 3 months ago

OliverJAsh commented 3 months ago

Version

20.12.2

Platform

Darwin olivers-mbp.lan 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64 arm Darwin

Subsystem

No response

What steps will reproduce the bug?

The following script creates 2 cluster workers and each cluster worker does the following:

  1. Start server (A) on port 0 (random port).
  2. Close server A.
  3. Once server A has closed, start another server (B) on the same port as the previous server (A).

test.js:

import cluster from 'node:cluster';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = 2;

  console.log(`Master process ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const b = express();

  const port = 0;

  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    const randomPort = serverA.address().port;
    console.log(`[${process.pid}] [A] listening on port`, randomPort);

    serverA.close((error) => {
      console.log(`[${process.pid}] [A] close`, error);

      console.log(`[${process.pid}] [B] call listen on port`, randomPort);
      const serverB = b.listen(randomPort, () => {
        console.log(`[${process.pid}] [B] listening on port`, randomPort);
      });
      serverB.on('error', (error) => {
        console.log(`[${process.pid}] [B] error`, error);
      });
    });
  });
}

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior? Why is that the expected behavior?

No error.

What do you see instead?

Sometimes, but not always, we see an EADDRINUSE error. For example:

$ node test
Master process 16437 is running
[16438] [A] call listen on port 0
[16439] [A] call listen on port 0
[16438] [A] listening on port 58256
[16438] [A] close undefined
[16438] [B] call listen on port 58256
[16439] [A] listening on port 58256
[16439] [A] close undefined
[16439] [B] call listen on port 58256
[16439] [B] listening on port 58256
[16438] [B] error Error: bind EADDRINUSE null:58256
    at listenOnPrimaryHandle (node:net:1969:18)
    at rr (node:internal/cluster/child:163:12)
    at Worker.<anonymous> (node:internal/cluster/child:113:7)
    at process.onInternalMessage (node:internal/cluster/utils:49:5)
    at process.emit (node:events:530:35)
    at emit (node:internal/child_process:951:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -48,
  code: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 58256
}

It seems to happen more frequently when the CPU is under pressure.

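This is not expected because, as far as I understand:

  • It should be possible to bind to the same port across cluster workers.
  • Server A has been closed by the time we try to bind server B. (According to the documentation the close callback is only called once the server has closed (i.e. the port has been released?).)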

Additional information

I have been unable to reproduce the problem with a single cluster worker, which suggests the problem only occurs when there's contention between cluster workers.
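
If the failure is transient, as the intermittent logs above suggest, one possible workaround is to retry the bind when it fails with EADDRINUSE. The following is a minimal single-process sketch of that idea, not a fix for the underlying behaviour. `listenWithRetry`, `retries`, and `delayMs` are hypothetical names chosen for illustration, and whether retrying helps inside cluster workers is an assumption that has not been verified here.

```javascript
import net from 'node:net';

// Hypothetical helper: call listen() and retry a few times on EADDRINUSE.
// Calling listen() again is allowed after a failed listen() call.
function listenWithRetry(server, port, { retries = 5, delayMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    let attempts = 0;

    const cleanup = () => {
      server.off('error', onError);
      server.off('listening', onListening);
    };
    const onListening = () => {
      cleanup();
      resolve(server.address().port);
    };
    const onError = (error) => {
      if (error.code === 'EADDRINUSE' && attempts < retries) {
        attempts += 1;
        setTimeout(() => server.listen(port), delayMs);
        return;
      }
      cleanup();
      reject(error);
    };

    server.on('error', onError);
    server.on('listening', onListening);
    server.listen(port);
  });
}

// Demo: hold the port briefly with another server, then release it.
const blocker = net.createServer();
await new Promise((resolve) => blocker.listen(0, resolve));
const { port: busyPort } = blocker.address();
setTimeout(() => blocker.close(), 150);

const retried = net.createServer();
console.log('bound to', await listenWithRetry(retried, busyPort));
retried.close();
```

The retry count and delay are arbitrary. A real application would probably want to cap the total wait and log each failed attempt.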

juanarbol commented 3 months ago

From what I see, I believe you’re dealing with a race condition. If it happens more frequently while the CPU is under load, it must be a race condition. What are you trying to achieve? I can’t follow your code. Why are you closing the port only to request it again?

On Fri, 5 Jul 2024 at 15:49 Oliver Joseph Ash @.***> wrote:

This is not expected because, as far as I understand:

  • It should be possible to bind to the same port across cluster workers.
  • Server A has been closed by the time we try to bind server B. (According to the documentation the close callback is only called once the server has closed (i.e. the port has been released?).)


OliverJAsh commented 3 months ago

"I can’t follow your code."

I think this summarizes what the code is trying to do:

The following script creates 2 cluster workers and each cluster worker does the following:

  1. Start server (A) on port 0 (random port).
  2. Close server A.
  3. Once server A has closed, start another server (B) on the same port as the previous server (A).

"What are you trying to achieve? Why are you closing the port to request it again?"

I am trying to generate a random port and then, at some point later on, use that random port.

I understand I could restructure the code to avoid the need for two servers within the same cluster worker, but I would like to understand why this error is occurring because it doesn't match the behaviour specified in the Node documentation.
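
For reference, the restructuring mentioned above could look roughly like the sketch below: bind once on port 0, keep that server open, and attach the real request handler later, so the port is never released and requested again. `reservePort` is a hypothetical helper name, and the example uses `node:http` directly rather than Express to stay self-contained. An Express app is a request listener, so it could be attached in the same way.

```javascript
import http from 'node:http';

// Hypothetical helper: bind once on port 0 and keep the socket open,
// so the port is never released and requested again.
function reservePort() {
  return new Promise((resolve, reject) => {
    const server = http.createServer();
    server.once('error', reject);
    server.listen(0, () => {
      server.off('error', reject);
      resolve({ server, port: server.address().port });
    });
  });
}

const { server, port } = await reservePort();
console.log('reserved port', port);

// Later, attach the real handler instead of starting a second server.
server.on('request', (_req, res) => {
  res.end('B');
});

// Close when done so the process can exit.
server.close();
```

This sidesteps the close-then-rebind sequence. It does not explain why that sequence sometimes fails in a cluster.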

RedYetiDev commented 3 months ago

AFAIK you can't bind the same port multiple times at the system level, as you can only have one in use at a time, so I'm not seeing the issue here.

OliverJAsh commented 3 months ago

If I understand correctly, clustering allows each worker to bind to the same port. For example, this does not reproduce the EADDRINUSE error:

import cluster from 'node:cluster';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = 2;

  console.log(`Master process ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();

  const port = 1234;

  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    console.log(`[${process.pid}] [A] listening on port`, port);
  });
}

However, you can't bind to the same port multiple times within the same cluster worker. For example, this consistently reproduces the EADDRINUSE error:

import cluster from 'node:cluster';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = 2;

  console.log(`Master process ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const b = express();

  const port = 1234;

  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    console.log(`[${process.pid}] [A] listening on port`, port);
  });

  console.log(`[${process.pid}] [B] call listen on port`, port);
  const serverB = b.listen(port, () => {
    console.log(`[${process.pid}] [B] listening on port`, port);
  });
}

But this isn't what my original reduced test case is doing, because it closes one server before opening another:

import cluster from 'node:cluster';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = 2;

  console.log(`Master process ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  const a = express();
  const b = express();

  const port = 0;

  console.log(`[${process.pid}] [A] call listen on port`, port);
  const serverA = a.listen(port, () => {
    const randomPort = serverA.address().port;
    console.log(`[${process.pid}] [A] listening on port`, randomPort);

    serverA.close((error) => {
      console.log(`[${process.pid}] [A] close`, error);

      console.log(`[${process.pid}] [B] call listen on port`, randomPort);
      const serverB = b.listen(randomPort, () => {
        console.log(`[${process.pid}] [B] listening on port`, randomPort);
      });
      serverB.on('error', (error) => {
        console.log(`[${process.pid}] [B] error`, error);
      });
    });
  });
}

Most of the time it works, for example:

Master process 10743 is running
[10745] [A] call listen on port 0
[10744] [A] call listen on port 0
[10745] [A] listening on port 50484
[10744] [A] listening on port 50484
[10745] [A] close undefined
[10745] [B] call listen on port 50484
[10744] [A] close undefined
[10744] [B] call listen on port 50484
[10745] [B] listening on port 50484
[10744] [B] listening on port 50484

It's using the same port in each cluster worker and there's no problem. But occasionally we get an EADDRINUSE error:

Master process 10936 is running
[10937] [A] call listen on port 0
[10938] [A] call listen on port 0
[10938] [A] listening on port 50503
[10938] [A] close undefined
[10938] [B] call listen on port 50503
[10937] [A] listening on port 50503
[10937] [A] close undefined
[10937] [B] call listen on port 50503
[10938] [B] error Error: bind EADDRINUSE null:50503
    at listenOnPrimaryHandle (node:net:1969:18)
    at rr (node:internal/cluster/child:163:12)
    at Worker.<anonymous> (node:internal/cluster/child:113:7)
    at process.onInternalMessage (node:internal/cluster/utils:49:5)
    at process.emit (node:events:530:35)
    at emit (node:internal/child_process:951:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -48,
  code: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 50503
}
[10937] [B] listening on port 50503

According to the documentation, the callback provided to server.close is only called when the server has closed:

"the server is finally closed when all connections are ended and the server emits a 'close' event. The optional callback will be called once the 'close' event occurs."

https://nodejs.org/api/net.html#serverclosecallback

So I don't understand why the port is still not available within the same cluster worker after the previous server (A) has been closed.
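
To separate the behaviour of `server.close` from the cluster module, the same close-then-rebind sequence can be run in a single process. The sketch below uses `node:net` without cluster or Express. The helper names `listen`, `close`, and `closeThenRebind` are illustrative only. It assumes no other process takes the port in between, in which case the second bind is expected to succeed.

```javascript
import net from 'node:net';

// Promise wrappers around listen() and close(); names are illustrative.
function listen(server, port) {
  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen(port, () => {
      server.off('error', reject);
      resolve(server.address().port);
    });
  });
}

function close(server) {
  return new Promise((resolve, reject) => {
    server.close((error) => (error ? reject(error) : resolve()));
  });
}

async function closeThenRebind() {
  const serverA = net.createServer();
  const port = await listen(serverA, 0); // random port
  await close(serverA); // resolves from the close callback

  const serverB = net.createServer();
  const reboundPort = await listen(serverB, port); // same port again
  await close(serverB);
  return { port, reboundPort };
}

console.log(await closeThenRebind());
```

If this version never fails while the clustered version does, that would be consistent with the earlier observation that the error needs more than one cluster worker.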

OliverJAsh commented 3 months ago

It's also worth noting that I haven't been able to reproduce this problem when I don't use port 0 for the first server (A):

-  const port = 0;
+  const port = 1234;

OliverJAsh commented 3 months ago

Another interesting discovery: I can't reproduce this problem when I specify the host as 127.0.0.1 to override the default:

-  const serverA = a.listen(port, () => {
+  const serverA = a.listen(port, '127.0.0.1', () => {
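
For context on what "the default" means here: according to the net documentation, when the host is omitted the server listens on the unspecified IPv6 address (::) if IPv6 is available, or on the unspecified IPv4 address (0.0.0.0) otherwise. The sketch below only prints what each form of the call binds to. `boundAddress` is a hypothetical helper, and nothing here confirms that the address family is related to the EADDRINUSE error.

```javascript
import net from 'node:net';

// Hypothetical helper: listen on a random port, report the bound address,
// then close the server again.
function boundAddress(host) {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    const onListening = () => {
      const address = server.address();
      server.close(() => resolve(address));
    };
    if (host === undefined) {
      server.listen(0, onListening); // default host
    } else {
      server.listen(0, host, onListening); // explicit host
    }
  });
}

console.log('default host:', await boundAddress());
console.log('explicit host:', await boundAddress('127.0.0.1'));
```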