nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

NATS Cluster - Dynamically del node #5421

Open throwbear opened 6 months ago

throwbear commented 6 months ago

Observed behavior

I start four nats-server instances, each with a configuration file like the following:

nats-server.conf

server_name: node1[2|3|4]

http_port: 6222

accounts: {
  SYS: { users: [ {user: adm, password: admin123} ] }
}

system_account: SYS

cluster: {
  name: test-cluster
  listen: 0.0.0.0:4248
  routes: [
    nats-route://192.168.3.101:4248,
    nats-route://192.168.3.102:4248,
    nats-route://192.168.3.103:4248,
    nats-route://192.168.3.104:4248,
  ]
}

The command "nats server list" returns the following:

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                      Server Overview                                                       │
├─────────┬──────────────┬──────┬─────────┬─────┬───────┬──────┬────────┬─────┬────────┬───────┬───────┬──────┬────────┬─────┤
│ Name    │ Cluster      │ Host │ Version │ JS  │ Conns │ Subs │ Routes │ GWs │ Mem    │ CPU % │ Cores │ Slow │ Uptime │ RTT │
├─────────┼──────────────┼──────┼─────────┼─────┼───────┼──────┼────────┼─────┼────────┼───────┼───────┼──────┼────────┼─────┤
│ node101 │ test-cluster │ 0    │ 2.10.10 │ no  │ 1     │ 233  │ 12     │ 0   │ 15 MiB │ 0     │ 43    │ 0    │ 8m7s   │ 2ms │
│ node103 │ test-cluster │ 0    │ 2.10.10 │ no  │ 0     │ 233  │ 12     │ 0   │ 15 MiB │ 1     │ 20    │ 0    │ 6m27s  │ 2ms │
│ node104 │ test-cluster │ 0    │ 2.10.10 │ no  │ 0     │ 233  │ 12     │ 0   │ 14 MiB │ 0     │ 48    │ 0    │ 7m54s  │ 2ms │
│ node102 │ test-cluster │ 0    │ 2.10.12 │ no  │ 0     │ 233  │ 12     │ 0   │ 14 MiB │ 0     │ 56    │ 0    │ 25.71s │ 2ms │
├─────────┼──────────────┼──────┼─────────┼─────┼───────┼──────┼────────┼─────┼────────┼───────┼───────┼──────┼────────┼─────┤
│         │ 1            │ 4    │ X       │ 4   │ 1     │ 932  │        │     │ 57 MiB │       │       │ 0    │        │     │
╰─────────┴──────────────┴──────┴─────────┴─────┴───────┴──────┴────────┴─────┴────────┴───────┴───────┴──────┴────────┴─────╯

╭─────────────────────────────────────────────────────────────────────────────────╮
│                                Cluster Overview                                 │
├──────────────┬────────────┬───────────────────┬───────────────────┬─────────────┤
│ Cluster      │ Node Count │ Outgoing Gateways │ Incoming Gateways │ Connections │
├──────────────┼────────────┼───────────────────┼───────────────────┼─────────────┤
│ test-cluster │ 4          │ 0                 │ 0                 │ 1           │
├──────────────┼────────────┼───────────────────┼───────────────────┼─────────────┤
│              │ 4          │ 0                 │ 0                 │ 1           │
╰──────────────┴────────────┴───────────────────┴───────────────────┴─────────────╯

After the cluster had been running for a while, I shut down one of the servers (node103). The nats-server logs on the remaining servers keep printing:

[ERR] Error trying to connect to route (attempt 13062): dial tcp 192.168.3.103:4248: connect: connection refused

Expected behavior

How can the routing table in a NATS cluster be refreshed dynamically when a node is removed?

Server and client version

nats-server: v2.10.10
nats --version: 0.1.3

Host environment

No response

Steps to reproduce

No response

ripienaar commented 6 months ago

If you are specifically listing all cluster members in the configuration then it will try them forever, so in that case you should edit the config and reload the servers.

The server does support learning the network topology dynamically if you list only some of the servers; in that case it should not retry forever.
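As a rough sketch of the "edit the config and reload" path, the cluster block on each remaining server could drop the route to the removed node. The addresses come from the configuration posted above. That node103 corresponds to 192.168.3.103 is taken from the error log. The reload command in the comment is one way to apply the change and assumes a single nats-server process per host.

```
# Cluster block for node101, node102 and node104 after node103 is retired.
# Assumption: node103 listened on 192.168.3.103, as the error log suggests.
cluster: {
  name: test-cluster
  listen: 0.0.0.0:4248
  routes: [
    nats-route://192.168.3.101:4248,
    nats-route://192.168.3.102:4248,
    nats-route://192.168.3.104:4248,
  ]
}

# After saving the file, signal the running server to re-read it, e.g.:
#   nats-server --signal reload
```

With the explicit route gone from every remaining server's file, none of them should keep trying to reach the stopped node.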

throwbear commented 6 months ago

> If you are specifically listing all cluster members in the configuration then it will try them forever, so in that case you should edit the config and reload the servers.
>
> The server does support learning the network topology dynamically if you list only some of the servers; in that case it should not retry forever.

Besides restarting the servers, are there any commands for adding or removing entries in the routing table?

ripienaar commented 6 months ago

You can just add a server that lists one or two routes and it will learn the topology, and removing it will be fine. But if you list them all in the config, or remove one that is listed in the config, you need to reload.

Note that dynamic cluster adjustments aren't compatible with JetStream, so if you want to use that, the cluster needs to be more static in nature.
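A minimal sketch of the seed-route approach described in this thread: a node lists only one or two existing servers and learns the rest of the topology from them. Using 192.168.3.101 and 192.168.3.102 as the seeds is an assumption made for illustration; any reachable cluster members could fill that role.

```
# Cluster block for a node that joins via seed routes only.
# Assumption: 192.168.3.101 and 192.168.3.102 are long-lived members.
cluster: {
  name: test-cluster
  listen: 0.0.0.0:4248
  routes: [
    nats-route://192.168.3.101:4248,
    nats-route://192.168.3.102:4248,
  ]
}
```

Because the other members are discovered rather than configured, shutting one of them down should not leave its peers retrying the connection indefinitely, as noted earlier in the thread.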