nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

The server is not removed from the jetstream cluster [v2.10.18] #5759

Open AnjeiKozhev opened 1 month ago

AnjeiKozhev commented 1 month ago

Observed behavior

When I try to remove a running server from the cluster, I get this error:

# nats -s nats://sandbox-nats-vm02:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/sys.creds server raft peer-remove nats-node-1
nats: error: can only remove offline nodes

When I stop the server being removed with the command:

# nats-server --signal ldm

and then try again to remove it from the cluster, I get the following error:

# nats -s nats://sandbox-nats-vm02:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/sys.creds server raft peer-remove nats-node-1
Removing nats-node-1 can not be reversed, data on this node will be inaccessible.

? Really remove peer nats-node-1 Yes
nats: error: Could not remove wTT3x6c5: nats: no responders available for request

Expected behavior

The server should be removed from the cluster.

Server and client version

# nats-server -v
nats-server: v2.10.18
# nats --version
0.0.35

Host environment

Cluster: 6 nodes, each running Debian GNU/Linux 11 (bullseye) on an AMD Ryzen 7 7700 8-Core Processor (family: 0x19, model: 0x61, stepping: 0x2) with 64 GB RAM

Client: Debian GNU/Linux 11 (bullseye) on an AMD Ryzen 9 3900 12-Core Processor with 128 GB RAM

Steps to reproduce


# nsc describe account SYS
+--------------------------------------------------------------------------------------+
|                                   Account Details                                    |
+---------------------------+----------------------------------------------------------+
| Name                      | SYS                                                      |
| Account ID                | AA....................................................K6 |
| Issuer ID                 | OB....................................................GZ |
| Issued                    | 2023-07-19 05:16:58 UTC                                  |
| Expires                   |                                                          |
+---------------------------+----------------------------------------------------------+
| Signing Keys              | AC...................................................6YI |
+---------------------------+----------------------------------------------------------+
| Max Connections           | Unlimited                                                |
| Max Leaf Node Connections | Unlimited                                                |
| Max Data                  | Unlimited                                                |
| Max Exports               | Unlimited                                                |
| Max Imports               | Unlimited                                                |
| Max Msg Payload           | Unlimited                                                |
| Max Subscriptions         | Unlimited                                                |
| Exports Allows Wildcards  | True                                                     |
| Disallow Bearer Token     | False                                                    |
| Response Permissions      | Not Set                                                  |
+---------------------------+----------------------------------------------------------+
| Jetstream                 | Disabled                                                 |
+---------------------------+----------------------------------------------------------+
| Imports                   | None                                                     |
+---------------------------+----------------------------------------------------------+
| Tracing Context           | Disabled                                                 |
+---------------------------+----------------------------------------------------------+

neilalexander commented 1 month ago

How many servers do you have JetStream enabled on?

AnjeiKozhev commented 1 month ago

all six

ripienaar commented 1 month ago

I think the "no responders available for request" error here suggests there was no meta leader at the time. Can you show the output of nats server report jsz?

AnjeiKozhev commented 1 month ago

# nats -s nats://sandbox-nats-vm01:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/sys.creds server report jetstream
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                              JetStream Summary                                                              │
├─────────────┬──────────────────────┬────────────────────────┬─────────┬───────────┬──────────┬─────────┬─────────┬──────┬─────────┬─────────┤
│ Server      │ Cluster              │ Domain                 │ Streams │ Consumers │ Messages │ Bytes   │ Memory  │ File │ API Req │ API Err │
├─────────────┼──────────────────────┼────────────────────────┼─────────┼───────────┼──────────┼─────────┼─────────┼──────┼─────────┼─────────┤
│ node-01     │ sandbox-nats-cluster │ cluster-sandbox-domain │ 11      │ 11        │ 2,016    │ 235 KiB │ 235 KiB │ 0 B  │ 0       │ 0       │
│ nats-node-3 │ sandbox-nats-cluster │ cluster-sandbox-domain │ 208     │ 202       │ 70,834   │ 10 MiB  │ 10 MiB  │ 0 B  │ 2,331   │ 0       │
│ node-02*    │ sandbox-nats-cluster │ cluster-sandbox-domain │ 11      │ 11        │ 2,016    │ 235 KiB │ 235 KiB │ 0 B  │ 5,535   │ 0       │
│ nats-node-2 │ sandbox-nats-cluster │ cluster-sandbox-domain │ 208     │ 202       │ 70,834   │ 10 MiB  │ 10 MiB  │ 0 B  │ 4,915   │ 24      │
│ node-03     │ sandbox-nats-cluster │ cluster-sandbox-domain │ 11      │ 11        │ 2,016    │ 235 KiB │ 235 KiB │ 0 B  │ 325     │ 0       │
├─────────────┼──────────────────────┼────────────────────────┼─────────┼───────────┼──────────┼─────────┼─────────┼──────┼─────────┼─────────┤
│             │                      │                        │ 449     │ 437       │ 147,716  │ 22 MiB  │ 22 MiB  │ 0 B  │ 13,106  │ 24      │
╰─────────────┴──────────────────────┴────────────────────────┴─────────┴───────────┴──────────┴─────────┴─────────┴──────┴─────────┴─────────╯

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                        RAFT Meta Group Information                                        │
├─────────────────────────────────────────────────────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name                                                │ ID       │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ Server name unknown at this time (peerID: xfehUPYE) │ xfehUPYE │        │ false   │ false  │ 0.00s  │ 0   │
│ nats-node-1                                         │ wTT3x6c5 │        │ false   │ false  │ 28m5s  │ 0   │
│ nats-node-2                                         │ qh7tjmNM │        │ true    │ true   │ 0.32s  │ 0   │
│ nats-node-3                                         │ coInD1q6 │        │ true    │ true   │ 0.32s  │ 0   │
│ node-01                                             │ oTzVZnFe │        │ true    │ true   │ 0.32s  │ 0   │
│ node-02                                             │ FSjU5zJ2 │ yes    │ true    │ true   │ 0.00s  │ 0   │
│ node-03                                             │ BZYxHcvp │        │ true    │ true   │ 0.32s  │ 0   │
╰─────────────────────────────────────────────────────┴──────────┴────────┴─────────┴────────┴────────┴─────╯

ripienaar commented 1 month ago

And while you see this, with a leader listed, you are still getting no responders? Can you run with --trace and show the output?

AnjeiKozhev commented 1 month ago

# nats --trace -s nats://sandbox-nats-vm02:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/sys.creds server raft peer-remove nats-node-1
13:02:47 >>> $SYS.REQ.SERVER.PING.JSZ: {
  "leader_only": true
}
13:02:47 <<< (901B -> 1408B) {"server":{"name":"node-02","host":"sandbox-nats-vm02.internal.n-p.su","id":"ND...................................................JFK","cluster":"sandbox-nats-cluster","domain":"cluster-sandbox-domain","ver":"2.10.18","tags":["node2"],"jetstream":true,"flags":3,"seq":9379,"time":"2024-08-08T11:02:47.529334147Z"},"data":{"server_id":"ND................................JFK","now":"2024-08-08T11:02:47.529315221Z","config":{"max_memory":1550082048,"max_storage":7936840704,"store_dir":"/var/lib/nats/jetstream","sync_interval":120000000000,"domain":"cluster-sandbox-domain"},"memory":242711,"storage":0,"reserved_memory":0,"reserved_storage":0,"accounts":1,"ha_assets":23,"api":{"total":38425,"errors":0},"streams":11,"consumers":11,"messages":2034,"bytes":242711,"meta_cluster":{"name":"sandbox-nats-cluster","leader":"node-02","peer":"FSjU5zJ2","replicas":[{"name":"Server name unknown at this time (peerID: xfehUPYE)","current":false,"offline":true,"active":0,"peer":"xfehUPYE"},{"name":"nats-node-1","current":false,"offline":true,"active":87267593997559,"peer":"wTT3x6c5"},{"name":"nats-node-2","current":true,"active":593576344,"peer":"qh7tjmNM"},{"name":"nats-node-3","current":true,"active":593579991,"peer":"coInD1q6"},{"name":"node-01","current":true,"active":593506673,"peer":"oTzVZnFe"},{"name":"node-03","current":true,"active":593600630,"peer":"BZYxHcvp"}],"cluster_size":6}}}
13:02:47 <<< Header: map[Content-Encoding:[snappy]]
13:02:47 >>> Received 1 responses
Removing nats-node-1 can not be reversed, data on this node will be inaccessible.

? Really remove peer nats-node-1 Yes
13:03:12 >>> $JS.cluster-sandbox-domain.API.SERVER.REMOVE
{"peer":"","peer_id":"wTT3x6c5"}

13:03:12 <<< $JS.cluster-sandbox-domain.API.SERVER.REMOVE: nats: no responders available for request

nats: error: Could not remove wTT3x6c5: nats: no responders available for request
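
For anyone reading the trace above by eye: the two facts that matter are whether a meta leader is reported and which peers are flagged offline. The following is only a rough sketch of pulling those out of a JSZ reply. The sample is a trimmed copy of the reply shown in the trace, and the helper names `meta_leader` and `offline_peers` are made up for illustration; they are not part of the nats CLI or any client library.

```python
import json

# Trimmed copy of the JSZ reply from the trace above; only the fields
# used below are kept. Online peers carry no "offline" key in the trace.
SAMPLE_REPLY = """
{"data": {"meta_cluster": {
  "name": "sandbox-nats-cluster",
  "leader": "node-02",
  "replicas": [
    {"name": "Server name unknown at this time (peerID: xfehUPYE)",
     "current": false, "offline": true, "active": 0, "peer": "xfehUPYE"},
    {"name": "nats-node-1",
     "current": false, "offline": true, "active": 87267593997559, "peer": "wTT3x6c5"},
    {"name": "nats-node-2",
     "current": true, "active": 593576344, "peer": "qh7tjmNM"}
  ]}}}
"""


def meta_leader(reply: dict) -> str:
    """Return the reported meta leader name, or "" if none is reported."""
    return reply.get("data", {}).get("meta_cluster", {}).get("leader", "")


def offline_peers(reply: dict) -> list:
    """Return (name, peer_id) pairs for replicas flagged offline."""
    replicas = reply.get("data", {}).get("meta_cluster", {}).get("replicas", [])
    # A missing "offline" key is treated as online, matching the trace.
    return [(r["name"], r["peer"]) for r in replicas if r.get("offline", False)]


if __name__ == "__main__":
    reply = json.loads(SAMPLE_REPLY)
    print("meta leader:", meta_leader(reply) or "<none>")
    for name, peer_id in offline_peers(reply):
        print("offline:", name, peer_id)
```

In the reply above a leader (node-02) is reported and nats-node-1 is already marked offline, so a missing meta leader does not appear to explain the "no responders" error.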

neilalexander commented 1 month ago

Are all of the online servers reachable from each other, or do you have partitions?

ripienaar commented 1 month ago

Hmm, I guess the server API isn't domain-aware?

ripienaar commented 1 month ago

Can you try connecting to the cluster without setting a domain?

AnjeiKozhev commented 1 month ago

Thanks a lot. Without specifying the domain, the deletion was successful.


# nats --trace -s nats://sandbox-nats-vm02:4222 --creds=/etc/nats/creds/sys.creds server raft peer-remove nats-node-1
13:51:34 >>> $SYS.REQ.SERVER.PING.JSZ: {
  "leader_only": true
}
13:51:34 <<< (900B -> 1404B) {"server":{"name":"node-02","host":"sandbox-nats-vm02.internal.n-p.su","id":"ND..................JFK","cluster":"sandbox-nats-cluster","domain":"cluster-sandbox-domain","ver":"2.10.18","tags":["node2"],"jetstream":true,"flags":3,"seq":9680,"time":"2024-08-08T11:51:34.965899902Z"},"data":{"server_id":"ND...............JFK","now":"2024-08-08T11:51:34.965879914Z","config":{"max_memory":1550082048,"max_storage":7936840704,"store_dir":"/var/lib/nats/jetstream","sync_interval":120000000000,"domain":"cluster-sandbox-domain"},"memory":242799,"storage":0,"reserved_memory":0,"reserved_storage":0,"accounts":1,"ha_assets":23,"api":{"total":39575,"errors":0},"streams":11,"consumers":11,"messages":2035,"bytes":242799,"meta_cluster":{"name":"sandbox-nats-cluster","leader":"node-02","peer":"FSjU5zJ2","replicas":[{"name":"Server name unknown at this time (peerID: xfehUPYE)","current":false,"offline":true,"active":0,"peer":"xfehUPYE"},{"name":"nats-node-1","current":false,"offline":true,"active":90195030562332,"peer":"wTT3x6c5"},{"name":"nats-node-2","current":true,"active":29742834,"peer":"qh7tjmNM"},{"name":"nats-node-3","current":true,"active":29746651,"peer":"coInD1q6"},{"name":"node-01","current":true,"active":29717636,"peer":"oTzVZnFe"},{"name":"node-03","current":true,"active":29528812,"peer":"BZYxHcvp"}],"cluster_size":6}}}
13:51:34 <<< Header: map[Content-Encoding:[snappy]]
13:51:34 >>> Received 1 responses
Removing nats-node-1 can not be reversed, data on this node will be inaccessible.

? Really remove peer nats-node-1 Yes
13:51:40 >>> $JS.API.SERVER.REMOVE
{"peer":"","peer_id":"wTT3x6c5"}

13:51:40 <<< $JS.API.SERVER.REMOVE
{"type":"io.nats.jetstream.api.v1.meta_server_remove_response","success":true}
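
The two traces differ only in the request subject: with --js-domain set, the request went to $JS.cluster-sandbox-domain.API.SERVER.REMOVE, and without it to $JS.API.SERVER.REMOVE. A minimal sketch of that prefixing, based only on the subjects visible in the traces; the function name `js_api_subject` is hypothetical and this is not the CLI's actual code:

```python
def js_api_subject(operation: str, domain: str = "") -> str:
    """Build a JetStream API subject, optionally scoped to a domain.

    Mirrors the subjects seen in the traces: "$JS.API.<op>" when no
    domain is set and "$JS.<domain>.API.<op>" when one is.
    """
    prefix = f"$JS.{domain}.API" if domain else "$JS.API"
    return f"{prefix}.{operation}"


if __name__ == "__main__":
    # Subject used when --js-domain=cluster-sandbox-domain was passed.
    print(js_api_subject("SERVER.REMOVE", "cluster-sandbox-domain"))
    # Subject used when no domain was passed.
    print(js_api_subject("SERVER.REMOVE"))
```

In this thread the domain-prefixed subject got "no responders available for request", while the unprefixed one returned a success response.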

Jarema commented 1 month ago

> Can you try connecting to the cluster without setting a domain?

Should we make it domain-aware?