Open AnjeiKozhev opened 1 month ago
How many servers do you have JetStream enabled on?
all six
I think the "no responders available for request" error here suggests there was no meta leader at the time. Can you show the output of nats server report jsz?
# nats -s nats://sandbox-nats-vm01:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/sys.creds server report jetstream
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ JetStream Summary │
├─────────────┬──────────────────────┬────────────────────────┬─────────┬───────────┬──────────┬─────────┬─────────┬──────┬─────────┬─────────┤
│ Server │ Cluster │ Domain │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err │
├─────────────┼──────────────────────┼────────────────────────┼─────────┼───────────┼──────────┼─────────┼─────────┼──────┼─────────┼─────────┤
│ node-01 │ sandbox-nats-cluster │ cluster-sandbox-domain │ 11 │ 11 │ 2,016 │ 235 KiB │ 235 KiB │ 0 B │ 0 │ 0 │
│ nats-node-3 │ sandbox-nats-cluster │ cluster-sandbox-domain │ 208 │ 202 │ 70,834 │ 10 MiB │ 10 MiB │ 0 B │ 2,331 │ 0 │
│ node-02* │ sandbox-nats-cluster │ cluster-sandbox-domain │ 11 │ 11 │ 2,016 │ 235 KiB │ 235 KiB │ 0 B │ 5,535 │ 0 │
│ nats-node-2 │ sandbox-nats-cluster │ cluster-sandbox-domain │ 208 │ 202 │ 70,834 │ 10 MiB │ 10 MiB │ 0 B │ 4,915 │ 24 │
│ node-03 │ sandbox-nats-cluster │ cluster-sandbox-domain │ 11 │ 11 │ 2,016 │ 235 KiB │ 235 KiB │ 0 B │ 325 │ 0 │
├─────────────┼──────────────────────┼────────────────────────┼─────────┼───────────┼──────────┼─────────┼─────────┼──────┼─────────┼─────────┤
│ │ │ │ 449 │ 437 │ 147,716 │ 22 MiB │ 22 MiB │ 0 B │ 13,106 │ 24 │
╰─────────────┴──────────────────────┴────────────────────────┴─────────┴───────────┴──────────┴─────────┴─────────┴──────┴─────────┴─────────╯
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ RAFT Meta Group Information │
├─────────────────────────────────────────────────────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ ID │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ Server name unknown at this time (peerID: xfehUPYE) │ xfehUPYE │ │ false │ false │ 0.00s │ 0 │
│ nats-node-1 │ wTT3x6c5 │ │ false │ false │ 28m5s │ 0 │
│ nats-node-2 │ qh7tjmNM │ │ true │ true │ 0.32s │ 0 │
│ nats-node-3 │ coInD1q6 │ │ true │ true │ 0.32s │ 0 │
│ node-01 │ oTzVZnFe │ │ true │ true │ 0.32s │ 0 │
│ node-02 │ FSjU5zJ2 │ yes │ true │ true │ 0.00s │ 0 │
│ node-03 │ BZYxHcvp │ │ true │ true │ 0.32s │ 0 │
╰─────────────────────────────────────────────────────┴──────────┴────────┴─────────┴────────┴────────┴─────╯
And while you see this with a leader listed, you are getting no responders? Can you run with --trace and show the output?
# nats --trace -s nats://sandbox-nats-vm02:4222 --js-domain=cluster-sandbox-domain --creds=/etc/nats/creds/sys.creds server raft peer-remove nats-node-1
13:02:47 >>> $SYS.REQ.SERVER.PING.JSZ: {
"leader_only": true
}
13:02:47 <<< (901B -> 1408B) {"server":{"name":"node-02","host":"sandbox-nats-vm02.internal.n-p.su","id":"ND...................................................JFK","cluster":"sandbox-nats-cluster","domain":"cluster-sandbox-domain","ver":"2.10.18","tags":["node2"],"jetstream":true,"flags":3,"seq":9379,"time":"2024-08-08T11:02:47.529334147Z"},"data":{"server_id":"ND................................JFK","now":"2024-08-08T11:02:47.529315221Z","config":{"max_memory":1550082048,"max_storage":7936840704,"store_dir":"/var/lib/nats/jetstream","sync_interval":120000000000,"domain":"cluster-sandbox-domain"},"memory":242711,"storage":0,"reserved_memory":0,"reserved_storage":0,"accounts":1,"ha_assets":23,"api":{"total":38425,"errors":0},"streams":11,"consumers":11,"messages":2034,"bytes":242711,"meta_cluster":{"name":"sandbox-nats-cluster","leader":"node-02","peer":"FSjU5zJ2","replicas":[{"name":"Server name unknown at this time (peerID: xfehUPYE)","current":false,"offline":true,"active":0,"peer":"xfehUPYE"},{"name":"nats-node-1","current":false,"offline":true,"active":87267593997559,"peer":"wTT3x6c5"},{"name":"nats-node-2","current":true,"active":593576344,"peer":"qh7tjmNM"},{"name":"nats-node-3","current":true,"active":593579991,"peer":"coInD1q6"},{"name":"node-01","current":true,"active":593506673,"peer":"oTzVZnFe"},{"name":"node-03","current":true,"active":593600630,"peer":"BZYxHcvp"}],"cluster_size":6}}}
13:02:47 <<< Header: map[Content-Encoding:[snappy]]
13:02:47 >>> Received 1 responses
Removing nats-node-1 can not be reversed, data on this node will be inaccessible.
? Really remove peer nats-node-1 Yes
13:03:12 >>> $JS.cluster-sandbox-domain.API.SERVER.REMOVE
{"peer":"","peer_id":"wTT3x6c5"}
13:03:12 <<< $JS.cluster-sandbox-domain.API.SERVER.REMOVE: nats: no responders available for request
nats: error: Could not remove wTT3x6c5: nats: no responders available for request
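The JSZ reply in the trace above already identifies which meta group peers are offline. As a minimal sketch, the helper below pulls those entries out of a reply that has been decoded to a Python dict. The function name `offline_peers` and the trimmed `sample` are illustrative only, and treating `active` as nanoseconds is an assumption based on the values in the trace.

```python
def offline_peers(jsz_reply: dict) -> list[dict]:
    """Return the meta group replicas that the leader reports as offline."""
    replicas = jsz_reply["data"]["meta_cluster"]["replicas"]
    return [
        {
            "name": r["name"],
            "peer": r["peer"],
            # Assumption: "active" is a duration in nanoseconds.
            "active_s": r["active"] / 1e9,
        }
        for r in replicas
        # Online peers omit the "offline" key in the trace, so default to False.
        if r.get("offline", False)
    ]


# Trimmed copy of the reply from the trace; only the fields used here are kept.
sample = {
    "data": {
        "meta_cluster": {
            "leader": "node-02",
            "replicas": [
                {"name": "Server name unknown at this time (peerID: xfehUPYE)",
                 "current": False, "offline": True, "active": 0, "peer": "xfehUPYE"},
                {"name": "nats-node-1", "current": False, "offline": True,
                 "active": 87267593997559, "peer": "wTT3x6c5"},
                {"name": "nats-node-2", "current": True,
                 "active": 593576344, "peer": "qh7tjmNM"},
            ],
        }
    }
}

for peer in offline_peers(sample):
    print(peer["peer"], peer["name"], f"{peer['active_s']:.0f}s")
```

The peer IDs returned this way are the values the CLI sends as `peer_id` in the remove request.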
Are all of the online servers reachable to each other, or do you have partitions?
Hmm, I guess the server API isn't domain aware?
Can you try connecting to the cluster without setting a domain?
Thanks a lot. Without specifying the domain, the deletion was successful.
# nats --trace -s nats://sandbox-nats-vm02:4222 --creds=/etc/nats/creds/sys.creds server raft peer-remove nats-node-1
13:51:34 >>> $SYS.REQ.SERVER.PING.JSZ: {
"leader_only": true
}
13:51:34 <<< (900B -> 1404B) {"server":{"name":"node-02","host":"sandbox-nats-vm02.internal.n-p.su","id":"ND..................JFK","cluster":"sandbox-nats-cluster","domain":"cluster-sandbox-domain","ver":"2.10.18","tags":["node2"],"jetstream":true,"flags":3,"seq":9680,"time":"2024-08-08T11:51:34.965899902Z"},"data":{"server_id":"ND...............JFK","now":"2024-08-08T11:51:34.965879914Z","config":{"max_memory":1550082048,"max_storage":7936840704,"store_dir":"/var/lib/nats/jetstream","sync_interval":120000000000,"domain":"cluster-sandbox-domain"},"memory":242799,"storage":0,"reserved_memory":0,"reserved_storage":0,"accounts":1,"ha_assets":23,"api":{"total":39575,"errors":0},"streams":11,"consumers":11,"messages":2035,"bytes":242799,"meta_cluster":{"name":"sandbox-nats-cluster","leader":"node-02","peer":"FSjU5zJ2","replicas":[{"name":"Server name unknown at this time (peerID: xfehUPYE)","current":false,"offline":true,"active":0,"peer":"xfehUPYE"},{"name":"nats-node-1","current":false,"offline":true,"active":90195030562332,"peer":"wTT3x6c5"},{"name":"nats-node-2","current":true,"active":29742834,"peer":"qh7tjmNM"},{"name":"nats-node-3","current":true,"active":29746651,"peer":"coInD1q6"},{"name":"node-01","current":true,"active":29717636,"peer":"oTzVZnFe"},{"name":"node-03","current":true,"active":29528812,"peer":"BZYxHcvp"}],"cluster_size":6}}}
13:51:34 <<< Header: map[Content-Encoding:[snappy]]
13:51:34 >>> Received 1 responses
Removing nats-node-1 can not be reversed, data on this node will be inaccessible.
? Really remove peer nats-node-1 Yes
13:51:40 >>> $JS.API.SERVER.REMOVE
{"peer":"","peer_id":"wTT3x6c5"}
13:51:40 <<< $JS.API.SERVER.REMOVE
{"type":"io.nats.jetstream.api.v1.meta_server_remove_response","success":true}
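The two traces differ only in the request subject: with --js-domain set, the request goes to $JS.cluster-sandbox-domain.API.SERVER.REMOVE and gets no responders, while without it the request goes to $JS.API.SERVER.REMOVE and succeeds. The sketch below reproduces that subject construction as observed in the traces. The helper names are hypothetical and this is not the CLI's actual code.

```python
import json


def js_api_subject(suffix: str, domain: str = "") -> str:
    """Build a JetStream API subject, with the domain prefix seen in the trace."""
    prefix = f"$JS.{domain}.API" if domain else "$JS.API"
    return f"{prefix}.{suffix}"


def peer_remove_request(peer_id: str, domain: str = "") -> tuple[str, str]:
    """Return the (subject, JSON payload) pair for a meta peer removal request."""
    subject = js_api_subject("SERVER.REMOVE", domain)
    # Payload shape copied from the trace: an empty "peer" plus the peer ID.
    payload = json.dumps({"peer": "", "peer_id": peer_id}, separators=(",", ":"))
    return subject, payload


# Subject used when --js-domain is set (no responders in this thread).
print(peer_remove_request("wTT3x6c5", "cluster-sandbox-domain")[0])
# Subject used without a domain (the request that succeeded).
print(peer_remove_request("wTT3x6c5")[0])
```

Comparing the two printed subjects against the server's API subscriptions is a quick way to check whether a request can be answered at all.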
Regarding "Can you try connecting to the cluster without setting a domain?": should we make it domain aware?
Observed behavior
When I try to remove a running server from the cluster, I get the error:
When I stop the server being removed with the command:
and try again to remove the server from the cluster, I get the following error:
Expected behavior
The server should be removed from the cluster.
Server and client version
Host environment
Cluster: 6 nodes, each Debian GNU/Linux 11 (bullseye), AMD Ryzen 7 7700 8-Core Processor (family: 0x19, model: 0x61, stepping: 0x2), 64 GB RAM
Client: Debian GNU/Linux 11 (bullseye), AMD Ryzen 9 3900 12-Core Processor, 128 GB RAM
Steps to reproduce