nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

If only one NATS cluster is down, it becomes unavailable #4003

Closed marload closed 7 months ago

marload commented 1 year ago

I am using Helm to deploy NATS on k8s and have a 3-node cluster. The moment one of the nodes goes down, the rest of the cluster becomes inaccessible. Is there something I am misunderstanding about cluster availability in NATS?

$ nats str ls
nats: error: could not list streams: JetStream system temporarily unavailable (10008)

ripienaar commented 1 year ago

Is your stream configured to have 3 replicas? Show stream info.

ripienaar commented 1 year ago

Sorry, my bad. Can you please show the output of nats server report jsz, rather than the info for the individual stream?

wallyqs commented 1 year ago

You can get JetStream unavailable errors when a node detaches from the cluster, due to re-elections and reconnects, but as the error message says, this should generally be temporary. For publishing messages, there is a default retry on 503 / no responders errors that waits 250ms before sending the message again. We do not do this for other APIs at the moment, but the same could be done: https://github.com/nats-io/nats.go/blob/main/js.go#L208
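As a rough illustration of applying the same wait-and-retry idea to other API calls, here is a minimal sketch in plain Python. It is not the nats.go implementation: the helper name call_with_retry, the attempt count, and matching on the error text are all assumptions made for the example. Only the 250ms wait comes from the comment above.

```python
import time

# Assumption: transient errors are recognised by this text, taken from the
# error message shown earlier in the thread (code 10008).
TRANSIENT_MARKER = "temporarily unavailable"


def call_with_retry(fn, attempts=3, wait=0.25, sleep=time.sleep):
    """Call fn(); on a transient error, wait and try again.

    `attempts` is the total number of calls (hypothetical default),
    `wait` is the pause between calls in seconds (250ms).
    """
    last_err = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:
            # Anything that does not look transient is re-raised immediately.
            if TRANSIENT_MARKER not in str(err):
                raise
            last_err = err
            # Do not sleep after the final failed attempt.
            if attempt < attempts - 1:
                sleep(wait)
    raise last_err
```

A caller would wrap whatever client call lists streams, for example call_with_retry(lambda: list_streams()), where list_streams is a stand-in for your own client code.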

bruth commented 7 months ago

@marload One server going down in a three-node cluster should not make the other two servers unavailable. There must be something else going on.
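For context on why two remaining servers should be enough: JetStream clustering relies on Raft-style consensus, which needs a majority of the nodes to be reachable. The sketch below just works through that arithmetic; the function names are made up for the example and are not part of any NATS API.

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size, nodes_up):
    """True if enough nodes are up to form a majority."""
    return nodes_up >= quorum(cluster_size)


# A 3-node cluster needs 2 nodes, so losing one node still leaves a majority.
print(quorum(3), has_quorum(3, 2), has_quorum(3, 1))
```

By this reasoning, a three-node cluster only loses its majority once two nodes are down.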

Given the lack of activity, I am closing, but feel free to reopen if there are still questions or concerns.