nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io

Meta Leader Placement #3721

Open · ColinSullivan1 opened this issue 1 year ago

ColinSullivan1 commented 1 year ago

Feature Request

It'd be great to have the ability to place (or prefer) the meta-leader on a specific node, or to constrain it to a set of nodes.

Use Case:

This would be an advanced use case and not recommended for typical deployments.

Proposed Change:

Allow specifying a tag for meta-leader placement, along with the associated CLI change. This may be difficult with Raft.
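
A rough sketch of what this could look like. The server_tags config option exists today (it is used for stream placement); the --tag flag on step-down below is purely hypothetical and only illustrates the proposal:

# nats-server.conf on the preferred node (server_tags is an existing option)
server_name: nats-0
server_tags: ["meta-preferred"]

# hypothetical CLI addition proposed here, not an existing flag
nats server raft step-down --tag=meta-preferred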

Who Benefits From The Change(s)?

See use case.

Alternative Approaches

The current workaround is to step the meta leader down repeatedly until the desired node ends up as leader; see the sketch below.
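
As a concrete illustration of that workaround, a loop along these lines can be used; the server name nats-0 and the "*" leader marker in the report output are assumptions about the deployment:

# keep stepping the meta leader down until the desired server is elected
until nats server report jetstream | grep 'nats-0' | grep -q '\*'; do
  nats server raft step-down
  sleep 2
done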

derekcollison commented 1 year ago

We do allow cluster placement today.

nats server raft step-down -h                                                                                                                                                               
usage: nats server raft step-down [<flags>]

Force a new leader election by standing down the current meta leader

Flags:
  --cluster=CLUSTER  Request placement of the leader in a specific cluster
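
For example, to request that the meta leader land in a particular cluster (the cluster name east is just a placeholder):

nats server raft step-down --cluster=east
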
ColinSullivan1 commented 1 year ago

Yep... was thinking of a particular server or set of servers.

jleni commented 1 year ago

Let's say I have a cluster with three nodes and they cannot yet agree on a meta leader. How can I force nats-0 to be the leader instead of waiting indefinitely?

Example:

[97] 2023/06/13 21:14:03.053224 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:14:13.052716 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:14:23.053264 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:14:33.052999 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:14:43.053062 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:14:46.979552 [INF] JetStream cluster no metadata leader
[97] 2023/06/13 21:14:53.053818 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:15:03.053050 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:15:03.470356 [WRN] JetStream has not established contact with a meta leader
[97] 2023/06/13 21:15:13.053336 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:15:22.335674 [INF] JetStream cluster no metadata leader
[97] 2023/06/13 21:15:23.053171 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:15:33.053000 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:15:43.052466 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:15:49.582172 [INF] JetStream cluster no metadata leader
[97] 2023/06/13 21:15:53.052286 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:16:03.052552 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:16:13.052606 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:16:23.052654 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[97] 2023/06/13 21:16:33.052723 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
derekcollison commented 1 year ago

If they cannot agree, it means either the cluster is malformed or misconfigured, or the peer set is actually larger than the cluster size.

We see this when folks accidentally add other clusters or peers, then turn them off but do not remove them from the JetStream cluster itself through peer-remove.
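
If a decommissioned server was left in the peer set, it can be dropped with the CLI; the exact invocation below (subcommand path and the server name old-nats-4) is an assumption and should be checked against nats server raft --help for your CLI version:

# remove a stale peer from the JetStream meta group (names are placeholders)
nats server raft peer-remove old-nats-4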

Once a cluster is healthy and /healthz returns ok, the upgrade process in the latest Helm charts makes sure not to move on to the next peer until the last upgraded one is back up, operational, and reporting ok from /healthz.
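
For reference, /healthz is served on the server's HTTP monitoring port (8222 by default, when monitoring is enabled), so the same check can be run by hand; the host name below is a placeholder:

# check JetStream / meta-leader readiness via the monitoring endpoint
curl -s http://nats-0.nats.svc:8222/healthz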