nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

[ADD] allow stream placement to evict servers #6138

Open ramonberrutti opened 4 days ago

ramonberrutti commented 4 days ago

NATS introduced the server tag !jetstream, which enables the eviction of JetStream assets from a server. The same approach can be extended to individual streams.

This PR introduces the ability to evict specific servers from streams.

Signed-off-by: Ramon Berrutti <ramonberrutti@gmail.com>

derekcollison commented 4 days ago

You can peer remove a server from a system during runtime. Does that not solve your needs? The system will also (if possible) select a new peer for the stream peer sets that are affected. Consumers inherit from the same peer set as their parent stream.
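For readers unfamiliar with peer removal, the sketch below builds the request that such an operation would send. It is a minimal illustration, not the server's or any client library's code. The subjects `$JS.API.SERVER.REMOVE` and `$JS.API.STREAM.PEER.REMOVE.<stream>` and the `{"peer": ...}` payload reflect my understanding of the JetStream wire API and should be checked against the API reference before use. The function names, the server name `nats-standby-1`, and the stream name `ORDERS` are hypothetical.

```python
import json
from typing import Tuple

# Assumed JetStream API subjects (verify against the JetStream API reference).
SERVER_REMOVE_SUBJECT = "$JS.API.SERVER.REMOVE"
STREAM_PEER_REMOVE_SUBJECT = "$JS.API.STREAM.PEER.REMOVE.{stream}"


def _encode_peer(peer: str) -> bytes:
    """Encode the JSON body naming the peer (server) to remove."""
    if not peer:
        raise ValueError("peer name must not be empty")
    return json.dumps({"peer": peer}).encode("utf-8")


def build_server_peer_remove(peer: str) -> Tuple[str, bytes]:
    """Build (subject, payload) for removing a server from the cluster.

    As I understand it, this request is sent from the system account.
    """
    return SERVER_REMOVE_SUBJECT, _encode_peer(peer)


def build_stream_peer_remove(stream: str, peer: str) -> Tuple[str, bytes]:
    """Build (subject, payload) for removing one peer from a single stream."""
    # Simple sanity check only: reject characters that would break the subject.
    if not stream or any(c in stream for c in " .*>"):
        raise ValueError(f"invalid stream name: {stream!r}")
    subject = STREAM_PEER_REMOVE_SUBJECT.format(stream=stream)
    return subject, _encode_peer(peer)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_server_peer_remove("nats-standby-1"))
    print(build_stream_peer_remove("ORDERS", "nats-standby-1"))
```

The resulting pair would be passed to a NATS client's request call. Whether the server picks a replacement peer afterwards depends on the cluster having an eligible server available, as the comment above notes.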

ramonberrutti commented 3 days ago

Hi @derekcollison

We have two distinct use cases for this feature:

Multi-cloud environments with temporary streams:

In a multi-cloud environment, we need to create temporary streams that must avoid a specific cloud because of customer contract requirements and performance considerations. Tags help, but the current implementation is all-or-nothing. For instance, with AWS, GCP, and Azure, we must create a combination tag for each allowed pair of clouds: "aws-gcp", "aws-azure", and "gcp-azure". With this feature, we could instead use a tag like !azure to exclude a specific cloud, without maintaining the combinations by hand.

Cloud maintenance and standby nodes:

When a specific cloud requires maintenance, we spin up standby nodes in another cloud and migrate the replicas from the affected cloud to the standby nodes. However, the streams sometimes fail to return to their original nodes, even after multiple peer-removal attempts. Adding the !jetstream tag to the standby nodes' configuration mitigates this, but applying the feature at the stream level would reduce risk and provide greater flexibility.
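The two use cases above can be sketched as a small placement filter. This is an illustration of the idea under stated assumptions, not the code in this PR. It assumes that a plain placement tag must be present on the server, that a placement tag prefixed with `!` excludes servers carrying that tag (the proposed stream-level behaviour, as I read the description), and that a server tagged `!jetstream` is never selected. The server names and the helpers `is_eligible` and `select_servers` are hypothetical.

```python
from typing import Dict, Iterable, List

# Server-level opt-out tag mentioned in the PR description.
SERVER_EXCLUDE_TAG = "!jetstream"


def is_eligible(server_tags: Iterable[str], placement_tags: Iterable[str]) -> bool:
    """Return True if a server may host a stream with the given placement tags."""
    have = set(server_tags)
    # A server that opted out of JetStream placement is never eligible.
    if SERVER_EXCLUDE_TAG in have:
        return False
    for tag in placement_tags:
        if tag.startswith("!"):
            # Assumed stream-level exclusion: reject servers carrying the tag.
            if tag[1:] in have:
                return False
        elif tag not in have:
            # Plain tags are required: every one must be present.
            return False
    return True


def select_servers(servers: Dict[str, List[str]], placement_tags: Iterable[str]) -> List[str]:
    """Return the sorted names of all servers eligible for the placement."""
    wanted = list(placement_tags)
    return sorted(name for name, tags in servers.items() if is_eligible(tags, wanted))


# Hypothetical cluster: one node per cloud plus a standby that opted out.
SERVERS = {
    "aws-1": ["aws"],
    "gcp-1": ["gcp"],
    "azure-1": ["azure"],
    "gcp-standby-1": ["gcp", SERVER_EXCLUDE_TAG],
}

if __name__ == "__main__":
    # One exclusion tag replaces the "aws-gcp" style combination tags.
    print(select_servers(SERVERS, ["!azure"]))  # ['aws-1', 'gcp-1']
```

With an exclusion tag, the number of tags to maintain grows with the number of clouds rather than with the number of cloud combinations. The standby node stays out of every placement because of its server-level tag, which is the blunt mitigation described in the second use case; a stream-level exclusion would let only selected streams avoid it.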

Happy to provide more context. Thank you.