nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Weighted clustering #4762

Open sylr opened 10 months ago

sylr commented 10 months ago

Proposed change

Networking is not cheap: it is the primary source of latency in highly available setups.

In a multi-site setup, network-induced latencies can drastically impact performance and overall latency.

After running some tests, it appears that the most latency-optimized NATS clustering setup is one where clients are connected to the server hosting the stream and consumer leaders.

It would be nice if we could assign server weights to streams so that their leaders, and the leaders of the associated consumers, are created on the servers we know are closest to the clients.

E.g.:

STREAM1: server-az1-01 (weight=100), server-az2-02 (weight= 50), server-az3-03 (weight= 20)
STREAM2: server-az1-01 (weight= 50), server-az2-02 (weight=100), server-az3-03 (weight= 20)
STREAM3: server-az1-01 (weight= 50), server-az2-02 (weight= 50), server-az3-03 (weight=100)

Consumers would inherit their stream's server weights.

A nats jetstream cluster rebalance command would trigger a rebalancing of stream and consumer leaders based on these weights.
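A minimal sketch of how such weight-based leader selection could behave (everything here is hypothetical illustration of the proposal, not existing server code): among the healthy peers of a stream's RAFT group, the highest-weight server would be preferred as leader, with lower-weight servers acting as fallbacks.

```go
package main

import "fmt"

// streamWeights maps a server name to its proposed weight for one stream.
// Higher weight means preferred leader. Hypothetical type for illustration.
type streamWeights map[string]int

// pickLeader returns the healthy server with the highest weight.
// Ties are broken by server name so the choice is deterministic.
func pickLeader(weights streamWeights, healthy []string) string {
	leader := ""
	best := -1
	for _, s := range healthy {
		w, ok := weights[s]
		if !ok {
			continue
		}
		if w > best || (w == best && s < leader) {
			best = w
			leader = s
		}
	}
	return leader
}

func main() {
	// Weights from the STREAM1 example above.
	stream1 := streamWeights{"server-az1-01": 100, "server-az2-02": 50, "server-az3-03": 20}

	// All servers healthy: the highest-weight server leads.
	fmt.Println(pickLeader(stream1, []string{"server-az1-01", "server-az2-02", "server-az3-03"}))

	// If the preferred server is unavailable, leadership falls back by weight.
	fmt.Println(pickLeader(stream1, []string{"server-az2-02", "server-az3-03"}))
}
```

With all three servers up this prints server-az1-01; with server-az1-01 removed it falls back to server-az2-02, matching the intent that consumers created against STREAM1 would also prefer az1.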

Use case

Optimize latencies when they are a critical aspect of the overall application.

Contribution

No response

derekcollison commented 10 months ago

On a global topology like Synadia Cloud, which is a global multi-cloud super-cluster, we default to creating assets in the cluster you connected to.

However, you can place assets in any cluster, or based on server tags. You can even move and scale them after the fact.
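For reference, the existing placement mechanism mentioned here is driven by the placement block of the stream configuration, which can pin a stream to a cluster and/or to servers carrying specific tags. A sketch (the cluster and tag values are illustrative):

```json
{
  "name": "STREAM1",
  "num_replicas": 3,
  "placement": {
    "cluster": "aws-us-east-1",
    "tags": ["az:us-east-1a"]
  }
}
```

This places replicas, but it does not express a per-server preference order for leadership, which is what the weighting proposal above adds.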

sylr commented 10 months ago

I'm talking about a single multi-site cluster (e.g., servers spread across 3 availability zones in a single Amazon region).

The following graph shows the latency between the Message.Metadata.Timestamp and the time the message was processed by the consumer (pull consumer, Max Ack Pending 1, 3 replicas, disk storage).

The left part of the graph shows the un-optimized setup, the disruption in the middle is the rebalancing/optimization, and the right part is the optimized NATS setup. The messages consumed are published at a constant rate by a load generator.

[graph: consumer latency before and after rebalancing]

It's hard to see in the picture because of the scale, but the average latency went from 1.16ms to 0.763ms, and the rate of messages consumed went from 345 msg/s to 492 msg/s.