nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
Apache License 2.0

JS stream count not balanced among the cluster nodes #5071

Closed: kohlisid closed this issue 7 months ago

kohlisid commented 7 months ago

Observed behavior

When creating multiple streams (with replicas = 3) on a JetStream cluster that has more nodes than the stream replica count, I have observed that the streams are not evenly distributed among the servers.

Some server instances end up hosting a large share of the stream replicas, while others get almost none.

skohli@macos-JQWR9T560R ~ % nats --context east-sys-ac server report jetstream
│                                         JetStream Summary                                        │
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err    │
│ n1-c1  │ C1      │ 28      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 55      │ 0          │
│ n2-c1* │ C1      │ 1       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 33      │ 2 / 6.060% │
│ n3-c1  │ C1      │ 28      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 30      │ 0          │
│ n4-c1  │ C1      │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0          │
│ n5-c1  │ C1      │ 27      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 63      │ 0          │
│        │         │ 84      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 181     │ 2          │

In further testing, waiting a few seconds (sleep 3) before each stream creation produced a far more balanced distribution:

skohli@macos-JQWR9T560R ~ % nats --context east-sys-ac server report jetstream
│                                          JetStream Summary                                          │
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err       │
│ n1-c1  │ C1      │ 18      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 696     │ 381 / 54.741% │
│ n2-c1* │ C1      │ 14      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 496     │ 253 / 51.008% │
│ n3-c1  │ C1      │ 17      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 145     │ 0             │
│ n4-c1  │ C1      │ 18      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 64      │ 0             │
│ n5-c1  │ C1      │ 17      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 227     │ 42 / 18.502%  │
│        │         │ 84      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 1,628   │ 676           │

Expected behavior

I expected a balanced distribution even without a wait between the create calls. My use case requires creating many streams on a JetStream cluster, and this imbalance could cause performance issues.

Adding a wait works around the issue, but it makes initialization slow when creating a large number of streams (30 streams with a 3 s sleep already add 90 s).

Could you clarify whether this is expected behaviour, or whether there is another way to remediate it?

Server and client version

NATS Server version: nats-server: v2.10.9
Client version (nats --version): 0.1.1

Host environment

uname -a

Darwin  22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64

CPU: Apple M1 Pro arm64

Steps to reproduce

1) Create a JetStream cluster with 5 server nodes. I'm using the following config for each node and starting each server individually with nats-server -js -c node.conf. Each server has a unique name, and its port is added to the cluster routes.


include sys.conf

jetstream {

cluster {
  name: C1
  routes: [

2) Once all servers are up and running, create multiple JetStream streams. All streams have an identical configuration apart from a unique name and subject.

I'm using the nats CLI to create 30 streams:

for i in {1..30}; do nats --context east-sys stream create bar$i --subjects="test$i.*" --ack --max-msgs=-1 --max-bytes=-1 --max-age=1y --storage file --retention limits --max-msg-size=-1 --discard old --dupe-window="0s" --no-allow-rollup --max-msgs-per-subject=-1 --no-deny-delete  --no-deny-purge --replicas 3; done

The resulting stream configuration is as follows:

Information for Stream bar1 created 2024-02-12 20:15:57

              Subjects: test1.*
              Replicas: 3
               Storage: File


             Retention: Limits
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 2m0s
            Direct Get: true
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false


      Maximum Messages: unlimited
   Maximum Per Subject: unlimited
         Maximum Bytes: unlimited
           Maximum Age: 1y0d0h0m0s
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Cluster Information:

                  Name: C1
                Leader: n3-c1
               Replica: n1-c1, current, seen 344ms ago
               Replica: n5-c1, current, seen 344ms ago


              Messages: 0
                 Bytes: 0 B
        First Sequence: 0
         Last Sequence: 0
      Active Consumers: 0

3) Once all streams are created, check the JetStream server report for the stream count on each server node:

nats --context east-sys-ac server report jetstream
│                                          JetStream Summary                                          │
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err       │
│ n1-c1* │ C1      │ 28      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 443     │ 226 / 51.015% │
│ n2-c1  │ C1      │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 442     │ 251 / 56.787% │
│ n3-c1  │ C1      │ 28      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 87      │ 0             │
│ n4-c1  │ C1      │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 52      │ 0             │
│ n5-c1  │ C1      │ 28      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 146     │ 0             │
│        │         │ 84      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 1,170   │ 477           │

4) Following the same steps, but adding a sleep before each stream creation, produces a well-balanced distribution:

for i in {1..30}; do sleep 3; nats --context east-sys stream create bar$i --subjects="test$i.*" --ack --max-msgs=-1 --max-bytes=-1 --max-age=1y --storage file --retention limits --max-msg-size=-1 --discard old --dupe-window="0s" --no-allow-rollup --max-msgs-per-subject=-1 --no-deny-delete  --no-deny-purge --replicas 3; done
nats --context east-sys-ac server report jetstream
│                                          JetStream Summary                                         │
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err      │
│ n1-c1  │ C1      │ 18      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 82      │ 0            │
│ n2-c1* │ C1      │ 15      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 191     │ 87 / 45.549% │
│ n3-c1  │ C1      │ 17      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 49      │ 0            │
│ n4-c1  │ C1      │ 19      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 27      │ 0            │
│ n5-c1  │ C1      │ 18      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 92      │ 0            │
│        │         │ 87      │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 441     │ 87           │
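Steps 2 and 4 differ only in the sleep before each create. A minimal sketch that folds both loops into one function; create_streams, its delay argument and the DRY_RUN switch are hypothetical, while the nats flags are copied unchanged from the loops above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the nats CLI): create bar1..barN with the
# same flags as the loops above, sleeping $delay seconds before each create.
# DRY_RUN=1 prints each command instead of running it.
create_streams() {
  local count=$1 delay=${2:-0} i
  local -a cmd
  for ((i = 1; i <= count; i++)); do
    cmd=(nats --context east-sys stream create "bar$i" --subjects="test$i.*"
      --ack --max-msgs=-1 --max-bytes=-1 --max-age=1y --storage file
      --retention limits --max-msg-size=-1 --discard old --dupe-window="0s"
      --no-allow-rollup --max-msgs-per-subject=-1 --no-deny-delete
      --no-deny-purge --replicas 3)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "${cmd[*]}"
    else
      if (( delay > 0 )); then sleep "$delay"; fi
      "${cmd[@]}"
    fi
  done
}
```

For example, create_streams 30 reproduces step 2 and create_streams 30 3 reproduces step 4.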

derekcollison commented 7 months ago

Longer story, but the reason is that the selection mechanism is synchronous, while the sorting it relies on works on data delivered asynchronously from the other servers: mostly HAAssets, but also usage, etc.

I will look into whether we can improve this.
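To illustrate the race described above, here is a toy model, not the nats-server placement code: each new R3 stream goes to the three servers with the lowest replica count in a possibly stale view. The refresh_every parameter is a hypothetical stand-in for how often asynchronously delivered usage data reaches the selecting server. With a fresh view the replicas spread evenly; a view that never refreshes during the burst keeps choosing the same three servers, similar to the 28/0/28/0/28 pattern in the report.

```shell
#!/usr/bin/env bash
# Toy model only (not nats-server code): place R3 streams on a 5-node cluster.
# Each stream goes to the 3 servers with the fewest replicas in a view that is
# refreshed only every $refresh_every creations (stand-in for async usage updates).
place_streams() {
  local streams=$1 refresh_every=$2 s i picks
  local -a live=(0 0 0 0 0)   # actual replica count on n1..n5
  local -a seen=(0 0 0 0 0)   # the selecting server's possibly stale view
  for ((s = 0; s < streams; s++)); do
    if (( s % refresh_every == 0 )); then
      seen=("${live[@]}")
    fi
    # Lowest seen count first; ties go to the lower server index.
    picks=$(for i in "${!seen[@]}"; do echo "${seen[$i]} $i"; done |
      sort -n -k1,1 -k2,2 | head -n 3 | cut -d' ' -f2)
    for i in $picks; do
      live[$i]=$(( live[$i] + 1 ))
    done
  done
  echo "${live[*]}"
}

echo "fresh view: $(place_streams 30 1)"   # fresh view: 18 18 18 18 18
echo "stale view: $(place_streams 30 30)"  # stale view: 30 30 30 0 0
```

Only the freshness of the view differs between the two runs; how the actual fix addresses this inside nats-server may be quite different.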

derekcollison commented 7 months ago

Should be fixed now; it will be in 2.10.11, the next release, which may go out this week.
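Since the fix is slated for 2.10.11, a deployment script could check the server version before relying on back-to-back stream creation. A minimal sketch; has_placement_fix is a hypothetical helper that parses the format shown in the report (nats-server: v2.10.9) and treats a pre-release suffix as the release itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a version string such as
# "nats-server: v2.10.9" is at least 2.10.11, the release named for the fix.
has_placement_fix() {
  local v=${1##*v}        # keep the text after the last "v", e.g. "2.10.9"
  v=${v%%-*}              # drop a pre-release suffix such as "-RC.1"
  local IFS=.
  local -a have want=(2 10 11)
  have=($v)               # split "2.10.9" on dots
  local k
  for k in 0 1 2; do
    (( ${have[k]:-0} > want[k] )) && return 0
    (( ${have[k]:-0} < want[k] )) && return 1
  done
  return 0                # all three parts equal
}
```

For example: has_placement_fix "$(nats-server --version)" || echo "older than 2.10.11: consider a delay between creates".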

kohlisid commented 7 months ago

@derekcollison Thanks for the prompt fix on this :D