nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io

NATS JetStream Cluster - Dynamically add node #3665

Closed csuriano23 closed 7 months ago

csuriano23 commented 1 year ago

Hi there, from what I read in the docs, it seems there is no way to configure a NATS JetStream cluster and then dynamically add a node to the existing cluster.

My need is specifically for a local environment, so that I can tear up each microservice's Docker stack atomically for unit testing and then cluster all the instances together for integration testing, without having to rewrite the stack.

Any idea on how to achieve that?

derekcollison commented 1 year ago

Just start a new server and add it to the cluster; it will work.

But if you decide to remove it, you need to make sure to tell the system to remove it and that the peer is not coming back.
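
For reference, a minimal sketch of that removal step, assuming the nats CLI and a system-account user (the server name and credentials here are illustrative):

# hypothetical example: tell the JetStream meta group that the departed server "n1m4" is not coming back
nats server raft peer-remove n1m4 --user admin --password admin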

csuriano23 commented 1 year ago

Thank you @derekcollison

I think I have achieved that, setting up one cluster as:

version: "3.5"
services:
  nats:
    image: nats
    ports:
      - "4222:4222"
      - "8222:8222"
    command: "--server_name n1m1 --js --store_dir /data --cluster_name NATS --cluster nats://0.0.0.0:6222 --http_port 8222 --routes=nats://nats:6222"
    networks: [ "nats" ]
  nats-1:
    image: nats
    command: "--server_name n1m2 --js --store_dir /data --cluster_name NATS --cluster nats://0.0.0.0:6222 --routes=nats://nats:6222"
    networks: [ "nats" ]
    depends_on: [ "nats" ]
  nats-2:
    image: nats
    command: "--server_name n1m3 --js --store_dir /data --cluster_name NATS --cluster nats://0.0.0.0:6222 --routes=nats://nats:6222"
    networks: [ "nats" ]
    depends_on: [ "nats" ]

networks:
  nats:
    name: nats

and attaching another as:

version: "3.5"
services:
  nats-ancor:
    image: nats
    ports:
      - "4223:4222"
    command: "--server_name n2m1 --js --store_dir /data --cluster_name NATS --cluster nats://0.0.0.0:6222 --routes=nats://nats:6222,nats://nats-ancor:6222"
    networks: [ "nats" ]
  nats-ancor2:
    image: nats
    command: "--server_name n2m2 --js --store_dir /data --cluster_name NATS --cluster nats://0.0.0.0:6222 --routes=nats://nats:6222,nats://nats-ancor:6222"
    networks: [ "nats" ]

networks:
  nats:
    name: nats

Everything seems fine, and messages published after tearing up the second compose (via localhost:4222) are correctly received by subscribers attached to the other instance (localhost:4223).
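
As a concrete check, assuming the nats CLI is available on the host (the subject name is illustrative):

# subscribe through the node exposed by the second compose
nats --server nats://localhost:4223 sub demo.test

# in another terminal, publish through the node exposed by the first compose
nats --server nats://localhost:4222 pub demo.test "hello"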

The only issue I see is that sometimes, after tearing up the second compose, the metadata leader keeps switching back and forth continuously:

2022-11-25 09:08:58 [1] 2022/11/25 08:08:58.650248 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:08:59 [1] 2022/11/25 08:08:59.214923 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:00 [1] 2022/11/25 08:09:00.645630 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:01 [1] 2022/11/25 08:09:01.214003 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:01 [1] 2022/11/25 08:09:01.645159 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:02 [1] 2022/11/25 08:09:02.213404 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:02 [1] 2022/11/25 08:09:02.649775 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:03 [1] 2022/11/25 08:09:03.215610 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:03 [1] 2022/11/25 08:09:03.645244 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:04 [1] 2022/11/25 08:09:04.211452 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:04 [1] 2022/11/25 08:09:04.645996 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:05 [1] 2022/11/25 08:09:05.212703 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:05 [1] 2022/11/25 08:09:05.645641 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:06 [1] 2022/11/25 08:09:06.211454 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:06 [1] 2022/11/25 08:09:06.646649 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:07 [1] 2022/11/25 08:09:07.215309 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:07 [1] 2022/11/25 08:09:07.648165 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:08 [1] 2022/11/25 08:09:08.212681 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:08 [1] 2022/11/25 08:09:08.648338 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:09 [1] 2022/11/25 08:09:09.211939 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:09 [1] 2022/11/25 08:09:09.645624 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:10 [1] 2022/11/25 08:09:10.212488 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:10 [1] 2022/11/25 08:09:10.648109 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:11 [1] 2022/11/25 08:09:11.215453 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:11 [1] 2022/11/25 08:09:11.646004 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:12 [1] 2022/11/25 08:09:12.222187 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:12 [1] 2022/11/25 08:09:12.645080 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:13 [1] 2022/11/25 08:09:13.213355 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:13 [1] 2022/11/25 08:09:13.647581 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:14 [1] 2022/11/25 08:09:14.211603 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:14 [1] 2022/11/25 08:09:14.646100 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:15 [1] 2022/11/25 08:09:15.213722 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:15 [1] 2022/11/25 08:09:15.646205 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:16 [1] 2022/11/25 08:09:16.214133 [INF] JetStream cluster new metadata leader: n2m2/NATS
2022-11-25 09:09:16 [1] 2022/11/25 08:09:16.647944 [INF] JetStream cluster new metadata leader: n1m1/NATS
2022-11-25 09:09:17 [1] 2022/11/25 08:09:17.212649 [INF] JetStream cluster new metadata leader: n2m2/NATS

Maybe it is related to the caveat you described.

derekcollison commented 1 year ago

That behavior shows that something is not working correctly.

What do you mean by "tearing up"?

What does nats server report jetstream show?

csuriano23 commented 1 year ago

By "tearing up" I simply mean docker compose up -d --build

When running nats server report jetstream --server=..., I receive nats: error: server request failed, ensure the account used has system privileges and appropriate permissions, both on localhost and from a nats-box container started on the same nats network (docker run --rm -it --network=nats natsio/nats-box:latest).

derekcollison commented 1 year ago

You will need a system user for many of the NATS CLI commands. Looping in @wallyqs.
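
For context, the pattern (the same one that appears in the configs below) is a $SYS account user in the server config, which is then passed to the CLI; a minimal sketch with illustrative credentials:

accounts {
  $SYS { users = [ { user: "admin", pass: "admin" } ] }
}

# then, e.g. from nats-box:
nats server report jetstream --server=n1m1 --user=admin --password=admin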

csuriano23 commented 1 year ago

I managed to reproduce the issue with a system user configured.

These are the files:

./v0/docker-compose.yml

version: "3.8"

services:
  n1m1:
    image: nats:2.9.3
    restart: unless-stopped
    ports:
      - "4222:4222"
      - "8222:8222"
    volumes:
      - ./config:/config
    command: "--config /config/n1m1.conf"
    networks:
      - backbone

  n1m2:
    image: nats:2.9.3
    restart: unless-stopped
    command: "--config /config/n1m2.conf"
    volumes:
      - ./config:/config
    networks:
      - backbone
    depends_on:
      - n1m1

  n1m3:
    image: nats
    restart: unless-stopped
    command: "--config /config/n1m3.conf"
    volumes:
      - ./config:/config
    networks:
      - backbone
    depends_on:
      - n1m1

  nats-cli:
    image: natsio/nats-box:0.13.2
    restart: unless-stopped
    tty: true
    networks:
      - backbone

networks:
  backbone:
    name: backbone

./v1/docker-compose.yml

version: "3.8"

services:
  n2m1:
    image: nats:2.9.3
    restart: unless-stopped
    ports:
      - "4223:4222"
    volumes:
      - ./config:/config
    command: "--config /config/n2m1.conf"
    networks:
      - backbone

  n2m2:
    image: nats:2.9.3
    restart: unless-stopped
    volumes:
      - ./config:/config
    command: "--config /config/n2m2.conf"
    networks:
      - backbone

networks:
  backbone:
    name: backbone

./v0/config/n1m1.conf

server_name=n1m1
listen=4222
http_port=8222

accounts {
 $SYS { users = [ { user: "admin", pass: "admin" } ] }
}

jetstream {
   store_dir=/data
}

cluster {
  name: NATS
  listen: 0.0.0.0:6222
  routes: [
    nats-route://n1m2:6222
    nats-route://n1m3:6222
  ]
}

./v0/config/n1m2.conf

server_name=n1m2
listen=4222

accounts {
 $SYS { users = [ { user: "admin", pass: "admin" } ] }
}

jetstream {
   store_dir=/data
}

cluster {
  name: NATS
  listen: 0.0.0.0:6222
  routes: [
    nats-route://n1m1:6222
    nats-route://n1m3:6222
  ]
}

./v0/config/n1m3.conf

server_name=n1m3
listen=4222

accounts {
 $SYS { users = [ { user: "admin", pass: "admin" } ] }
}

jetstream {
   store_dir=/data
}

cluster {
  name: NATS
  listen: 0.0.0.0:6222
  routes: [
    nats-route://n1m1:6222
    nats-route://n1m2:6222
  ]
}

./v1/config/n2m1.conf

server_name=n2m1
listen=4222

accounts {
 $SYS { users = [ { user: "admin", pass: "admin" } ] }
}

jetstream {
   store_dir=/data
}

cluster {
  name: NATS
  listen: 0.0.0.0:6222
  routes: [
    nats-route://n2m2:6222
    nats-route://n1m1:6222
  ]
}

./v1/config/n2m2.conf

server_name=n2m2
listen=4222

accounts {
 $SYS { users = [ { user: "admin", pass: "admin" } ] }
}

jetstream {
   store_dir=/data
}

cluster {
  name: NATS
  listen: 0.0.0.0:6222
  routes: [
    nats-route://n2m1:6222
    nats-route://n1m1:6222
  ]
}

Also, the report output seems to alternate periodically between:

~ # nats server report jetstream --server=n1m1 --user=admin --password=admin
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
│                                       JetStream Summary                                       │
├────────┬─────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬─────────┤
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err │
├────────┼─────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│ n1m1   │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n2m1*  │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n2m2   │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n1m3   │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n1m2*  │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
├────────┼─────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│        │         │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
╰────────┴─────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴─────────╯

╭─────────────────────────────────────────────────╮
│           RAFT Meta Group Information           │
├──────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ Leader │ Current │ Online │ Active │ Lag │
├──────┼────────┼─────────┼────────┼────────┼─────┤
│ n1m1 │        │ true    │ true   │ 0.68s  │ 0   │
│ n1m2 │ yes    │ true    │ true   │ 0.00s  │ 0   │
│ n1m3 │        │ true    │ true   │ 0.68s  │ 0   │
│ n2m1 │        │ false   │ true   │ 0.68s  │ 9   │
│ n2m2 │        │ true    │ true   │ 0.68s  │ 0   │
╰──────┴────────┴─────────┴────────┴────────┴─────╯

and

~ # nats server report jetstream --server=n1m1 --user=admin --password=admin
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
│                                       JetStream Summary                                       │
├────────┬─────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬─────────┤
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err │
├────────┼─────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│ n1m1   │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n2m2   │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n1m2*  │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n1m3   │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
│ n2m1*  │ NATS    │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
├────────┼─────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│        │         │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 0       │ 0       │
╰────────┴─────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴─────────╯

╭─────────────────────────────────────────────────╮
│           RAFT Meta Group Information           │
├──────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ Leader │ Current │ Online │ Active │ Lag │
├──────┼────────┼─────────┼────────┼────────┼─────┤
│ n1m1 │        │ true    │ true   │ 0.14s  │ 0   │
│ n1m2 │        │ false   │ true   │ 0.14s  │ 11  │
│ n1m3 │        │ true    │ true   │ 0.14s  │ 0   │
│ n2m1 │ yes    │ true    │ true   │ 0.00s  │ 0   │
│ n2m2 │        │ true    │ true   │ 0.14s  │ 0   │
╰──────┴────────┴─────────┴────────┴────────┴─────╯

This strange behavior is more likely to happen when running docker compose up on ./v1/docker-compose.yml first and then on ./v0/docker-compose.yml immediately after.

Edit: sorry, I had mixed up the order; it is fixed now.
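
In other words, the reproduction that most often triggers the flapping is roughly (paths as in the files above):

docker compose -f ./v1/docker-compose.yml up -d --build
docker compose -f ./v0/docker-compose.yml up -d --build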

derekcollison commented 1 year ago

Thanks, will take a look. It seems the system thinks it has two meta leaders, which is of course not desired.

codegangsta commented 1 year ago

@csuriano23 You mentioned that you have it fixed now, what ended up being the problem?

csuriano23 commented 1 year ago

> @csuriano23 You mentioned that you have it fixed now, what ended up being the problem?

Sorry, I meant that I've fixed the comment; I had specified the wrong startup order to reproduce the issue.

codegangsta commented 1 year ago

Gotcha! I'll look into this today as well and see if I can find the issue

comunidadio commented 1 year ago

any luck with this issue?

I think I'm getting a similar one when using nats-server in a capacity=3 auto scaling group on AWS. If an instance goes away, the two remaining instances fail with "JetStream cluster no metadata leader" in a loop whenever a replacement instance manages to connect (with routes to the 2 remaining servers).

ripienaar commented 1 year ago

I don't think it's appropriate to autoscale a database, tbh - and in the context of JetStream it basically is a database - so why do you want to autoscale it? It seems inherently incompatible with the very concept.

comunidadio commented 1 year ago

@ripienaar The AWS "autoscaling" terminology here should be understood as "auto instance replacement". If a node fails for any reason (e.g. underlying hardware failure), I would like the ASG to automatically launch a new node that connects to the cluster as a replacement for the lost one.

I have changed "server_name" to use the "availability zone" instead of the "instance id", and it seems the newly launched instance (with the same name as the failed instance) is able to catch up and serve queries correctly.

is that not a supported use case?

ripienaar commented 1 year ago

You can replace nodes as long as the overall up-node count maintains a quorum, and as long as the new nodes coming in have the same server name. Then it will work. What you can't generally do is dynamically change the number of nodes or (easily) replace nodes with ones that have a new server_name configured.
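
To illustrate the "same server name" point, a replacement instance would come up with exactly the identity of the node it replaces; a sketch based on the configs earlier in this thread (values are illustrative, and the same accounts block as the other nodes would also be needed):

# replacement for a failed n1m3: reuse its server_name and routes
server_name=n1m3
listen=4222

jetstream {
   store_dir=/data
}

cluster {
  name: NATS
  listen: 0.0.0.0:6222
  routes: [
    nats-route://n1m1:6222
    nats-route://n1m2:6222
  ]
}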

comunidadio commented 1 year ago

@ripienaar Thank you very much, that explains it; my earlier problem was indeed due to my use of a new unique server_name for the replacement node instead of reusing the same server name.

Just to double-check: if the new replacement node with the same name starts with a new disk (the failed instance's storage is lost), is that a supported use case as well, with the new node restoring the data from the other nodes as and if needed?

ripienaar commented 1 year ago

That's correct. It can take a good while, and during that time the node is essentially not yet ready or 100% available, so if you do this as a rolling maintenance you need to be very careful.
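
One way to keep an eye on that catch-up phase is the same report used earlier in this thread; the per-server message/byte counts and the RAFT meta group's Current and Lag columns give a rough view of whether the rejoined peer has caught up:

nats server report jetstream --server=n1m1 --user=admin --password=admin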