pcsegal opened 11 months ago
Had forgotten to include the configuration files in the reproducible example. Edited to include them.
What role is the arbiter playing? Why not connect cluster A directly to cluster B via gateways? Or extend cluster A with new nodes representing the new servers, have them tagged, and use tags to move the assets?
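For the direct option, a gateway block along these lines on cluster A's servers (mirrored on cluster B's side) would be enough; the names, hosts, and ports here are illustrative assumptions, not taken from the issue:

```
# Sketch: cluster A pointing its gateway directly at cluster B (no arbiter).
gateway {
  name: "A"                                      # the cluster this server belongs to
  port: 7222                                     # gateway listen port
  gateways: [
    { name: "B", urls: ["nats://b-host:7222"] }  # remote cluster B's gateway
  ]
}
```

The tag-based alternative would instead add the new servers to cluster A with `server_tags` and move the assets onto them via stream placement tags.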
Sorry for the delay.
True, this is not the simplest way to do a data migration. In fact, I ended up using a simpler strategy.
But, in any case, I posted this issue here because it looks like establishing a gateway connection shouldn't cause this observed behavior.
The role of the arbiter (not necessarily just in this specific migration scenario) would be for JetStream fault tolerance: if one cluster goes down, you still have 2 other clusters, and so JetStream is still available. If I understand correctly, this fault tolerance would not exist if we only connected 2 clusters via gateway.
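As a concrete illustration of that reasoning (assuming three clusters of three JetStream-enabled servers each, and that all of them participate in the meta group): Raft quorum for 9 peers is ⌊9/2⌋+1 = 5, so losing one whole cluster still leaves 6 ≥ 5 peers and the metadata layer stays available; with only two such clusters (6 peers, quorum 4), losing one leaves 3 < 4 and the meta layer stalls.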
Hi, I'm wondering if you had time to investigate this further.
Does the latest version still exhibit this behaviour? There are also some RC releases ready for the next version.
Yes, I tested it again with 2.10.17-RC6, and it still exhibits the behavior: sometimes no KV buckets are visible anymore after starting up the arbiter node.
This config, where the arbiter node is the only one with the cluster shape defined, is probably not a supported configuration. JetStream wants a mostly static setup and wants to know the shape of things; you should list the full cluster everywhere rather than have this one arbiter that configures up the cluster. It's just not a supported approach, I think.
It might work now and then in some cases, but it's just not going to survive any outages or situations where the arbiter isn't around, and the arbiter becomes a quite nasty SPOF.
Better to use the software as designed.
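For reference, the static shape being suggested would give every server in both clusters the same full gateway map, with only the local name differing; hosts and ports below are assumptions:

```
# Same gateways list on every server in clusters A and B.
gateway {
  name: "A"                                      # set to "B" on cluster B's servers
  port: 7222
  gateways: [
    { name: "A", urls: ["nats://a-host:7222"] }
    { name: "B", urls: ["nats://b-host:7222"] }
  ]
}
```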
I have observed a similar behaviour by only adding a system account to the JetStream k8s Helm configuration, and we lost the stream completely.
Adding such config to the Helm charts:

```
jetstream:
  max_memory_store: << 1GB >>
accounts: {
  SYS: {
    users: [
      { user: admin, password: << $ADMIN_PASSWORD >> }
    ]
  },
}
system_account: SYS
```

caused our stream to be removed!
I assume it is not safe to add an account (or system account) to a running server. Previously, our JetStream server had no accounts enabled.
@arkh-consensys it had a system account, but it was named `$SYS`, which is the default one, and then here it was renamed to `SYS`.
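If the intent was just a system-account user, a sketch that keeps the default `$SYS` name (so nothing gets renamed) might look like this; the password placeholder is carried over from the snippet above:

```
accounts: {
  "$SYS": {
    users: [
      { user: admin, password: << $ADMIN_PASSWORD >> }
    ]
  }
}
system_account: "$SYS"
```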
Observed behavior
I'm trying to set up a gateway connection between two clusters, A and B, in order to migrate streams and KV buckets from cluster A to cluster B and decommission cluster A.
To create the gateway connections, I'm using an arbiter cluster with a single node.
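In rough shape, the arbiter is a single server whose gateway block points at both clusters (a simplified sketch; the actual `arbiter.conf` attached below is authoritative):

```
server_name: arbiter-0
gateway {
  name: "arbiter"
  port: 7422                                       # assumed gateway port
  gateways: [
    { name: "A", urls: ["nats://localhost:7222"] } # assumed
    { name: "B", urls: ["nats://localhost:7322"] } # assumed
  ]
}
```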
I have added a reproducible example below, including NATS configuration files and a script, `test.sh`, that runs the example and reproduces the issue I'm seeing. The steps in `test.sh` do the following: … `nats server raft peer-remove` to decommission it.

There is a problem that I see in step 3 after repeatedly running `test.sh` a few times. After the arbiter is up, the KV bucket sometimes disappears and is no longer visible in any cluster. This happens randomly when re-running `test.sh`. Note that I used KV buckets here just as an example; this happens when I test it with pure streams as well.

Here is an example of what the output looks like when this happens; this is printed after starting up the arbiter and waiting for a few seconds:
Expected behavior
I expect that, after the arbiter is up and the gateway connections are established, any stream that previously existed in cluster A or B continues to exist there.
Server and client version
NATS server version: 2.10.5. NATS client version: 0.1.1
Host environment
Ubuntu 20.04.6.
Steps to reproduce
- cluster-a-common.conf
- cluster-a-0.conf
- cluster-a-1.conf
- cluster-a-2.conf
- cluster-b-common.conf
- cluster-b-0.conf
- cluster-b-1.conf
- cluster-b-2.conf
- arbiter.conf
- test.sh
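To observe the symptom, a check along these lines can be run against each cluster after the arbiter starts; the client ports here are assumptions, not taken from the attached configs:

```
nats --server nats://localhost:4222 kv ls   # list KV buckets via cluster A
nats --server nats://localhost:4232 kv ls   # list KV buckets via cluster B
```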