Closed: lesovsky closed this issue 1 year ago
From what I understand reading #2579, this is not supposed to be the total cluster size - it's the JetStream meta group information, which is dynamically calculated based on known JS-enabled servers. I am not entirely sure why the leader doesn't count itself, but that's typical for stream/consumer/meta group information: none of them include the leader as part of the cluster info, which I agree is very confusing.
I am not sure what the desired behavior is, but it's on purpose that this number doesn't always match the total cluster size, since we support mixed-mode setups where only some servers in a cluster have JetStream enabled.
We could adjust to count the leader..
From the monitoring (observability) point of view, it is important to have metrics that briefly describe the health of the cluster. I intended to use "cluster_size" for this purpose because there is nothing else. It would be nice if future versions added metrics (exposed by the HTTP endpoints) that describe the overall health of the cluster and the total number of alive/dead nodes.
I found that such info can be obtained with the natscli tool (mostly from the server sub-command), and it would be nice if the same info could be obtained over HTTP.
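Roughly what I mean, for comparison (a sketch; the CLI context name and the node address are assumptions based on my test setup):
$ nats server list --context system          # aggregated fleet view via natscli
$ curl -s 192.168.122.11:8222/jsz | jq .     # per-node view over HTTP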
If the CLI can do it, so can HTTP. It gets exactly the same data.
The only difference is the CLI gathers it from the entire fleet and then aggregates the data.
I don't think we will ever be able to provide all data reliably from every node. That's not how distributed systems work.
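For illustration, the same kind of fleet-wide gathering can be scripted against the monitoring port of every node (a sketch using the addresses from the test setup in this issue):
$ for i in 1 2 3; do curl -s 192.168.122.1$i:8222/jsz | jq -c '{leader: .meta_cluster.leader, size: .meta_cluster.cluster_size}'; done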
The only difference is the CLI gathers it from the entire fleet and then aggregates the data.
Exactly, it aggregates. natscli provides output about all nodes in table format, and this is impossible to see in the output of a single HTTP endpoint.
I don’t think we will ever be able to provide all data reliably from every node
Anyway, all nodes work over RAFT, which means a particular node could output its current RAFT state. Let me know if this information already exists in an HTTP endpoint (maybe meta_cluster in /jsz, but I'm not sure).
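For example, something like this per-node view of the meta group is what I have in mind (a sketch; the address is from my test setup):
$ curl -s 192.168.122.11:8222/jsz | jq .meta_cluster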
Each node has its view in /jsz, yes, but it's not the entire world in there. There are many layers of raft and any given node only has a view of a subset.
We could adjust to count the leader..
It would be great.
Each node has its view in /jsz yes, but it’s not the entire world in there.
Yes, agreed, it is not the entire world. But when requesting the state it is sufficient to get the current state right now, because with long-term measurements we will see the whole picture of how this "world" changes: either the number of hosts stays the same most of the time, or the number of hosts changes (and the cluster is unstable).
Or maybe expose whether the cluster has a leader or not (like etcd does with the etcd_server_has_leader metric).
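Such a 0/1 value could already be derived from /jsz with something like this (a sketch; the address is from my test setup, and it assumes an empty or missing leader field means no leader is currently known):
$ curl -s 192.168.122.11:8222/jsz | jq '.meta_cluster.leader | if . == null or . == "" then 0 else 1 end'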
You can get info about the core NATS cluster in /routez - what are you trying to find exactly?
what are you trying to find exactly
I am looking for a simple metric which shows the number of nodes in the cluster, including the leader.
Specifically for JetStream, or for NATS core as well? And do you have a super cluster?
No, I have no super clusters (no gateways, no leafnodes), just simple setups with 3 or 5 nodes.
Specifically for JetStream, or for NATS core as well?
For JetStream.
I don't think we have a single number for that. You would need to count array size or something like that at the moment.
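For the core cluster, one way to do that counting today is against /routez (a sketch; it assumes a full route mesh, so the servers a node can see are its routes plus itself, and it uses an address from the test setup in this issue):
$ curl -s 192.168.122.11:8222/routez | jq '(.routes | length) + 1'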
fwiw, though, in my setup this number seems correct, even on the leader:
[rip@p1-lon]% nats server req jsz --context system.lon |jq .data.meta_cluster.cluster_size
9
9
9
9
9
9
9
9
9
How do you define your routes, do you list all nodes or rely on some dynamic configuration and seeding of routes?
fwiw, though, in my setup this number seems correct, even on the leader:
Did you try to check after the initial setup (as mentioned in the first message)?
I repeated the test again and got the same result (which is why I opened the issue).
$ for i in 1 2 3; do curl -s 192.168.122.1$i:8222/jsz |jq .meta_cluster.leader; done
"nats3"
"nats3"
"nats3"
$ for i in 1 2 3; do curl -s 192.168.122.1$i:8222/jsz |jq .meta_cluster.cluster_size; done
3
3
2
Why 2?
Used config:
# Ansible managed
# HTTP monitoring port
port: 4222
http: 8222
syslog: true
pid_file: /var/lib/nats-server/nats.pid
server_name: nats1
jetstream: true
authorization {
  # default_permissions = {
  #   publish = "SANDBOX.*"
  #   subscribe = ["PUBLIC.>", "_INBOX.>"]
  # }
  user1 = {
    publish = ">"
    subscribe = ">"
  }
  users = [
    {user: admin, password: "password"}
    {user: user1, password: "password", permissions: $user1 }
  ]
}
accounts: {
  SYS: {
    users: [
      { user: admin, password: password }
    ]
  },
}
system_account: SYS
cluster {
  listen: 0.0.0.0:5222
  name: test-cluster
  # Authorization for route connections plaintext
  authorization {
    user: admin
    password: password
  }
  routes: [
    "nats-route://admin:password@nats2:5222"
    "nats-route://admin:password@nats3:5222"
  ]
}
jetstream: {
  store_dir: /var/lib/nats-server
  max_memory_store: 1GB
  max_file_store: 1GB
}
How do you define your routes, do you list all nodes or rely on some dynamic configuration and seeding of routes?
Please read the first message I wrote carefully; the steps to reproduce are described there.
Please read the first message I wrote carefully; the steps to reproduce are described there.
Clearly additional information is required since, as demonstrated, my environment does not exhibit this behavior, so additional questions and explorations are being asked. As you can see, your original report does NOT answer these questions.
You can find the answers in the config attached above.
I thought maybe the issue was tied to a wrong configuration (explained and fixed here), but unfortunately after adjusting the config the behavior did not change: cluster_size is still smaller than the real number of hosts in the cluster.
root@nats3:~# nats server list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Server Overview │
├───────┬──────────────┬───────────┬─────────┬─────┬───────┬──────┬────────┬─────┬────────┬─────┬──────┬────────┬───────────┤
│ Name │ Cluster │ IP │ Version │ JS │ Conns │ Subs │ Routes │ GWs │ Mem │ CPU │ Slow │ Uptime │ RTT │
├───────┼──────────────┼───────────┼─────────┼─────┼───────┼──────┼────────┼─────┼────────┼─────┼──────┼────────┼───────────┤
│ nats1 │ test-cluster │ 0.0.0.0 │ 2.6.3 │ yes │ 0 │ 132 │ 2 │ 0 │ 19 MiB │ 0.0 │ 0 │ 34m8s │ 978.959µs │
│ nats2 │ test-cluster │ 0.0.0.0 │ 2.6.3 │ yes │ 0 │ 132 │ 2 │ 0 │ 22 MiB │ 0.0 │ 0 │ 34m8s │ 961.162µs │
│ nats3 │ test-cluster │ 0.0.0.0 │ 2.6.3 │ yes │ 1 │ 132 │ 2 │ 0 │ 20 MiB │ 0.0 │ 0 │ 34m8s │ 871.655µs │
├───────┼──────────────┼───────────┼─────────┼─────┼───────┼──────┼────────┼─────┼────────┼─────┼──────┼────────┼───────────┤
│ │ 1 Clusters │ 3 Servers │ │ 3 │ 1 │ 396 │ │ │ 60 MiB │ │ 0 │ │ │
╰───────┴──────────────┴───────────┴─────────┴─────┴───────┴──────┴────────┴─────┴────────┴─────┴──────┴────────┴───────────╯
╭─────────────────────────────────────────────────────────────────────────────────╮
│ Cluster Overview │
├──────────────┬────────────┬───────────────────┬───────────────────┬─────────────┤
│ Cluster │ Node Count │ Outgoing Gateways │ Incoming Gateways │ Connections │
├──────────────┼────────────┼───────────────────┼───────────────────┼─────────────┤
│ test-cluster │ 3 │ 0 │ 0 │ 1 │
├──────────────┼────────────┼───────────────────┼───────────────────┼─────────────┤
│ │ 3 │ 0 │ 0 │ 1 │
╰──────────────┴────────────┴───────────────────┴───────────────────┴─────────────╯
root@nats3:~# curl -s 127.0.0.1:8222/jsz |jq .meta_cluster.cluster_size
2
Let me know if you need extra information or any additional tests to be run.
If you run your servers in debug mode, do you see any logs like Adjusting JetStream cluster etc.?
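(For reference, a sketch of how to enable that debug/trace logging via the command line; the config path here is an assumption, and the same can be set with debug: true / trace: true in the config file:)
$ nats-server -c /etc/nats-server.conf -DV    # -D enables debug logging, -V enables trace logging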
No, I found nothing similar to this. I collected per-server debug logs and put them on Google Drive; maybe you can find something useful there. In that setup, nats1 and nats2 show cluster_size=2, and nats2 is the leader.
OK, I do see it's doing some dynamic peer gathering/sizing, which my clusters do not do and no doubt the bug is in there.
@derekcollison what circumstances would make it log 'initial peers' and then gather peer state from the leader etc.? My own clusters do not do this if I start them, so I think this is doing some dynamic sizing? Maybe an accounting bug there. Could it be because the routes do not list all servers?
@lesovsky please add all 3 servers to all route blocks on all servers.
nats1.out:Nov 09 09:16:55 nats1 nats-server[2871]: JetStream cluster checking for stable cluster name and peers
nats1.out:Nov 09 09:16:55 nats1 nats-server[2871]: JetStream cluster initial peers: [RztkeQup]
nats1.out:Nov 09 09:16:55 nats1 nats-server[2871]: RAFT [RztkeQup - _meta_] Update peers from leader to map[RztkeQup:0xc00015fe90 SRLRpmYS:0xc0003bbe60]
nats2.out:Nov 09 09:16:55 nats2 nats-server[2871]: JetStream cluster checking for stable cluster name and peers
nats2.out:Nov 09 09:16:55 nats2 nats-server[2871]: JetStream cluster initial peers: [SRLRpmYS]
nats3.out:Nov 09 09:16:55 nats3 nats-server[2870]: JetStream cluster checking for stable cluster name and peers
nats3.out:Nov 09 09:16:55 nats3 nats-server[2870]: JetStream cluster initial peers: [fvTBnQC7]
nats3.out:Nov 09 09:16:55 nats3 nats-server[2870]: RAFT [fvTBnQC7 - _meta_] Update peers from leader to map[RztkeQup:0xc0001677a0 SRLRpmYS:0xc000167000]
please add all 3 servers to all route blocks on all servers.
Defined all hosts in the routes (on all servers) and now cluster_size is the same everywhere and equals 3. Repeated the test and got the same result.
Hmm... I thought the local server shouldn't be specified in the routes, but it seems that assumption is wrong?
It's valid to specify it, it will ignore it and not try to connect to it.
JS takes the signal about the meta group size from the routes though, so that's why it went into trying to figure this out dynamically.
We might be able to handle the case where the server itself isn't in the route list as an implied signal to the JS Raft layer to include itself in the list? wdyt @derekcollison
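For anyone else hitting this, the adjustment described above looks roughly like this in each server's cluster block (a sketch based on the config posted earlier; every server lists all three route URLs, and the route pointing at the server itself is simply ignored):
cluster {
  listen: 0.0.0.0:5222
  name: test-cluster
  authorization {
    user: admin
    password: password
  }
  routes: [
    "nats-route://admin:password@nats1:5222"
    "nats-route://admin:password@nats2:5222"
    "nats-route://admin:password@nats3:5222"
  ]
}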
Closing for now, feel free to re-open if needed.
Defect
v2.6.2 has a new cluster_size metric in the /varz and /jsz endpoints. It seems it should show how many nodes are in the cluster. After the initial setup, when comparing cluster_size values from all hosts, I found that 1) the leader counts only replicas and doesn't count itself; or 2) sometimes cluster_size on all hosts is less than the total number of hosts (with no errors in the logs at the same time). This is a bit confusing, since it may look like the cluster is degraded. After stopping the leader, a new leader is elected; starting the old leader again, all cluster_size values then show the same numbers.
Versions of nats-server and affected client libraries used: 2.6.2, 2.6.3
OS/Container environment:
Steps or code to reproduce the issue:
- Compare the cluster_size metric after the initial setup - the values on the leader are less than on the replicas
- Restart the leader and compare the cluster_size metric again - all values are the same
Expected result:
All values should be equal after cluster initialization
Actual result:
cluster_size on leader is less than on replicas