nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Healthz endpoint should fail if we encounter raft.outOfResources() #4847

Open joeledstrom opened 7 months ago

joeledstrom commented 7 months ago

Proposed change

When the server is restarted with low disk space (debug logging enabled), we get these messages:

nats-0 nats [1] 2023/12/04 12:57:55.395683 [DBG] RAFT [S1Nunr6R - S-R3F-njKShA5R] Not switching to candidate, no resources
nats-0 nats [1] 2023/12/04 12:57:55.398691 [DBG] RAFT [S1Nunr6R - C-R3F-MAtRDrOk] AppendEntry not processing inbound, no resources

nats-1 and nats-2 are still working fine here, but nats stream report shows the replicas like this, with nats-0 flagged:

nats-0!, nats-1, nats-2*
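
For scripting against that output, the replica cell can be split into names and markers. This is a minimal sketch, assuming a trailing "*" marks the stream leader and a trailing "!" marks a replica that is offline or not current (my reading of the nats CLI output, not something confirmed in this thread); the type and function names are hypothetical.

```go
package main

import "strings"

// replica is one entry from the replica cell of the report.
type replica struct {
	Name    string
	Leader  bool // name carried a trailing "*"
	Flagged bool // name carried a trailing "!"
}

// parseReplicas splits a cell such as "nats-0!, nats-1, nats-2*".
func parseReplicas(cell string) []replica {
	var out []replica
	for _, part := range strings.Split(cell, ",") {
		name := strings.TrimSpace(part)
		if name == "" {
			continue
		}
		r := replica{}
		// Strip trailing markers in whichever order they appear.
		for strings.HasSuffix(name, "*") || strings.HasSuffix(name, "!") {
			if strings.HasSuffix(name, "*") {
				r.Leader = true
			} else {
				r.Flagged = true
			}
			name = name[:len(name)-1]
		}
		r.Name = name
		out = append(out, r)
	}
	return out
}
```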

I think it would be great if the server could report this issue through the healthz endpoint, so that it could be detected without looking at the log or using the nats CLI.
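
To show what a probe would gain from this, here is a minimal sketch of a healthz check. It assumes the endpoint replies with a small JSON object whose status field is "ok" when healthy, plus an optional error field; the function names, URL, and timeout are placeholders, and the check can only catch the condition above once the server actually reports it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// healthzReply holds the two fields this sketch reads from the JSON body.
type healthzReply struct {
	Status string `json:"status"`
	Error  string `json:"error,omitempty"`
}

// evaluateHealthz returns nil only for HTTP 200 with status "ok".
func evaluateHealthz(statusCode int, body []byte) error {
	var reply healthzReply
	if err := json.Unmarshal(body, &reply); err != nil {
		return fmt.Errorf("healthz returned HTTP %d with unparsable body: %w", statusCode, err)
	}
	if statusCode != http.StatusOK || reply.Status != "ok" {
		return fmt.Errorf("unhealthy: HTTP %d, status=%q, error=%q", statusCode, reply.Status, reply.Error)
	}
	return nil
}

// checkHealthz fetches url and evaluates the reply.
func checkHealthz(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("healthz request failed: %w", err)
	}
	defer resp.Body.Close()

	// Cap the read; a health reply should be tiny.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 64<<10))
	if err != nil {
		return fmt.Errorf("reading healthz body: %w", err)
	}
	return evaluateHealthz(resp.StatusCode, body)
}
```

Assuming the monitoring port is exposed on 8222, a call such as checkHealthz("http://nats-0:8222/healthz", 2*time.Second) would return a non-nil error for an unhealthy server.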

Note that there is actually still space left on the device, but the issue can be fixed by allocating even more disk space and restarting nats.

This is what it looks like when nats starts up:

nats-0 nats [1] 2023/12/04 12:57:06.981066 [INF] ---------------- JETSTREAM ----------------
nats-0 nats [1] 2023/12/04 12:57:06.981074 [INF]   Max Memory:      2.88 GB
nats-0 nats [1] 2023/12/04 12:57:06.981081 [INF]   Max Storage:     3.58 GB
nats-0 nats [1] 2023/12/04 12:57:06.981086 [INF]   Store Directory: "/var/datastore/jetstream"
nats-0 nats [1] 2023/12/04 12:57:06.981092 [INF] -------------------------------------------
nats-0 nats [1] 2023/12/04 12:57:06.981171 [DBG]   Exports:
nats-0 nats [1] 2023/12/04 12:57:06.981187 [DBG]      $JS.API.>
nats-0 nats [1] 2023/12/04 12:57:06.981214 [DBG] Enabled JetStream for account "$G"
nats-0 nats [1] 2023/12/04 12:57:06.981223 [DBG]   Max Memory:      -1 B
nats-0 nats [1] 2023/12/04 12:57:06.981229 [DBG]   Max Storage:     -1 B
nats-0 nats [1] 2023/12/04 12:57:06.981729 [DBG] Recovering JetStream state for account "$G"

So somehow it determines that the low storage left on the device (3.58 GB) is not enough to fully enable JetStream storage (it sets it to -1 B).

Use case

To be able to detect through the health check endpoint that a server is running low on disk space.
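
In the meantime, an external check can watch free space on the store directory directly. This is a minimal sketch, assuming a Linux or macOS host (it uses statfs) and a hypothetical threshold; it does not reproduce whatever the server does internally to decide it is out of resources.

```go
package main

import (
	"fmt"
	"syscall"
)

// freeBytes reports the space available to unprivileged users on the
// filesystem that holds path.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	// Available blocks times block size gives available bytes.
	return st.Bavail * uint64(st.Bsize), nil
}

// lowOnDisk reports whether free space is below minFree bytes.
// The threshold is the caller's choice, not a value taken from nats-server.
func lowOnDisk(free, minFree uint64) bool {
	return free < minFree
}
```

For the setup in this report, freeBytes would be called with the store directory from the startup banner, "/var/datastore/jetstream", and the result compared against a threshold picked for the deployment.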

Contribution

No response

derekcollison commented 7 months ago

Do you set the max storage in the server config?