Open liyancoding opened 8 months ago
Make sure you are running the latest release: 2.9.24 for 2.9.x, and 2.10.7 for 2.10.x.
What's the reason?
We have a tremendous number of users and a very fast-growing number of large customers. It's difficult for our team to support or triage issues for the OSS community if they are not on the latest version.
If you are a paying customer, that is different.
Any updates?
The version is v2.9.23, not the latest. However, the NATS memory usage keeps increasing and never decreases. The stream is enabled. It's a serious problem.
Are you a Synadia customer?
What is a Synadia customer?
Synadia is the company behind the NATS.io ecosystem. We have customers and we prioritize them in terms of GH issues etc.
We're not Synadia customers.
OK, we ask our OSS users to upgrade to the latest server and clients. The server is at 2.10.9 now. If the issue persists, we would be happy to dig in and work with you on a solution.
ok, thanks.
@derekcollison I'm seeing a similar issue here. There are messages on the stream, but when I try to consume from it or create new consumers for it, I get this error:
? Select a Consumer xxx-simple-pipeline-out-0
nats: error: could not load Consumer xxx-simple-pipeline-out-0 > xxx-simple-pipeline-out-0: JetStream system temporarily unavailable (10008)
The memory usage is also very high.
Using
[7] 2024/08/06 23:44:37.076574 [INF] Starting nats-server
[7] 2024/08/06 23:44:37.076714 [INF] Version: 2.10.18
[7] 2024/08/06 23:44:37.076718 [INF] Git: [57d23ac]
Subjects: xxx-simple-pipeline-out-0
Replicas: 3
Storage: File
Options:
Retention: Limits
Acknowledgments: true
Discard Policy: Old
Duplicate Window: 1m0s
Allows Msg Delete: true
Allows Purge: true
Allows Rollups: false
Limits:
Maximum Messages: 100,000
Maximum Per Subject: unlimited
Maximum Bytes: unlimited
Maximum Age: 3d0h0m0s
Maximum Message Size: unlimited
Maximum Consumers: unlimited
Cluster Information:
Name: default
Leader:
Replica: isbsvc-default-js-0, outdated, seen 18m6s ago, 10,310 operations behind
Replica: isbsvc-default-js-2, outdated, seen 26m47s ago, 10,310 operations behind
Replica: isbsvc-default-js-4, outdated, seen 17m47s ago
State:
Messages: 10,308
Bytes: 13 GiB
First Sequence: 1 @ 2024-08-06 23:47:07 UTC
Last Sequence: 10,308 @ 2024-08-06 23:55:47 UTC
Active Consumers: 1
Number of Subjects: 1
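The cluster section of the output above already shows the problem: the Leader field is empty and every replica is reported as outdated. A minimal sketch of how that check could be automated, assuming the text layout shown above (the function name `summarize_cluster` and the regular expressions are illustrative, not part of the `nats` CLI):

```python
import re

# Matches lines such as:
#   Replica: isbsvc-default-js-0, outdated, seen 18m6s ago, 10,310 operations behind
# The "operations behind" suffix is optional, as in the last replica above.
REPLICA_RE = re.compile(
    r"^\s*Replica:\s*(?P<name>[^,]+),\s*(?P<status>\w+),\s*seen\s+(?P<seen>\S+)\s+ago"
    r"(?:,\s*(?P<lag>[\d,]+)\s+operations behind)?\s*$"
)
LEADER_RE = re.compile(r"^\s*Leader:\s*(?P<name>.*)$")

# Copied from the stream info output in this thread.
SAMPLE = """\
Cluster Information:
Name: default
Leader:
Replica: isbsvc-default-js-0, outdated, seen 18m6s ago, 10,310 operations behind
Replica: isbsvc-default-js-2, outdated, seen 26m47s ago, 10,310 operations behind
Replica: isbsvc-default-js-4, outdated, seen 17m47s ago
"""


def summarize_cluster(info_text):
    """Extract the leader (None if empty) and the replica states from the text."""
    leader = None
    replicas = []
    for line in info_text.splitlines():
        m = LEADER_RE.match(line)
        if m:
            leader = m.group("name").strip() or None
            continue
        m = REPLICA_RE.match(line)
        if m:
            lag = m.group("lag")
            replicas.append({
                "name": m.group("name").strip(),
                "status": m.group("status"),
                "seen": m.group("seen"),
                # None means the output did not report a lag for this replica.
                "lag": int(lag.replace(",", "")) if lag else None,
            })
    return {"leader": leader, "replicas": replicas}


if __name__ == "__main__":
    summary = summarize_cluster(SAMPLE)
    outdated = [r["name"] for r in summary["replicas"] if r["status"] == "outdated"]
    print("leader:", summary["leader"])
    print("outdated replicas:", outdated)
```

A stream with no leader and all replicas outdated cannot serve consumer requests, which is consistent with the 10008 error above.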
The NRG layer looks like it's struggling. It could be a network issue or a misconfiguration of the NATS system.
I believe I was hitting some disk limit in this case. I have increased the limits to see whether I can replicate the problem. On another note, what is the expected behaviour when the available disk space does not match the total_storage allowed for JetStream?
If a server encounters a quota issue with the underlying store, it will log it and shut down JetStream for that server. You will see that clearly in the logs.
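To catch a mismatch between the configured limit and the disk before the server does, a rough pre-flight check could look like the sketch below. It is an illustration only: the helper names `quota_shortfall` and `check_store_dir` are hypothetical, the store directory and the configured limit are supplied by the caller, and it does not reproduce how nats-server itself accounts for storage.

```python
import shutil


def quota_shortfall(storage_limit, jetstream_used, disk_free):
    """Bytes of the remaining JetStream quota that the disk cannot cover.

    storage_limit  -- configured JetStream file storage limit, in bytes
    jetstream_used -- bytes JetStream already stores on this disk
    disk_free      -- free bytes on the volume holding the store directory
    Returns 0 when the disk can hold everything the quota still allows.
    """
    remaining_quota = max(storage_limit - jetstream_used, 0)
    return max(remaining_quota - disk_free, 0)


def check_store_dir(store_dir, storage_limit, jetstream_used=0):
    """Same check, reading the free space of store_dir from the OS."""
    disk_free = shutil.disk_usage(store_dir).free
    return quota_shortfall(storage_limit, jetstream_used, disk_free)


if __name__ == "__main__":
    # Hypothetical numbers: 100 GiB limit, 13 GiB used, 50 GiB free on disk.
    gib = 1024 ** 3
    missing = quota_shortfall(100 * gib, 13 * gib, 50 * gib)
    print(f"disk is short by {missing / gib:.0f} GiB")
```

A non-zero result means the configured limit promises more space than the volume can provide, so the disk can fill up before the JetStream limit is reached.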
The tail of the server logs was as follows, @derekcollison:
[7] 2024/08/07 00:33:14.149953 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:351 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:14.149992 [WRN] JetStream cluster stream 'js > KV_xxxx-simple-pipeline-in_SOURCE_OT' has NO quorum, stalled
[7] 2024/08/07 00:33:14.211314 [DBG] 10.214.110.10:6222 - rid:12 - Router Ping Timer
[7] 2024/08/07 00:33:14.316765 [DBG] 10.214.83.38:57342 - rid:11 - Router Ping Timer
[7] 2024/08/07 00:33:14.327738 [DBG] RAFT [JLAxTIGX - S-R3F-4E4Qm94d] Sending out voteRequest {term:347 lastTerm:2 lastIndex:10181 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:14.419642 [DBG] 10.214.110.10:6222 - rid:14 - Router Ping Timer
[7] 2024/08/07 00:33:14.427824 [DBG] 10.214.110.10:6222 - rid:13 - Router Ping Timer
[7] 2024/08/07 00:33:15.179397 [DBG] 10.214.76.98:6222 - rid:17 - Router Ping Timer
[7] 2024/08/07 00:33:15.195393 [DBG] 10.214.76.98:6222 - rid:16 - Router Ping Timer
[7] 2024/08/07 00:33:15.195410 [DBG] 10.214.110.10:6222 - rid:18 - Router Ping Timer
[7] 2024/08/07 00:33:15.200489 [DBG] 10.214.64.115:6222 - rid:19 - Router Ping Timer
[7] 2024/08/07 00:33:15.260139 [DBG] 10.214.64.115:6222 - rid:21 - Router Ping Timer
[7] 2024/08/07 00:33:15.269118 [DBG] 10.214.64.115:6222 - rid:23 - Router Ping Timer
[7] 2024/08/07 00:33:15.327054 [DBG] 10.214.64.115:6222 - rid:20 - Router Ping Timer
[7] 2024/08/07 00:33:15.364602 [DBG] 10.214.76.98:6222 - rid:22 - Router Ping Timer
[7] 2024/08/07 00:33:15.415052 [DBG] 10.214.76.98:6222 - rid:24 - Router Ping Timer
[7] 2024/08/07 00:33:17.817384 [DBG] RAFT [JLAxTIGX - C-R3F-LwsCNRjm] Sending out voteRequest {term:339 lastTerm:2 lastIndex:13861 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:21.063380 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:352 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:22.168353 [DBG] RAFT [JLAxTIGX - S-R3F-4E4Qm94d] Sending out voteRequest {term:348 lastTerm:2 lastIndex:10181 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:26.677069 [WRN] JetStream cluster consumer 'js > xxxx-simple-pipeline-out-2 > xxxx-simple-pipeline-out-2' has NO quorum, stalled.
[7] 2024/08/07 00:33:26.677145 [DBG] RAFT [JLAxTIGX - C-R3F-LwsCNRjm] Sending out voteRequest {term:340 lastTerm:2 lastIndex:13861 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:26.927723 [DBG] RAFT [JLAxTIGX - S-R3F-4E4Qm94d] Sending out voteRequest {term:349 lastTerm:2 lastIndex:10181 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:26.927763 [WRN] JetStream cluster stream 'js > xxxx-simple-pipeline-out-2' has NO quorum, stalled
[7] 2024/08/07 00:33:29.931302 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:353 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:30.932134 [INF] JetStream cluster new consumer leader for 'js > KV_xxxx-simple-pipeline-in_SOURCE_OT > ohrI01x9'
[7] 2024/08/07 00:33:30.935159 [DBG] JETSTREAM - JetStream connection closed: Client Closed
[7] 2024/08/07 00:33:30.935181 [DBG] JETSTREAM - JetStream connection closed: Client Closed
[7] 2024/08/07 00:33:30.936133 [DBG] RAFT [JLAxTIGX - _meta_] Installing snapshot of 11619 bytes
[7] 2024/08/07 00:33:33.552226 [DBG] RAFT [JLAxTIGX - C-R3F-LwsCNRjm] Sending out voteRequest {term:341 lastTerm:2 lastIndex:13861 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:33.552377 [DBG] RAFT [JLAxTIGX - S-R3F-4E4Qm94d] Sending out voteRequest {term:350 lastTerm:2 lastIndex:10181 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:35.777956 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:354 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:35.777997 [WRN] JetStream cluster stream 'js > KV_xxxx-simple-pipeline-in_SOURCE_OT' has NO quorum, stalled
Yes, no one is responding to the votes. So either the system is misconfigured or some servers have shut down the JetStream subsystem.
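The unanswered votes are visible in the log itself: each RAFT group keeps sending voteRequest with a term that increases by one, which is what a candidate does when an election never completes. A small sketch that counts this per group, assuming the debug log format shown above (the function name `election_churn` is illustrative):

```python
import re
from collections import defaultdict

# Matches the RAFT debug lines, for example:
#   RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:351 ...
VOTE_RE = re.compile(
    r"RAFT \[(?P<node>[^\]\s]+) - (?P<group>[^\]\s]+)\] "
    r"Sending out voteRequest \{term:(?P<term>\d+)"
)

# Lines copied from the server log in this thread.
SAMPLE_LOG = """\
[7] 2024/08/07 00:33:14.149953 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:351 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:17.817384 [DBG] RAFT [JLAxTIGX - C-R3F-LwsCNRjm] Sending out voteRequest {term:339 lastTerm:2 lastIndex:13861 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:21.063380 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:352 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:26.677145 [DBG] RAFT [JLAxTIGX - C-R3F-LwsCNRjm] Sending out voteRequest {term:340 lastTerm:2 lastIndex:13861 candidate:JLAxTIGX reply:}
[7] 2024/08/07 00:33:29.931302 [DBG] RAFT [JLAxTIGX - S-R3F-66I5xvhX] Sending out voteRequest {term:353 lastTerm:1 lastIndex:245 candidate:JLAxTIGX reply:}
"""


def election_churn(log_text):
    """Count voteRequests per RAFT group and record the range of terms seen."""
    terms_by_group = defaultdict(list)
    for line in log_text.splitlines():
        m = VOTE_RE.search(line)
        if m:
            terms_by_group[m.group("group")].append(int(m.group("term")))
    return {
        group: {
            "requests": len(terms),
            "first_term": min(terms),
            "last_term": max(terms),
        }
        for group, terms in terms_by_group.items()
    }


if __name__ == "__main__":
    for group, stats in election_churn(SAMPLE_LOG).items():
        print(group, stats)
```

Terms in the 300s after roughly an hour of uptime, with lastIndex unchanged between requests, suggest the groups have been retrying elections continuously without making progress.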
Observed behavior
Expected behavior
The error persisted after the cluster was restarted. The eventual fix was to deploy NATS to other nodes, which worked without problems. After that, redeploying to the original node finally resolved the problem. However, we never found the reason why the stream was temporarily unavailable.
Server and client version
version is nats-server: v2.9.20
Host environment
No response
Steps to reproduce
No response