alekseipalshin opened 6 months ago
This is a pretty old NATS server version. I would start with upgrading to the latest 2.10.14 to see if that solves the problem. We did put a lot of effort into JetStream between those two versions.
Thank you, I'll try.
When we updated the version from 2.8.4 to 2.9.16, the performance became much better.
The behavior still seems strange, though: we have many other streams and there are no problems with them.
Hi! I'm a part of the team. We upgraded the cluster to version 2.10.16 but are still having problems.
Please can you provide some context? At a minimum: updated stream info and consumer info output, updated statistics/graphs, any other relevant configuration changes, etc.
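(For completeness, the same information can also be gathered programmatically. Below is a minimal sketch using the nats.go client's JetStream API; the connection URL, stream name "BIDS" and consumer name "bids_consumer" are placeholders, not names taken from this cluster.)

package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Placeholder URL; point this at one of the cluster nodes.
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Roughly the same data as `nats stream info` / `nats consumer info`.
	si, err := js.StreamInfo("BIDS")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("stream state: msgs=%d bytes=%d first=%d last=%d\n",
		si.State.Msgs, si.State.Bytes, si.State.FirstSeq, si.State.LastSeq)

	ci, err := js.ConsumerInfo("BIDS", "bids_consumer")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("consumer: pending=%d ack_pending=%d redelivered=%d\n",
		ci.NumPending, ci.NumAckPending, ci.NumRedelivered)
}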
We have detected high disk write load on some servers of the NATS cluster. Restarting helps resolve the issue.
The cluster has 3 nodes running v2.10.16. It has 8 JetStream streams, and all streams have the same options:
Replicas: 3
Storage: File
Options:
Retention: Limits
Acknowledgments: true
Discard Policy: Old
Duplicate Window: 2m0s
Direct Get: true
Allows Msg Delete: true
Allows Purge: true
Allows Rollups: false
Limits:
Maximum Messages: unlimited
Maximum Per Subject: unlimited
Maximum Bytes: unlimited
Maximum Age: 15m0s
Maximum Message Size: unlimited
Maximum Consumers: unlimited
Cluster Information:
Name: nats-v2-10
Leader: nats-v2-10-1
Replica: nats-v2-10-0, current, seen 57ms ago
Replica: nats-v2-10-2, current, seen 56ms ago
State:
Messages: 23,913
Bytes: 11 MiB
First Sequence: 239,457,256 @ 2024-10-18 09:37:24 UTC
Last Sequence: 239,481,168 @ 2024-10-18 09:52:24 UTC
Active Consumers: 11
Number of Subjects: 10
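For reference, a stream with these options could be declared from the Go client roughly as sketched below. The stream name "BIDS" and subject space "bids.>" are placeholders (the real stream name is not shown in the output above); the remaining fields mirror the listed options.

package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

// ensureStream declares a stream matching the configuration shown above.
func ensureStream(js nats.JetStreamManager) error {
	_, err := js.AddStream(&nats.StreamConfig{
		Name:        "BIDS",             // placeholder name
		Subjects:    []string{"bids.>"}, // placeholder subject space
		Retention:   nats.LimitsPolicy,  // Retention: Limits
		Storage:     nats.FileStorage,   // Storage: File
		Replicas:    3,                  // Replicas: 3
		Discard:     nats.DiscardOld,    // Discard Policy: Old
		MaxAge:      15 * time.Minute,   // Maximum Age: 15m0s
		Duplicates:  2 * time.Minute,    // Duplicate Window: 2m0s
		AllowDirect: true,               // Direct Get: true
		AllowRollup: false,              // Allows Rollups: false
	})
	return err
}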
Configuration:
Name: bid_did_active_v1_ru_prod
Pull Mode: true
Filter Subject: bids.bid_did_active.v1.ru.prod
Deliver Policy: All
Ack Policy: Explicit
Ack Wait: 6.00s
Replay Policy: Instant
Maximum Deliveries: 5
Max Waiting Pulls: 512
Cluster Information:
Name: nats-v2-10
Leader: nats-v2-10-1
Replica: nats-v2-10-0, current, seen 23ms ago
Replica: nats-v2-10-2, current, seen 23ms ago
State:
Last Delivered Message: Consumer sequence: 44,196,707 Stream sequence: 239,491,181 Last delivery: 38ms ago
Acknowledgment Floor: Consumer sequence: 44,196,707 Stream sequence: 239,491,181 Last Ack: 24ms ago
Outstanding Acks: 0
Redelivered Messages: 0
Unprocessed Messages: 0
Waiting Pulls: 2 of maximum 512
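As a sketch of how a consumer with this configuration would be created and consumed from nats.go: the durable name and filter subject below come from the consumer info above, while the stream name "BIDS", the batch size, and the fetch timeout are assumptions for illustration.

package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

// consumeBids creates (or updates) the pull consumer shown above and fetches one batch.
func consumeBids(js nats.JetStreamContext) error {
	_, err := js.AddConsumer("BIDS", &nats.ConsumerConfig{
		Durable:       "bid_did_active_v1_ru_prod",
		FilterSubject: "bids.bid_did_active.v1.ru.prod",
		DeliverPolicy: nats.DeliverAllPolicy,    // Deliver Policy: All
		AckPolicy:     nats.AckExplicitPolicy,   // Ack Policy: Explicit
		AckWait:       6 * time.Second,          // Ack Wait: 6.00s
		MaxDeliver:    5,                        // Maximum Deliveries: 5
		MaxWaiting:    512,                      // Max Waiting Pulls: 512
		ReplayPolicy:  nats.ReplayInstantPolicy, // Replay Policy: Instant
	})
	if err != nil {
		return err
	}

	// Bind a pull subscription to the durable and fetch a batch of messages.
	sub, err := js.PullSubscribe("bids.bid_did_active.v1.ru.prod",
		"bid_did_active_v1_ru_prod", nats.Bind("BIDS", "bid_did_active_v1_ru_prod"))
	if err != nil {
		return err
	}
	msgs, err := sub.Fetch(100, nats.MaxWait(2*time.Second))
	if err != nil {
		return err
	}
	for _, m := range msgs {
		m.Ack()
	}
	return nil
}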
Disk writes before restart: > 100 MB/s; after restart: < 2 MB/s
Memory graph: the load profile doesn't change
After 12 days, we see a slow increase in disk writes per day (load profile graph).
@neilalexander do you need more information?
Observed behavior
Hello.
We have detected high disk write load (up to 100 MB/s) on some servers of the NATS cluster.
Restarting the NATS server helps reduce the disk load. We had no idea what could have caused it, so we ran an analysis with strace:
nats stream info
nats consumer info
Expected behavior
Disk write throughput is ~1 MB/s, as it is after a restart.
Server and client version
nats-server:2.9.16
github.com/nats-io/nats.go v1.31.0
Host environment
Kubernetes v1.26.12-eks
Installed from helm chart:
Steps to reproduce
No response