Open b2broker-yperfilov opened 3 months ago
When something like that happens, we request the developer capture some profiles for us, specifically cpu, mem (heap), and stacksz / goroutines.
@derekcollison here are screenshots of some metrics. I went through many memory metrics, and all of them look quite stable.
The stream info shows the only limit you have in place, which is age, appearing to work correctly. What do you think is not working correctly?
Also, do you set GOMEMLIMIT properly?
@derekcollison we do not have GOMEMLIMIT set. However, the issue is not with the pod's memory; it is with disk storage.
The stream is replicated across 3 nodes, which means every message should be copied to all 3 nodes, and at any time each node should occupy the same amount of disk space (assuming all other streams also have a replication factor of 3). However, one of the nodes didn't follow this rule, as can be seen from the initial message, resulting in a disk leak.
Can you share a du -sh from the store directory for the one that has increased disk usage?
@derekcollison now it is 1.3G. Another node is 102.0M, and another is 97.4M.
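To put the three figures above side by side, here is a minimal Python sketch that parses `du -sh` style sizes and flags any replica whose store is much larger than the median. The node names, the `outliers` helper, and the 2x threshold are assumptions made for illustration; they are not part of NATS or of this report.

```python
import statistics

# du -h reports sizes in powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a du -sh style size such as '1.3G' or '97.4M' to bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)  # plain byte count without a suffix


def outliers(sizes, factor=2.0):
    """Return {node: ratio_to_median} for nodes above factor * median.

    The threshold `factor` is an arbitrary choice for this sketch.
    """
    values = {node: parse_size(size) for node, size in sizes.items()}
    median = statistics.median(values.values())
    return {
        node: value / median
        for node, value in values.items()
        if value > factor * median
    }


# Hypothetical node names; the sizes are the ones reported above.
reported = {"node-a": "1.3G", "node-b": "102.0M", "node-c": "97.4M"}
for node, ratio in outliers(reported).items():
    print(f"{node} uses {ratio:.1f}x the median store size")
```

With the reported numbers, only the 1.3G replica is flagged, at roughly 13 times the median of the three.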
Observed behavior
We are using the Limits policy with a maximum age of 15 minutes. However, 1 of 3 nodes didn't clean up its storage in time, which resulted in the storage filling up and a crash.
On the screenshot below, you can see the storage usage stats of the 3 nodes. Notice that the blue one has much larger storage usage compared to the red and yellow nodes.
The screenshot below is from the NATS dashboard; you can see that the stream message count also rose significantly.
The configuration of the stream is shown on the screenshot below. The stream was recreated during an attempt to fix the issue, but it has exactly the same settings. Notice the max age of 15 minutes here, as well as the typical byte size and message count.
In the logs, there were errors (repeated several times):
Please let me know if you need any additional details.
Expected behavior
Limit policy cleaning as expected
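One way to check this expectation per replica is to compare the timestamp of the oldest stored message against the max age. The sketch below assumes the stream info is available as JSON with `config.max_age` in nanoseconds and `state.first_ts` as an RFC 3339 UTC timestamp; treat those field names as assumptions to verify against your server's output. The `overdue` helper is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_rfc3339(ts):
    """Parse a UTC RFC 3339 timestamp, truncating fractions to microseconds."""
    m = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", ts)
    if not m:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)
    frac = (m.group(2) or "")[:6].ljust(6, "0")
    return base + timedelta(microseconds=int(frac))


def overdue(info, now):
    """Return how long the oldest message has outlived max_age (0 if none)."""
    # Assumed layout: max_age is in nanoseconds, 0 means no age limit.
    max_age = timedelta(microseconds=info["config"]["max_age"] / 1000)
    if max_age == timedelta(0) or info["state"]["messages"] == 0:
        return timedelta(0)
    first = parse_rfc3339(info["state"]["first_ts"])
    return max(now - first - max_age, timedelta(0))


# Illustrative values only, not taken from the affected cluster.
info = {
    "config": {"max_age": 15 * 60 * 10**9},
    "state": {"messages": 1000, "first_ts": "2024-01-01T00:00:00.000000000Z"},
}
now = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)
print(overdue(info, now))
```

A healthy replica should report a value close to zero; a replica that is not expiring messages would show the value growing over time.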
Server and client version
Server 2.10.18
Host environment
K8s
Steps to reproduce
not clear