Open AnonC0DER opened 1 year ago
@AnonC0DER have you figured out the problem?
I am seeing something similar. It appears to affect only one particular pod (from a 2-replica deployment). The main difference I see is that the affected pod shows significant traffic on three different interfaces (cni0, ens3, flannel.1), while the remaining pod(s) show metrics for interface eth0. While I do not manage the underlying infrastructure, I believe the networking configuration is the same for all nodes.
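To make the interface observation concrete, here is a minimal sketch of how one might check exported samples for this pattern. It rests on an unconfirmed hypothesis: that series labelled with node-level interfaces (cni0, ens3, flannel.1) get summed together with the pod's own eth0 series. The pod names, the numbers, and both helper functions are hypothetical. Only the `interface` label comes from cAdvisor's network metrics.

```python
from collections import defaultdict

# Hypothetical samples: (pod, interface, receive rate in bytes/s).
# These values are made up for illustration, not taken from a real cluster.
SAMPLES = [
    ("db-0", "eth0", 1_000.0),
    ("db-1", "cni0", 40_000.0),
    ("db-1", "ens3", 45_000.0),
    ("db-1", "flannel.1", 38_000.0),
]


def rate_by_pod(samples, interfaces=None):
    """Sum the receive rate per pod, optionally keeping only some interfaces."""
    totals = defaultdict(float)
    for pod, interface, rate in samples:
        if interfaces is None or interface in interfaces:
            totals[pod] += rate
    return dict(totals)


def pods_with_unexpected_interfaces(samples, expected=frozenset({"eth0"})):
    """Return pods reporting interfaces outside the expected set."""
    found = defaultdict(set)
    for pod, interface, _ in samples:
        if interface not in expected:
            found[pod].add(interface)
    return {pod: sorted(names) for pod, names in found.items()}


if __name__ == "__main__":
    print("all interfaces:", rate_by_pod(SAMPLES))
    print("eth0 only:", rate_by_pod(SAMPLES, {"eth0"}))
    print("unexpected:", pods_with_unexpected_interfaces(SAMPLES))
```

If the per-interface breakdown for the affected pod looks like the hypothetical db-1 above, the spike may reflect node traffic attributed to the pod rather than an actual change in the pod's load. This is only a way to narrow things down, not a confirmed diagnosis.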
Summary:
I'm using cadvisor with Prometheus in multiple Kubernetes (k8s) clusters to monitor network traffic usage. I utilize the container_network_receive_bytes_total metric in a query to calculate the total network traffic usage. However, I'm encountering an unusual issue in one of the clusters.
Problem:
In one of my clusters, I have a non-production database that has been running smoothly for 20 days. However, the container_network_receive_bytes_total metric has shown a significant spike in usage, even though I am certain there is no increase in load. This issue is not isolated. I have encountered similar occurrences multiple times, and they all seem to happen in this particular cluster. I attempted numerous approaches to reproduce it, but I was unable to do so.
This is the query I'm using:
And this is the spike:
I believe the root cause of this issue lies within this cluster, but I am seeking guidance or clues on how to troubleshoot and resolve it.