@gades Just double checked over here and since I've upgraded to 1.3.1 my memory usage is <400MB and the scrape duration <2s, usually <0.5s.
On the topic of memory consumption, we've been battling runaway memory usage of kube-state-metrics on one of our clusters. This particular cluster has around 3730 running pods and 28160 total objects (a quick line count of `kubectl get all --all-namespaces`) across 44 nodes.
We've been running a single instance of kube-state-metrics in the kube-system namespace with the following collectors setup:
collectors=cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,pods,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,resourcequotas,services,statefulsets
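For reference, that flag was wired into our deployment roughly like this (the image tag and container spec are illustrative, not our exact manifest):

```yaml
containers:
  - name: kube-state-metrics
    image: quay.io/coreos/kube-state-metrics:v1.3.1
    args:
      - --collectors=cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,pods,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,resourcequotas,services,statefulsets
```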
This setup resulted in a kube-state-metrics instance that ran stably with 5-6 CPU and 8-10GB of RAM.
One of our teams started an additional 900 pods, after which we were unable to stabilize kube-state-metrics even with 30GB+ of memory; it just kept getting OOMKilled.
We broke our kube-state-metrics into an instance per namespace and are now running around 33 instances of kube-state-metrics each watching a single namespace. The resulting config brought the resource usage down to 0.5 CPU and around 1.5GB of RAM for all 33 instances in total monitoring the same cluster.
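A sketch of what one of our per-namespace instances looks like, assuming the single-namespace scoping flag (spelled `--namespace` on the 1.x releases we run; the namespace name here is hypothetical):

```yaml
containers:
  - name: kube-state-metrics
    image: quay.io/coreos/kube-state-metrics:v1.3.1
    args:
      - --namespace=team-a   # one instance per namespace, 33 in total
      - --collectors=cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,pods,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,resourcequotas,services,statefulsets
```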
This is an interesting result compared with the single-instance scenario. It seems kube-state-metrics cannot handle this many objects in one instance, or perhaps there is something like a memory leak.
@DewaldV
@andyxning We are running the 1.3.1 image from quay.io
I can give the latest master branch a try. I'll run an additional instance of kube-state-metrics from latest without letting Prometheus scrape it (to avoid duplicate metrics) and see how it does. I'll also pull some graphs and numbers showing the memory/CPU usage of the different setups to compare.
@DewaldV That's really cool!
Note that scraping will make a difference, as producing the /metrics output is significant with those numbers of objects.
@DewaldV Another non-prod Prometheus is needed to collect the metrics, or we need to make requests to the /metrics endpoint ourselves.
@andyxning Will do, I'll spin up another Prometheus as well. I'll try to get these numbers for you later today.
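A minimal scrape config for a side-by-side test Prometheus might look like this (job name and service address are hypothetical; adjust to wherever the test instance is exposed):

```yaml
scrape_configs:
  - job_name: ksm-master-test            # hypothetical test job
    scrape_interval: 30s
    static_configs:
      - targets: ['ksm-test.kube-system:8080']   # hypothetical service address
```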
Just wanted to chime in that I have also encountered the same issue. We are scraping KSM 1.2.0 with Prometheus 2.x on Kubernetes 1.8.7.
We have two clusters: one with ~150 nodes and one with ~200 nodes. On the cluster with ~150 nodes, KSM reports (I'm only including resources with >500 count for brevity):
Response size is 920k lines and 101M.
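For a rough sense of scale, that payload works out to a bit over a hundred bytes per exposed series line:

```shell
# Back-of-envelope: 101 MB spread over 920k exposition lines.
echo $(( 101 * 1024 * 1024 / 920000 ))   # ~115 bytes per line
```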
I set KSM's memory limit to 4GB but it still frequently exceeds this (and gets OOMKilled). It takes about 10 hours before it hits 4GB of memory usage.
I can see it spikes to 2.5 CPU cores used pretty often as well.
On our cluster with ~200 nodes, KSM frequently will time out on requests (we are scraping it every 30s). It uses even more resources there.
I'd like to upgrade to 1.3.1, but I've been running into certificate validation and authentication/RBAC issues, and it's unclear whether the upgrade will help with the resource utilization problem. I'd also like to look into turning off or dropping any of the time series we are not using (e.g. jobs), as well as tuning the cluster's garbage collection, but I feel like that's not solving the underlying problem.
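For anyone else wanting to drop unused series on the Prometheus side rather than in KSM itself, a `metric_relabel_configs` sketch (the `kube_job_.*` pattern and target address are just examples):

```yaml
scrape_configs:
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics.kube-system:8080']   # example address
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'kube_job_.*'     # drop job series we don't use
        action: drop
```

Note this only reduces what Prometheus stores; KSM still pays the cost of generating the series, so disabling collectors in KSM is the bigger win.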
At minimum, can we update the documentation guidelines on resource usage? I was definitely confused when the docs said to allocate 300MB of RAM and 0.150 CPU cores, when in reality I need >3GB of RAM and 3 cores.
@ehashman Thanks for the feedback.
> At minimum, can we upgrade the documentation guidelines on resource usage? I was definitely confused when the docs say to allocate 300MB of RAM and 0.150 CPU cores where in reality I need >3GB RAM and 3 cores.
The resource usage guidelines for KSM are based on a benchmark that may not reflect the real resource usage of a cluster with 150-200 nodes. Accurate guidelines are not easy to give, since cluster load varies.
The guidelines should be updated.
@andyxning I believe we had a PR pending that adds a note that kube-state-metrics actually scales with the number of objects rather than the number of nodes, which gives some indication.
@ehashman you can already turn off collectors using the `--collectors` flag (or rather whitelist the ones you want to use). kube-state-metrics shifts pressure from one resource onto the other: when there is CPU pressure, memory consumption will grow. I recommend trying to run kube-state-metrics without any resource limits or requests and seeing what it ends up using. We definitely want to run new scalability tests; we will do this along with #498.
> I felt like we had a PR pending that adds a note that kube-state-metrics actually scales with the number of objects as opposed to number of nodes, but it gives some indication.
This has been merged in #490 as part of describing the pod nanny usage.
@brancz @smarterclayton Is protobuf support already implemented? Do we have any benchmark results showing how much better it is?
@smarterclayton @brancz Another thing: we might have hit this one as well: https://bugzilla.redhat.com/show_bug.cgi?id=1426009
> the protobuf already implemented ?! we have some benchmark results to visible how much better it is?
@mrsiano Yes, protobuf support was added in https://github.com/kubernetes/kube-state-metrics/pull/475. It is available after 1.4.0. Could you please give it a try and run some benchmarks?
As a follow-up to my earlier comment, just wanted to share the results of my KSM upgrade from 1.4.0 to 1.5.0-beta.0 in one of our aforementioned clusters with 200 nodes:
As you can see, CPU utilization and memory usage have dropped dramatically. Network utilization has increased as I am no longer gzipping responses. With this upgrade, the documented benchmarks for resource utilization appear to be accurate and wouldn't need to be updated :tada:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/close
@ehashman: Closing this issue.
kube-state-metrics is using >400MB of RAM. It is also very slow when I query /metrics. The Kubernetes cluster has 2700 job objects. It seems surprising that this would consume 400MB of RAM for metrics aggregation. Below is a pprof top trace. This is running the latest git revision (d316c013fae8965bfb75bafda9453ca2ef54c48f).