kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

kube-state-metrics consuming too much memory #257

Closed: jac-stripe closed this issue 5 years ago

jac-stripe commented 7 years ago

kube-state-metrics is using >400MB of RAM. It is also very slow when I query /metrics. The Kubernetes cluster has 2700 job objects. It seems surprising that this would consume 400MB of RAM for metrics aggregation. Below is a pprof top trace. This is running the latest git revision (d316c013fae8965bfb75bafda9453ca2ef54c48f).

(pprof) top
Showing nodes accounting for 526.72MB, 86.90% of 606.14MB total
Dropped 148 nodes (cum <= 3.03MB)
Showing top 10 nodes out of 110
      flat  flat%   sum%        cum   cum%
  195.01MB 32.17% 32.17%   202.01MB 33.33%  github.com/prometheus/client_golang/prometheus.makeLabelPairs
  101.26MB 16.71% 48.88%   148.26MB 24.46%  github.com/prometheus/client_golang/prometheus.(*Registry).Gather
   74.28MB 12.26% 61.13%    74.81MB 12.34%  k8s.io/kube-state-metrics/collectors.RegisterJobCollector.func1
      47MB  7.75% 68.89%       47MB  7.75%  github.com/prometheus/client_golang/prometheus.populateMetric
   27.60MB  4.55% 73.44%    30.60MB  5.05%  k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceVolume
   23.01MB  3.80% 77.24%    23.01MB  3.80%  runtime.rawstringtmp
   18.97MB  3.13% 80.37%    19.55MB  3.22%  github.com/golang/protobuf/proto.(*Buffer).EncodeStringBytes
   15.50MB  2.56% 82.92%   217.51MB 35.88%  github.com/prometheus/client_golang/prometheus.NewConstMetric
   13.50MB  2.23% 85.15%    14.02MB  2.31%  runtime.mapassign
   10.58MB  1.74% 86.90%    12.71MB  2.10%  compress/flate.NewWriter
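
(For anyone who wants to reproduce a profile like this, here is a minimal sketch. It assumes the kube-state-metrics process exposes Go's standard net/http/pprof endpoints on the port it serves on, which may not hold for every build; the host and port are placeholders.)

```sh
# Pull a heap profile from a running instance and open the interactive pprof shell.
# Adjust the URL to wherever the process is reachable.
go tool pprof http://localhost:8080/debug/pprof/heap

# At the (pprof) prompt, `top` prints the heaviest allocators, as in the output above.
```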
discordianfish commented 6 years ago

@gades Just double-checked over here: since I upgraded to 1.3.1, my memory usage is <400MB and the scrape duration is <2s, usually <0.5s.

DewaldV commented 6 years ago

On the topic of memory consumption, we've been battling runaway memory usage of kube-state-metrics on one of our clusters. This particular cluster has around 3730 running pods and 28160 total objects (a quick line count of `kubectl get all --all-namespaces`) across 44 nodes.

We've been running a single instance of kube-state-metrics in the kube-system namespace with the following collectors setup:

collectors=cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,pods,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,resourcequotas,services,statefulsets

This setup resulted in a kube-state-metrics instance that ran stably with 5-6 CPUs and 8-10GB of RAM.

One of our teams started an additional 900 pods, which left us unable to stabilize kube-state-metrics even with 30GB+ of memory; it just kept getting OOMKilled.

We broke our kube-state-metrics into an instance per namespace and are now running around 33 instances of kube-state-metrics each watching a single namespace. The resulting config brought the resource usage down to 0.5 CPU and around 1.5GB of RAM for all 33 instances in total monitoring the same cluster.
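
For illustration, here is a rough sketch of what one of those per-namespace instances looks like on the command line. The --collectors flag is the one from the setup above; the --namespace flag and the namespace/collector values are assumptions, so check your version's --help before copying:

```sh
# One kube-state-metrics instance scoped to a single namespace,
# exposing only the collectors we actually scrape.
kube-state-metrics \
  --namespace=team-a \
  --collectors=deployments,pods,jobs,cronjobs,services \
  --port=8080
```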

andyxning commented 6 years ago

The resulting config brought the resource usage down to 0.5 CPU and around 1.5GB of RAM for all 33 instances in total monitoring the same cluster.

This is an interesting result compared with the single kube-state-metrics scenario. It seems that kube-state-metrics cannot handle this many objects with one instance, or there may be something like a memory leak.

@DewaldV Which version of kube-state-metrics are you running? Could you give the latest master branch a try?

DewaldV commented 6 years ago

@andyxning We are running the 1.3.1 image from quay.io

I can give the latest master branch a try. I'll run an additional instance of kube-state-metrics built from latest without letting Prometheus scrape it (to avoid duplicate metrics) and see how it does. I'll also pull some graphs and numbers showing memory/CPU usage for the different setups to compare.

andyxning commented 6 years ago

@DewaldV That's really cool!

brancz commented 6 years ago

Note that scraping will make a difference, as producing the /metrics output is a significant cost with that number of objects.

andyxning commented 6 years ago

@DewaldV Another non-prod Prometheus is needed to collect the metrics, or we need to make requests to the /metrics endpoint ourselves.
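
For a quick manual check without a second Prometheus, something along these lines works (a sketch; the service name, namespace, and port are assumptions based on a typical kube-system deployment):

```sh
# Make the kube-state-metrics service reachable locally.
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &

# Time one scrape and record its size; this is roughly what Prometheus pays per scrape.
time curl -s http://localhost:8080/metrics | wc -c
```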

DewaldV commented 6 years ago

@andyxning Will do, I'll spin up another Prometheus as well. I'll try to get these numbers for you later today.

ehashman commented 6 years ago

Just wanted to chime in that I have also encountered the same issue. We are scraping KSM 1.2.0 with Prometheus 2.x on Kubernetes 1.8.7.

We have two clusters: one with ~150 nodes and one with ~200 nodes. On the cluster with ~150 nodes, KSM reports (I'm only including resources with >500 count for brevity):

The response size is 920k lines and 101MB.

I set KSM's memory limit to 4GB but it still frequently exceeds this (and gets OOMKilled). It takes about 10 hours before it hits 4GB of memory usage.

I can also see it spiking to 2.5 CPU cores pretty often.

On our cluster with ~200 nodes, KSM frequently will time out on requests (we are scraping it every 30s). It uses even more resources there.

I'd like to upgrade to 1.3.1 but I've been running into certificate validation and authentication/RBAC issues... unclear if that will help with the resource utilization problem. I'd like to look into turning off or dropping any of the timeseries we are not using (e.g. jobs) as well as tuning the cluster's garbage collection, but I feel like that's not solving the underlying problem.

At minimum, can we update the documentation guidelines on resource usage? I was definitely confused when the docs say to allocate 300MB of RAM and 0.150 CPU cores, whereas in reality I need >3GB of RAM and 3 cores.

andyxning commented 6 years ago

@ehashman Thanks for the feedback.

At minimum, can we update the documentation guidelines on resource usage? I was definitely confused when the docs say to allocate 300MB of RAM and 0.150 CPU cores, whereas in reality I need >3GB of RAM and 3 cores.

The resource usage guidelines for KSM are based on a benchmark that may not reflect actual usage for a cluster of about 150~200 nodes. Precise guidelines are also hard to give, since cluster load varies.

The guidelines should be updated.

brancz commented 6 years ago

@andyxning I think we have a PR pending that adds a note that kube-state-metrics actually scales with the number of objects rather than the number of nodes, but it gives some indication.

@ehashman You can already turn off collectors using the --collectors flag (or rather, whitelist the ones you want to use). kube-state-metrics shifts pressure from one resource (CPU/memory) onto the other, meaning that under CPU pressure memory consumption will grow. I recommend trying to run kube-state-metrics without any resource limits or requests and seeing what it ends up using. We definitely want to run new scalability tests; we will do this along with #498.

andyxning commented 6 years ago

I think we have a PR pending that adds a note that kube-state-metrics actually scales with the number of objects rather than the number of nodes, but it gives some indication.

This has been merged in #490 as part of the documentation describing pod nanny usage.

mrsiano commented 6 years ago

@brancz @smarterclayton Is the protobuf support already implemented? Do we have any benchmark results showing how much better it is?

mrsiano commented 6 years ago

@smarterclayton @brancz Another thing: we might be facing this one as well: https://bugzilla.redhat.com/show_bug.cgi?id=1426009

andyxning commented 6 years ago

Is the protobuf support already implemented? Do we have any benchmark results showing how much better it is?

@mrsiano Yes, protobuf support was added in https://github.com/kubernetes/kube-state-metrics/pull/475. It is available after 1.4.0. Could you please give it a try and run some benchmarks?
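
For a rough size comparison short of a full benchmark, one can request both exposition formats and compare payloads (a sketch; the Accept header is the standard Prometheus protobuf content type, and the endpoint/port are assumed defaults):

```sh
# Plain text exposition format (the default).
curl -s http://localhost:8080/metrics | wc -c

# Protobuf exposition format, negotiated via the Accept header.
curl -s -H 'Accept: application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited' \
  http://localhost:8080/metrics | wc -c
```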

ehashman commented 5 years ago

As a follow-up to my earlier comment, just wanted to share the results of my KSM upgrade from 1.4.0 to 1.5.0-beta.0 in one of our aforementioned clusters with 200 nodes:

[screenshot: KSM CPU, memory, and network usage graphs before and after the upgrade]

As you can see, CPU utilization and memory usage have dropped dramatically. Network utilization has increased as I am no longer gzipping responses. With this upgrade, the documented benchmarks for resource utilization appear to be accurate and wouldn't need to be updated :tada:

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

ehashman commented 5 years ago

/close

k8s-ci-robot commented 5 years ago

@ehashman: Closing this issue.

In response to [this](https://github.com/kubernetes/kube-state-metrics/issues/257#issuecomment-485419374):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.