kubernetes-sigs / metrics-server

Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Apache License 2.0

How to shorten the collection interval (resolution)? #1483

Open Alex-Kil opened 2 months ago

Alex-Kil commented 2 months ago

Hi,

FAQ.md says that the minimum metric-resolution calculated by the kubelet is 15s, and the Metrics Server source code validates the option like this:

    func (o Options) validate() []error {
        errors := []error{}
        if o.MetricResolution < 10*time.Second {
            errors = append(errors, fmt.Errorf("metric-resolution should be a time duration at least 10s, but value %v provided", o.MetricResolution))
        }
        if o.MetricResolution*9/10 < o.KubeletClient.KubeletRequestTimeout {
            errors = append(errors, fmt.Errorf("metric-resolution should be larger than kubelet-request-timeout, but metric-resolution value %v kubelet-request-timeout value %v provided", o.MetricResolution, o.KubeletClient.KubeletRequestTimeout))
        }
        return errors
    }
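To make the two checks concrete, here is a minimal, self-contained sketch of the same logic. The `options` struct is a hypothetical, trimmed-down stand-in for the real `Options` type, and the durations in `main` are illustrative values only, not documented defaults.

```go
package main

import (
	"fmt"
	"time"
)

// options is a hypothetical stand-in for the real Options struct;
// it keeps only the two fields the quoted checks read.
type options struct {
	MetricResolution      time.Duration
	KubeletRequestTimeout time.Duration
}

// validate mirrors the two checks quoted above:
//  1. the resolution must be at least 10s;
//  2. 90% of the resolution must not be shorter than the kubelet request timeout.
func (o options) validate() []error {
	var errs []error
	if o.MetricResolution < 10*time.Second {
		errs = append(errs, fmt.Errorf("metric-resolution should be a time duration at least 10s, but value %v provided", o.MetricResolution))
	}
	if o.MetricResolution*9/10 < o.KubeletRequestTimeout {
		errs = append(errs, fmt.Errorf("metric-resolution should be larger than kubelet-request-timeout, but metric-resolution value %v kubelet-request-timeout value %v provided", o.MetricResolution, o.KubeletRequestTimeout))
	}
	return errs
}

func main() {
	// Illustrative inputs (assumptions, not defaults taken from the project).
	cases := []options{
		{MetricResolution: 5 * time.Second, KubeletRequestTimeout: 4 * time.Second},   // fails check 1
		{MetricResolution: 10 * time.Second, KubeletRequestTimeout: 10 * time.Second}, // fails check 2 (9s < 10s)
		{MetricResolution: 15 * time.Second, KubeletRequestTimeout: 10 * time.Second}, // passes both
	}
	for _, c := range cases {
		fmt.Printf("resolution=%v timeout=%v errors=%d\n",
			c.MetricResolution, c.KubeletRequestTimeout, len(c.validate()))
	}
}
```

Under these checks, a resolution of exactly 10s is only accepted when the kubelet request timeout is 9s or less, so lowering the resolution generally means lowering the timeout as well.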

I want to shorten the resolution interval so that I can capture the minimum and maximum CPU and memory usage per pod. Resource usage fluctuates very quickly, so a 15s resolution may miss the peaks. Do I have to scrape the kubelet's /metrics/resource endpoint directly, or is there another solution?
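As background for why the interval matters, here is a rough sketch of how a short spike gets averaged out. It assumes CPU usage is exposed as a cumulative counter of CPU seconds (as with a Prometheus-style counter such as `container_cpu_usage_seconds_total`), so usage over a window is the counter delta divided by the elapsed time. The `sample` type, `avgCores` helper, and the numbers are all made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is a hypothetical reading of a cumulative CPU counter.
type sample struct {
	atSeconds  float64 // time of the reading, in seconds
	cpuSeconds float64 // cumulative CPU seconds consumed so far
}

// avgCores returns the average number of CPU cores used between two samples.
func avgCores(prev, cur sample) (float64, error) {
	dt := cur.atSeconds - prev.atSeconds
	if dt <= 0 {
		return 0, errors.New("samples must have increasing timestamps")
	}
	delta := cur.cpuSeconds - prev.cpuSeconds
	if delta < 0 {
		// A decreasing counter usually means it was reset (e.g. a restart).
		return 0, errors.New("counter went backwards")
	}
	return delta / dt, nil
}

func main() {
	// Made-up workload: 2 cores busy for 3s, then idle for 12s.
	s0 := sample{atSeconds: 0, cpuSeconds: 100}
	s3 := sample{atSeconds: 3, cpuSeconds: 106}
	s15 := sample{atSeconds: 15, cpuSeconds: 106}

	burst, _ := avgCores(s0, s3)   // 3s window covering only the burst
	idle, _ := avgCores(s3, s15)   // 12s window covering only the idle part
	coarse, _ := avgCores(s0, s15) // single 15s window

	fmt.Printf("3s window: %.2f cores\n", burst)
	fmt.Printf("12s window: %.2f cores\n", idle)
	fmt.Printf("15s window: %.2f cores\n", coarse)
}
```

With this data the 15s window reports 0.40 cores even though the workload briefly used 2.00 cores, which is the kind of peak a coarse resolution cannot show. Memory is typically a point-in-time gauge rather than a counter, so a spike between two samples would be missed outright instead of averaged.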

Thanks, Alex

logicalhan commented 2 months ago

/kind support
/triage accepted
/assign @dgrisonnet

Alex-Kil commented 2 months ago

Hi team, any update on this?

dgrisonnet commented 2 months ago

It is not recommended to go below 15s, as this would put too much pressure on the kubelet, whose metrics collection doesn't scale well. There is a long-standing Kubernetes issue about that, https://github.com/kubernetes/kubernetes/issues/104459, but we haven't made much progress on it and it doesn't look like there will be any in the short term.

One project that was written to work around that is https://github.com/kubernetes-sigs/usage-metrics-collector/, but it doesn't support cgroups v2 today.