kubernetes-sigs / metrics-server

Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Apache License 2.0

Metrics Server Long term plan #627

Open serathius opened 3 years ago

serathius commented 3 years ago

In this issue I would like to propose a long-term strategy for improving Metrics Server adoption and collect ideas that would help us define a roadmap for the next releases.

I hope this will help us tackle larger problems and allow other contributors to take ownership of larger areas.

Opinions written here are my own.

Background

The main purpose of Metrics Server is resource-utilization-based autoscaling. It's the simplest autoscaling option in Kubernetes and usually the first that K8s users learn. Most k8s distributions install Metrics Server out of the box or provide it as an option. With its popularity came a major downside: the default configuration that was popularized allowed only very basic autoscaling, which let solutions alternative to Metrics Server gain popularity.

In this document I would like to compare Metrics Server to two alternatives:

Metrics Server vs k8s-prometheus-adapter

Prometheus is a CNCF project that has become very popular for monitoring containers. By deploying k8s-prometheus-adapter, metrics collected by Prometheus Server can be integrated into K8s autoscaling pipelines, allowing for both resource and custom metric autoscaling. For cluster administrators, maintaining both solutions brings additional overhead, as each one needs to be upgraded, monitored for failures, tuned for performance, and scaled with the cluster. As Prometheus users already utilize it for monitoring their clusters, they have the necessary expertise, which means that for them using Metrics Server is redundant and costly. Even though a targeted solution like Metrics Server has clear advantages over a generic monitoring solution, until we catch up on our weak points Prometheus users will keep dropping Metrics Server.
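For context, both options ultimately serve the same Resource Metrics API (`metrics.k8s.io`), so a cluster runs only one of them as the backend. A quick way to see which component is serving it (a sketch; output varies per cluster):

```sh
# Check which component backs the Resource Metrics API, then query it directly.
kubectl get apiservice v1beta1.metrics.k8s.io -o wide
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl top nodes
```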

Metrics Server weak points:

Metrics Server advantages:

Opportunity: By fixing Metrics Server weak points we can propose a zero-maintenance solution that is more scalable than Prometheus + k8s-prometheus-adapter. As a result, Metrics Server would still make sense for heavy autoscaling users.

Resource vs custom metric autoscaling

Due to the popularity of untuned Metrics Server configurations there is a presumption that custom metrics autoscaling is always the better choice. It is true that some workloads can benefit from it (e.g. autoscaling workers based on queue size), but resource-based autoscaling should provide equally good results in the large majority of cases. There is also a misconception about how much work is needed to set up each solution. Ensuring that an application can reliably autoscale based on custom metrics requires not only building a reliable monitoring solution but also tuning the application. For example, for a web application, reliable autoscaling based on queries per second requires a good understanding of how many concurrent requests the application can handle and how many resources it will need for each type of request. As applications evolve over time and regressions can easily be introduced, this can lead to large inefficiencies in autoscaling.
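To illustrate the setup cost of the resource-based path, a minimal sketch of an HPA driven purely by Metrics Server data (the `web` Deployment name and 70% target are placeholders; the API group is `autoscaling/v2`, `v2beta2` at the time this issue was opened). The custom-metrics equivalent would additionally require a metrics pipeline plus an adapter exposing the chosen metric:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # placeholder target
```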

Metrics Server weak points:

Metrics Server advantages:

Opportunity: By improving the quality of autoscaling, Metrics Server can provide a much better user experience, thus allowing users to keep using it for the majority of their workloads.

Strategy

Improve the quality of the out-of-the-box configuration of Metrics Server so that it matches the quality of autoscaling of the alternatives while remaining a zero-maintenance and easier-to-use solution for autoscaling.

Easy to use

When compared to other Kubernetes applications, Metrics Server has an unreasonable number of dependencies that need to be configured for it to work. The requirements for Metrics Server are low-level cluster configuration that a developer wanting to try autoscaling is unable to change, resulting in frustration and unactionable support tickets on GitHub.
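As a reference point for the current experience: installation itself is a single command, but it commonly fails on clusters whose kubelet serving certificates are not signed by the cluster CA, forcing users into low-level flags. A sketch (the patch below is only a common workaround, not a recommended production setting):

```sh
# Install the latest released manifests.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Common workaround on clusters with self-signed kubelet certs (not recommended for production):
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```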

Ideas:

Quality of Autoscaling

Resource-based autoscaling should be a good default option when compared to custom metrics autoscaling. This should be achieved by improving the freshness and accuracy of metrics.
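Freshness is largely governed by the scrape interval, controlled by the `--metric-resolution` flag on the Metrics Server container. A sketch of setting it explicitly (the value is illustrative; the resource-usage trade-off is discussed further down this thread):

```yaml
# Snippet of the metrics-server Deployment spec (illustrative values).
containers:
- name: metrics-server
  args:
  - --kubelet-use-node-status-port
  - --metric-resolution=15s   # shorter interval = fresher metrics, higher scrape cost
```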

Ideas:

Scalable

Default Metrics Server resources should be adjusted to work in the majority of cluster configurations. We should also reduce the friction for less popular configurations by providing better documentation or separate configurations.
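A sketch of what adjusting default resources means concretely; the numbers below are purely illustrative and would have to come from scalability testing, not from this comment:

```yaml
# Illustrative only: resource requests for the metrics-server container.
resources:
  requests:
    cpu: 100m      # roughly sized for a small-to-medium cluster
    memory: 200Mi  # grows with the number of nodes and pods
```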

Ideas:

Observable

Ensure that signals from Metrics Server (logs, metrics) can be easily understood by users. Improve out of the box experience of monitoring Metrics Server by making monitoring integration easier and providing good default dashboards and alerts.
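For the monitoring-integration part, one possible shape, assuming the cluster runs the Prometheus Operator; the label selector and port name are assumptions based on the upstream manifests:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server   # label assumed from the upstream manifests
  endpoints:
  - port: https                 # port name assumed from the Service definition
    scheme: https
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true  # illustrative; a proper CA config is preferable
```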

Ideas:

Reliable

Metrics Server is a critical component in the autoscaling pipeline; its unavailability can lead to delayed autoscaling decisions. We need to make sure that users can reliably depend on autoscaling for their applications.
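A sketch of what a reliable setup could look like: run more than one replica and protect it during voluntary disruptions (the resource below is illustrative; true HA also requires the APIService to tolerate a single backend being unavailable):

```yaml
# Illustrative: keep at least one metrics-server pod available during voluntary disruptions.
# (policy/v1beta1 on clusters older than 1.21)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
```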

Ideas:

Other ideas

/cc @s-urbaniak @dgrisonnet

s-urbaniak commented 3 years ago

Agreed on the ideas expressed here :+1:

serathius commented 3 years ago

Good to hear that :P /cc @brancz for more feedback

dgrisonnet commented 3 years ago

I also very much agree with these ideas :+1:

serathius commented 3 years ago

Adding other SIG instrumentation leads for visibility and feedback /cc @logicalhan @dashpole @ehashman

ehashman commented 3 years ago

I think this is worthwhile. How do we plan to move forward on this? Do we have a process for collecting and acting on end user feedback?

serathius commented 3 years ago

I was planning to collect ideas and feedback from other SIG members and propose a roadmap to give the project more direction. I was also hoping to find owners for specific areas.

The idea of collecting feedback sounds really interesting. It would be an interesting discussion for a SIG meeting: how we can organize it for Metrics Server and other Instrumentation projects.

ehashman commented 3 years ago

I added this to the agenda for our next SIG meeting on the 12th.

brancz commented 3 years ago

I think the meeting on the 12th was cancelled as it's Thanksgiving in the US? Either way, I agree a SIG meeting would be good to discuss this.

ehashman commented 3 years ago

@brancz no, this week is on. The meeting after that is cancelled for US Thanksgiving which falls on Thu. Nov. 26.

brancz commented 3 years ago

Whoops, my bad. Today it is then :)

ehashman commented 3 years ago

/assign

We want to reach out to contribex to try to run a user survey.

logicalhan commented 3 years ago

Isn't the scraping resolution somewhat responsible for the ability of metrics-server to scale to 5k nodes? How do we expect increasing the scraping resolution to 15s to affect our supported cluster size of 5k nodes?

serathius commented 3 years ago

Metrics Server's ability to scale to 5k nodes is based on its linear resource usage, which was verified by scale tests run by SIG Scalability. Increasing the scraping resolution should increase the required resources, but should not break the linearity. The increase in resources should be balanced by improvements in performance via the planned switch to the Prometheus endpoint.

Still, there is a risk of lock contention becoming a problem, so as a first step we will go to 30s resolution, which is already used and tested in https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server

Before making the jump to 15s we need to define scalability boundaries correctly, so that we have a good definition of what it means for Metrics Server to scale to a given size. With this definition we will be able to decide whether improvements in Metrics Server concurrency are needed.

ehashman commented 3 years ago

@serathius to run a user survey, ContribEx suggests that we put together a list of questions and then make a form either through SurveyMonkey or Google Forms. They can then assist us in getting the word out for completing the survey.

Slack thread: https://kubernetes.slack.com/archives/C1TU9EB9S/p1605218342051300

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

serathius commented 3 years ago

/remove-lifecycle stale /lifecycle frozen

stevehipwell commented 3 years ago

@serathius PR https://github.com/kubernetes-sigs/metrics-server/pull/670 would provide an easier way to install, as well as make monitoring and HA available as optional components (Helm can be configurable where the simple yaml can't be).

serathius commented 3 years ago

Hey @stevehipwell. Although Helm is a pretty popular solution, it's separate from the Kubernetes tooling (`kubectl apply -f`) and we cannot push it on everyone that deploys Metrics Server. There are a lot of application delivery methods other than Helm, and we should not leave their users behind. Yaml manifests, as rough as they are, are the common language that can be used by everyone and easily adapted to their tooling.

As for an easier way to install and configure MS, this point is more about making it easier to adjust the MS configuration to a specific cluster. I was thinking there more about adding a tool that analyses the cluster config and generates a suggested MS configuration. In that area using Helm doesn't provide much more benefit than quality documentation.

stevehipwell commented 3 years ago

@serathius I'm not saying that a Helm chart would replace the yaml (although it technically could via `helm template metrics-server/metrics-server | kubectl -n kube-system apply -f -`), I'm saying that a Helm chart would help with both discovery and customisation without shutting the door on any other deployment tool (Helm charts can always be templated out to plain yaml, and you could even add a variable to remove the Helm-specific annotations, e.g. `helm template --set cleanTemplate=true`).

Regarding discoverability, Helm charts are registered at https://artifacthub.io/ where they can be searched for from any web browser (there is a Kubernetes org plan to standardise this). This also allows charts to be discovered directly from Helm via the `helm search hub metrics-server` command.

Regarding customisation, nothing is a replacement for good docs, so I think that falls outside this discussion. What a Helm chart brings is twofold: an idiomatic way to configure common components (e.g. it would be idiomatic to enable a Prometheus Operator service monitor via `serviceMonitor.enabled: true`) and the ability to template directly or at a higher level (e.g. `hostNetwork: true` could be automatically set if `cloud: aws` and `aws.secondaryNetwork: true`). The idiomatic argument could be made even stronger if the whole kubernetes-sigs group defined a shared baseline.
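For completeness, the kind of workflow the chart from #670 would enable; the repository URL and value keys below are assumptions based on the discussion above, not the published chart:

```sh
# Assumed chart location and value names; adjust to whatever #670 actually publishes.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set serviceMonitor.enabled=true

# Or render to plain yaml for non-Helm delivery pipelines:
helm template metrics-server metrics-server/metrics-server --namespace kube-system | kubectl apply -f -
```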

serathius commented 3 years ago

/cc @yangjunmyfm192085

dgrisonnet commented 1 year ago

/cc @olivierlemasle

k8s-triage-robot commented 7 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted