kubernetes / autoscaler

Autoscaling components for Kubernetes

What would we like from a VPA Benchmark? #5493

Closed · lallydd closed this 1 month ago

lallydd commented 1 year ago

To understand the capacity of a VPA deployment, in particular the recommender, what kind of performance measurements would we like from a benchmark?

Which component are you using?: This will likely end up as an end-to-end test running on kind on a single box.

Is your feature request designed to solve a problem? If so, describe the problem this feature should solve:

lallydd commented 1 year ago

First thoughts:

jbartosik commented 1 year ago

Sounds good to me.

I think the first thing likely to slow down is the processing of VPA objects in the recommender.

As the VPA objects in a cluster become more and more costly to process (see the VPA recommender's RunOnce):

I'm less worried about the latency of the admission controller, but it'd be good to check it too.
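
For a first look at where RunOnce spends its time, the recommender already exports a per-step latency histogram. Below is a minimal sketch that dumps it, assuming the recommender's metrics endpoint is port-forwarded to localhost:8942 and that the histogram is named `vpa_recommender_execution_latency_seconds`; both are worth verifying against your deployment.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Assumes: kubectl port-forward <recommender-pod> 8942:8942
	resp, err := http.Get("http://localhost:8942/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		// Keep only the per-step RunOnce latency samples
		// (metric name assumed; check your build's metrics package).
		if strings.HasPrefix(line, "vpa_recommender_execution_latency_seconds") {
			fmt.Println(line)
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
}
```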

voelzmo commented 1 year ago

There's also `recommendation_latency_seconds`, which is important for understanding how well the recommender deals with many objects being created at the same time.
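
A hedged sketch of reading that metric's tail latency via the Prometheus Go client, assuming the full name carries the `vpa_recommender_` namespace prefix and Prometheus is scraping the recommender at localhost:9090:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// p99 of the time from VPA object creation to its first recommendation.
	// Metric name assumed; verify against the recommender's /metrics output.
	query := `histogram_quantile(0.99,
	  sum(rate(vpa_recommender_recommendation_latency_seconds_bucket[5m])) by (le))`
	result, warnings, err := promv1.NewAPI(client).Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```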

@lallydd is this also about adjusting the history storage of the recommender to something external? Or do your external metric sources currently only provide real-time data? If we're also taking history storage into account, it would be interesting to see how long it takes to backfill the historic data, but AFAIK we don't have a metric for this yet.

One thing to keep in mind for these tests is that they rely on the client-side rate limits towards kube-apiserver being sized generously enough (see the discussion in https://github.com/kubernetes/autoscaler/issues/4498, where I tried to re-configure our recommender to fit our scale and discussed with @jbartosik).
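
On that point: whatever drives the benchmark (and the client the recommender itself builds) should raise client-go's defaults, since client-go throttles at 5 QPS / burst 10 out of the box. A sketch, assuming a plain kubeconfig-based client; the VPA binaries expose the same knobs as the `--kube-api-qps` and `--kube-api-burst` flags:

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func newClient(kubeconfig string) (*kubernetes.Clientset, error) {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	// Raise the client-side limits so the benchmark measures the recommender,
	// not client-go's default throttling (5 QPS, burst 10). The values here
	// are illustrative; size them for your scale.
	config.QPS = 100
	config.Burst = 200
	return kubernetes.NewForConfig(config)
}

func main() {
	if _, err := newClient(clientcmd.RecommendedHomeFile); err != nil {
		panic(err)
	}
}
```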

lallydd commented 1 year ago

@voelzmo The external metric sources only provide real-time data. The one we intend to use for our use case will need some work to handle high query rates without overwhelming any upstream services. Another team member here is working on a different recommender that can take percentiles directly from an external data source.

The local-testing configuration I've put together for #5153 should let us safely run the benchmark to the point of exhausting the recommender.
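
To go with that setup, here's a hedged sketch of the kind of load generator such a benchmark needs: it mass-creates VPA objects via the dynamic client (in `Off` mode, so nothing gets evicted) until the recommender starts lagging. The `bench-*` names and the object count are placeholders, and the targeted Deployments would need to exist for the recommendations to be meaningful.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

var vpaGVR = schema.GroupVersionResource{
	Group:    "autoscaling.k8s.io",
	Version:  "v1",
	Resource: "verticalpodautoscalers",
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	for i := 0; i < 500; i++ { // scale this up until the recommender falls behind
		vpa := &unstructured.Unstructured{Object: map[string]interface{}{
			"apiVersion": "autoscaling.k8s.io/v1",
			"kind":       "VerticalPodAutoscaler",
			"metadata": map[string]interface{}{
				"name":      fmt.Sprintf("bench-vpa-%d", i),
				"namespace": "default",
			},
			"spec": map[string]interface{}{
				"targetRef": map[string]interface{}{
					"apiVersion": "apps/v1",
					"kind":       "Deployment",
					"name":       fmt.Sprintf("bench-workload-%d", i),
				},
				// Recommendations only; the updater won't evict anything.
				"updatePolicy": map[string]interface{}{
					"updateMode": "Off",
				},
			},
		}}
		if _, err := client.Resource(vpaGVR).Namespace("default").Create(
			context.TODO(), vpa, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```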

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Shubham82 commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Shubham82 commented 6 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 month ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/autoscaler/issues/5493#issuecomment-2182120339):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.