Closed: lallydd closed this issue 1 month ago.
First thoughts:

- How many VerticalPodAutoscaler objects a single recommender can handle in a cluster.
- Latency of the admission-controller.

Sounds good to me.
I think the first thing likely to slow down is processing VPA objects in the recommender. As we get more and more costly-to-process VPA objects in a cluster (see the VPA recommender's RunOnce):

- UpdateVPAs and MaintainCheckpoints will take more time. This will be visible in the execution_latency_seconds metric (time spent on checkpoints will start going down) and in the age of checkpoints (they will be getting older).
- Once we're down to minCheckpointsPerRun (default 10) and the time we spend per loop starts increasing (visible in the execution_latency_seconds metric), checkpoint age will keep increasing (I'd guess age slows down to linear growth here).

For latency of the admission controller I'm less worried, but it'd be good to check it too.
There's also the recommendation_latency_seconds metric, which is important to understand how well the recommender can deal with many objects being created at the same time.
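The interaction between object count, minCheckpointsPerRun, and checkpoint age described above can be sketched with a toy model (an assumption-laden simplification of a saturated recommender that refreshes only minCheckpointsPerRun checkpoints per loop, oldest first — not the actual MaintainCheckpoints logic):

```python
import math

def worst_checkpoint_age(num_vpas, checkpoints_per_run=10, loop_period_s=60):
    """Toy model: if each recommender loop refreshes only
    `checkpoints_per_run` checkpoints (oldest first), a full pass over
    all VPA objects takes ceil(num_vpas / checkpoints_per_run) loops,
    which bounds how stale the oldest checkpoint gets."""
    loops_per_full_pass = math.ceil(num_vpas / checkpoints_per_run)
    return loops_per_full_pass * loop_period_s

# With the defaults assumed here (minCheckpointsPerRun=10, ~60s loop),
# 5000 VPAs put the oldest checkpoint at roughly 500 minutes:
print(worst_checkpoint_age(5000) / 60)  # → 500.0
```

This is the linear-growth regime the comment above guesses at: once the per-loop checkpoint budget saturates, worst-case checkpoint age grows linearly with the number of VPA objects.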
@lallydd is this also about adjusting the history storage of the recommender to something external? Or do your external metric sources currently only provide real-time data? If we're also taking history storage into account, it would be interesting how long it takes to backfill the historic data, but AFAIK we don't have a metric for this yet.
One thing to keep in mind for these tests is that they rely on a big-enough sizing for the client-side rate limits towards kube-apiserver (see discussions in https://github.com/kubernetes/autoscaler/issues/4498 where I tried to re-configure our recommender to fit our scale and discussed with @jbartosik)
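To make the rate-limit concern concrete, here is a back-of-envelope sketch (assumptions: roughly one API write per VPA object per loop, and client-go's default 5 QPS client-side limit; both numbers are illustrative, not measured):

```python
def loop_floor_seconds(num_vpas, client_qps=5.0):
    """With a client-side rate limiter capped at `client_qps`, issuing
    one API write per VPA object cannot complete faster than
    num_vpas / client_qps seconds, no matter how quickly the
    recommendations themselves are computed."""
    return num_vpas / client_qps

# At 1000 VPAs and the default limit, each loop spends at least 200s
# waiting on the rate limiter -- well beyond a ~60s loop period:
print(loop_floor_seconds(1000))  # → 200.0
```

In other words, without resizing the client-side limits the benchmark would measure the rate limiter, not the recommender.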
@voelzmo The external metric sources only provide real-time data. The one we intend to use for our use case will need some work to handle high query rates without overwhelming any upstream services. Another team member here is working on a different recommender that can take percentiles directly from an external data source.
The local-testing configuration I've put together for #5153 can let us run the benchmark to exhaust the recommender safely.
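For generating the load itself, something like the following could stamp out many VPA objects (a sketch: the names and namespace are made up, `updateMode: "Off"` is an assumption chosen so only the recommender is exercised, and JSON is a valid YAML subset so kubectl accepts the output):

```python
import json

def make_vpa(name, namespace="vpa-bench"):
    """Build a minimal VerticalPodAutoscaler manifest targeting a
    Deployment of the same (hypothetical) name; updateMode "Off" keeps
    the updater from evicting pods during the benchmark."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "updatePolicy": {"updateMode": "Off"},
        },
    }

# Pipe the output to `kubectl apply -f -` to create N objects:
items = [make_vpa(f"bench-vpa-{i}") for i in range(1000)]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}))
```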
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
To understand the capacity of a VPA deployment, in particular the recommender, what kind of performance measurements would we like from a benchmark?

Which component are you using?:

This will likely end up as an end-to-end test running off of kind on a single box.

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.: