kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

[VPA] Weird restarts ("multiple configs") #6010

Closed R-Studio closed 8 months ago

R-Studio commented 1 year ago

Which component are you using?: vertical-pod-autoscaler (recommender, updater & admission-controller)

What version of the component are you using?: v0.13 (image tag: 0.13.0, Fairwinds Helm chart: v1.7.2)

What k8s version are you using (kubectl version)?: OpenShift 4.11, Kubernetes v1.24.12+ceaf338

What environment is this in?: OnPrem VMs

What behaviour did you expect to see?: No unreasonable pod evictions/restarts

What happened instead?: Unreasonable pod evictions/restarts. In the following screenshot we can see that VPA set the CPU requests from 0.055 to 0.043 for 2 minutes and then back to 0.055 again (screenshot: CPU requests over time).

I have also noticed the following log. Why 5 configs? (I only have one VerticalPodAutoscaler deployed for this deployment.)

matcher.go:68] Let's choose from 5 configs for pod NAMESPACE/xxx-85b97d6dfc-%
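The "Let's choose from 5 configs" log comes from the matcher, which first collects every VPA object that could apply to the pod and then picks one. A minimal sketch of that idea (illustrative only, not the actual matcher.go logic; the real code resolves the pod's owning controller against each VPA's targetRef):

```go
package main

import "fmt"

// vpaConfig is a simplified stand-in for a VerticalPodAutoscaler object.
type vpaConfig struct {
	Namespace  string
	Name       string
	TargetName string // the Deployment named in spec.targetRef
}

// matchingConfigs returns every VPA in the pod's namespace whose target
// matches the controller owning the pod. If this returns more than one
// element, the matcher has several candidates to "choose from" -- which
// is what the log line counts.
func matchingConfigs(configs []vpaConfig, podNamespace, ownerName string) []vpaConfig {
	var out []vpaConfig
	for _, c := range configs {
		if c.Namespace == podNamespace && c.TargetName == ownerName {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	configs := []vpaConfig{
		{"argocd", "vpa-a", "app-a"},
		{"argocd", "vpa-b", "app-a"}, // a second VPA targeting the same Deployment
		{"other", "vpa-c", "app-a"},  // different namespace, never matches
	}
	matched := matchingConfigs(configs, "argocd", "app-a")
	fmt.Println(len(matched)) // prints 2
}
```

To check for this situation in a real cluster, listing the VPA objects in the pod's namespace (e.g. with `kubectl get vpa -n NAMESPACE`) shows whether several of them resolve to the same workload.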

How to reproduce it (as minimally and precisely as possible): I am not sure if it is reproducible

I use the following arguments: Recommender:

v: "4"
target-cpu-percentile: 0.50
pod-recommendation-min-cpu-millicores: 10
pod-recommendation-min-memory-mb: 10
recommendation-margin-fraction: 0.0
memory-saver: true

Updater:

v: "4"
min-replicas: 1

AdmissionController: no arguments other than the defaults
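For reference, flags like these are typically passed as container arguments to the recommender binary. A sketch of the corresponding container spec fragment (illustrative; the image tag and surrounding Deployment fields are assumptions, the flag names mirror the list above):

```yaml
# Illustrative fragment of a recommender container spec;
# flag names correspond to the arguments listed above.
containers:
  - name: recommender
    image: registry.k8s.io/autoscaling/vpa-recommender:0.13.0
    args:
      - --v=4
      - --target-cpu-percentile=0.50
      - --pod-recommendation-min-cpu-millicores=10
      - --pod-recommendation-min-memory-mb=10
      - --recommendation-margin-fraction=0.0
      - --memory-saver=true
```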

R-Studio commented 1 year ago

I noticed that vpa-updater protects the newly created pod for only 1 minute. Why? (screenshot: updater logs)

voelzmo commented 1 year ago

Hey @R-Studio, thanks for providing some insight into your investigations! I'm not able to fully explain what's going on, so here are just a few pointers to clear the fog step by step:

In your case, we can probably rule out cases 2 and 3, given that you're only scaling on CPU (you didn't explicitly mention this, but I was assuming it from the way you investigated this) and that the Pod was evicted just minutes before. So most likely the current requests are outside the recommended range for one of the containers in your Pod. To analyze this, it helps to also draw the upper and lower bounds and see when/how a Container's current requests end up being outside the bounds of the new recommendation (which is case 1 above). This would look like this:

(screenshot: graph of current requests, lower bound, upper bound, and new recommendation over time)

In this example graph, the "current requests" lie between the "lower bound" and the "upper bound", so we're not in case 1. The new recommendation is lower than the current requests, though, and after 12 hours the Pod would be evicted to apply the new recommendation, as the difference is more than 10%.

A graph like this should show you why your Pod is getting evicted and hopefully explain what happens.

R-Studio commented 1 year ago

@voelzmo thanks for your reply! 👍🏽 First, you're right: we are only scaling CPU requests (sorry, I forgot to mention this).
Here is an example of our VPA resources:

---
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: vpa-argocd-applicationset-controller
  namespace: argocd
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: argocd-applicationset-controller
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 10m
        maxAllowed:
          cpu: 3
        controlledResources: ["cpu"]
        controlledValues: "RequestsOnly"
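
Once a VPA like this is active, the bounds worth graphing are visible in the object's status (e.g. via `kubectl describe vpa vpa-argocd-applicationset-controller -n argocd`). An illustrative status fragment with made-up values, field names per the `autoscaling.k8s.io/v1` API:

```yaml
# Illustrative VPA status; the CPU values are invented for the example.
status:
  recommendation:
    containerRecommendations:
      - containerName: argocd-applicationset-controller
        lowerBound:
          cpu: 40m
        target:
          cpu: 43m
        uncappedTarget:
          cpu: 43m
        upperBound:
          cpu: 80m
```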

Anyway, thanks for all your inputs! 👍🏽

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

voelzmo commented 8 months ago

/close
/kind support

k8s-ci-robot commented 8 months ago

@voelzmo: Closing this issue.

In response to [this](https://github.com/kubernetes/autoscaler/issues/6010#issuecomment-1926483207):

> /close
> /kind support

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

voelzmo commented 8 months ago

/remove-kind bug