kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

VPA Not honoring maxAllowed Memory Limit #6996

Open kmsarabu opened 3 months ago

kmsarabu commented 3 months ago

I am encountering an issue with the Vertical Pod Autoscaler (VPA) where it does not honor the maxAllowed resource limits for memory. Below is the VPA definition I am using:

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: test-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        controlledResources: ["cpu", "memory"]
        minAllowed:
          cpu: 500m
          memory: 4Gi
        maxAllowed:
          cpu: 4
          memory: 16Gi

After running a CPU stress test, the resulting resource limits observed on the pods are:

Limits:
  cpu:     4
  memory:  32Gi   <- this is more than VPA Object's MaxAllowed->memory?
Requests:
  cpu:      500m
  memory:   4Gi

Despite setting the maxAllowed memory limit to 16Gi, the VPA scaled the memory up to 32Gi.

Steps to Reproduce:

  1. Deploy a VPA with the provided configuration.
  2. Apply a CPU stress test to the target deployment.
  3. Observe the memory and CPU limits on the autoscaled pods.

Expected Behavior: The memory limit should not exceed the maxAllowed value of 16Gi.
Actual Behavior: The memory limit scales up to 32Gi, exceeding the maxAllowed value.

Could there be any known issues or configurations that might lead to this behavior? Thank you in advance for your help!

adrianmoisey commented 3 months ago

/area vertical-pod-autoscaler

Adarsh-verma-14 commented 2 months ago

Hi @kmsarabu, you need to define containerPolicies (minAllowed and maxAllowed) in the CRD file (vpa-v1-crd.yaml) instead of the VPA definition file, because it provides clear guidelines for how VerticalPodAutoscalers should manage resource scaling.

Adarsh-verma-14 commented 2 months ago

Also, could you share the logs of the VPA pod?

Adarsh-verma-14 commented 2 months ago

You can also use the CustomResourceDefinition (CRD) file from this page as a reference: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/deploy/vpa-v1-crd.yaml

voelzmo commented 2 months ago

/assign @raywainman

sarg3nt commented 3 weeks ago

@raywainman @voelzmo I'm running into the same thing, but with CPU. We are trying to use VPA to help tune apps on our cluster, especially our OpenTelemetry stack, i.e. Prometheus, Grafana, Tempo, etc. Let's use the Tempo Ingester as an example: I'm setting requests and limits on the Tempo Ingester StatefulSet and creating a VPA as follows.

The StatefulSet resource requests and limits

resources:       
  limits:        
    cpu: "2"     
    memory: 8Gi  
  requests:      
    cpu: 500m    
    memory: 3Gi  

The VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: sel-telemetry
    meta.helm.sh/release-namespace: sel-telemetry
  creationTimestamp: "2024-09-12T17:45:53Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: sel-telemetry-tempo-ingester-vpa
  namespace: sel-telemetry
  resourceVersion: "693765"
  uid: ead8e8de-edd6-46e9-9aff-477640b450b2
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources:
      - cpu
      - memory
      controlledValues: RequestsAndLimits
      maxAllowed:
        cpu: 2000m
        memory: 6Gi
      minAllowed:
        cpu: 1000m
        memory: 4Gi
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: sel-telemetry-tempo-ingester
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2024-09-12T17:46:53Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: ingester
      lowerBound:
        cpu: "1"
        memory: 4Gi
      target:
        cpu: "1"
        memory: 4Gi
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: "1"
        memory: 4Gi

VPA is straight-up ignoring the maxAllowed CPU of 2 and setting the pod's CPU limit to 4.

Resultant Pod YAML

resources:                  
  limits:                   
    cpu: "4"                
    memory: "11453246122"   
  requests:                 
    cpu: "1"                
    memory: 4Gi             

The problem can also be seen in Grafana; notice the 4 CPU cores in the attached screenshot.

**Admission Controller Logs Containing "ingester"**

I0912 18:10:56.111158       1 matcher.go:73] Let's choose from 25 configs for pod sel-telemetry/sel-telemetry-tempo-ingester-1 
I0912 18:10:56.111178       1 recommendation_provider.go:91] updating requirements for pod sel-telemetry-tempo-ingester-1.     
I0912 18:12:00.372082       1 matcher.go:73] Let's choose from 25 configs for pod sel-telemetry/sel-telemetry-tempo-ingester-0 
I0912 18:12:00.372103       1 recommendation_provider.go:91] updating requirements for pod sel-telemetry-tempo-ingester-0.     

There were no logs in the updater containing "ingester"

**Updater Logs Containing "ingester"**

I0912 18:11:58.094754       1 update_priority_calculator.go:146] pod accepted for update sel-telemetry/sel-telemetry-tempo-ingester-0 with priority 1.3333333333333333 - processed recommendations:
ingester: target: 4294968k 1000m; uncappedTarget: 262144k 25m;
I0912 18:11:58.094796       1 updater.go:228] evicting pod sel-telemetry/sel-telemetry-tempo-ingester-0

Deleting the VPA for this resource and restarting the StatefulSet results in the correct configuration, as set by the StatefulSet's requests and limits.

This makes using VPA untenable for us, as it's doing the opposite of what we want it to. It's reserving all the resources of our cluster so that new apps can't be deployed due to lack of available CPU, when in reality only a tiny bit of CPU is actually being used on the cluster.

Is there a fix or do we need to abandon using VPA?

adrianmoisey commented 2 weeks ago

My understanding is that VPA focuses on requests only. Limits are set as a ratio between the requests/limits before the Pod is processed by the VPA. See https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#limits-control

There are methods to keep the limit lower, but they aren't in the VPA object itself.
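
To make that concrete, here's a rough sketch of how that ratio works out for the Tempo Ingester StatefulSet posted above (illustrative numbers only, as I understand the proportional-limit behavior, not an exact trace of the code):

# Incoming Pod:  cpu request 500m, limit 2     -> limit/request ratio 4
#                memory request 3Gi, limit 8Gi -> ratio 8/3
# VPA target (requests capped to maxAllowed):  cpu 1, memory 4Gi
resources:
  requests:
    cpu: "1"                # capped by maxAllowed.cpu
    memory: 4Gi             # capped by maxAllowed.memory
  limits:
    cpu: "4"                # 1 * (2 / 0.5); maxAllowed is not applied to limits
    memory: "11453246122"   # roughly 4Gi * (8Gi / 3Gi), in bytes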

sarg3nt commented 2 weeks ago

My understanding is that VPA focuses on requests only. Limits are set as a ratio between the requests/limits before the Pod is processed by the VPA. See https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#limits-control

There are methods to keep the limit lower, but they aren't in the VPA object itself.

I do not believe this is correct. The docs you linked to say:

When setting limits VPA will conform to resource policies.

We are setting a resource policy; the docs say VPA will conform to it, but it is not doing so. Its output even says what it will set the resources to, but then it doesn't do that and sets them to something much higher.

Same with the comments in the code:

// Controls how the autoscaler computes recommended resources.
// The resource policy may be used to set constraints on the recommendations
// for individual containers.
// If any individual containers need to be excluded from getting the VPA recommendations, then
// it must be disabled explicitly by setting mode to "Off" under containerPolicies.
// If not specified, the autoscaler computes recommended resources for all containers in the pod,
// without additional constraints.
// +optional

Focus on "If not specified, the autoscaler computes recommended resources for all containers in the pod, without additional constraints." That is saying that if you don't specify a resource policy it will compute recommendations freely, but if you do specify one, it will use those constraints.

Lastly, if it's not going to obey its own resource constraints for CPU and memory upper bounds, then why are they a thing? Why can I set them if it is, as you say, designed to ignore them?

raywainman commented 2 weeks ago

The logic that does the capping is here:

https://github.com/kubernetes/autoscaler/blob/a2b793d530022f7ffb96d26a47995a08d5ae343e/vertical-pod-autoscaler/pkg/utils/vpa/capping.go#L157

The way I'm (quickly) reading the code, it should actually be capping both the requests and the limits. Is there possibly a bug here?

(I'll try and spend a bit more time looking at this)

adrianmoisey commented 2 weeks ago

Take a look at https://github.com/kubernetes/autoscaler/blob/a2b793d530022f7ffb96d26a47995a08d5ae343e/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/recommendation/recommendation_provider.go#L49-L103

That seems to be where the VPA grabs the requests and limits recommendations. The limits come from vpa_api_util.GetProportionalLimit(). If you follow that code path, it ends up as a calculation based on the request/limit ratio of the incoming Pod.
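
The same proportional calculation would also explain the 32Gi limit in the original report, assuming the Deployment's container had a memory limit of twice its request (that ratio isn't shown in the issue, so this is only a guess):

# Hypothetical incoming Pod: memory request 8Gi, limit 16Gi -> ratio 2
resources:
  requests:
    memory: 16Gi   # recommendation capped at maxAllowed.memory
  limits:
    memory: 32Gi   # 16Gi * 2; the proportional limit is not capped by maxAllowed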

Another reference is the GKE docs (which aren't the same as this project, but close enough as a useful datapoint): https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler#containerresourcepolicy_v1_autoscalingk8sio

maxAllowed (ResourceList): Specifies the maximum CPU request and memory request allowed for the container. By default, there is no maximum applied.

raywainman commented 2 weeks ago

Thanks for digging that up Adrian, that makes sense.

I found this old issue talking about this: https://github.com/kubernetes/autoscaler/issues/2359#issuecomment-533532055

The recommendation there is to actually remove the CPU limit altogether and let the pod burst as needed; VPA will adjust the CPU request from there (up to the cap).

For memory, it recommends keeping the limit and then letting OOMs trigger scale-up by VPA.
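
In container-spec terms, that recommendation would look roughly like this, using the StatefulSet values from above (a sketch only, not something the VPA docs prescribe verbatim):

resources:
  requests:
    cpu: 500m      # VPA keeps adjusting this, capped by maxAllowed.cpu
    memory: 3Gi    # likewise for memory
  limits:
    memory: 8Gi    # kept so that OOMKills can trigger a VPA scale-up
    # no cpu limit: the container may burst, and VPA only manages the cpu request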

And that ultimately spun out https://github.com/kubernetes/autoscaler/issues/2387 asking for a way to give users more control over limit scaling which led to https://github.com/kubernetes/autoscaler/pull/3028.

This ultimately introduced the ability to disable limit scaling altogether by changing RequestsAndLimits to RequestsOnly - does that help in this case? Then you could set your pod limits to the max allowed values and let VPA handle the requests?
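
For reference, a minimal sketch of that setup (names and values are placeholders); with controlledValues: RequestsOnly, the limits defined on the workload are left untouched:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa              # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: example-workload       # placeholder
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsOnly   # VPA adjusts requests only; workload limits stay as written
      maxAllowed:
        cpu: "2"
        memory: 6Gi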

adrianmoisey commented 2 weeks ago

Do we need to take action here? I assume clarifying the documentation may make sense? Specifically https://github.com/kubernetes/autoscaler/tree/a2b793d530022f7ffb96d26a47995a08d5ae343e/vertical-pod-autoscaler#limits-control

sarg3nt commented 2 weeks ago

@adrianmoisey I would argue there are bugs here. The VPA should not be setting CPU limits higher than what the resource policy stipulated and what it says it is going to set. You can say it's a documentation issue if you want, but I really don't think it is.

If the developers do not believe that VPA should be setting limits, and doing it correctly, then why have it? Rip it out and make it clear it is for requests only. Having it halfway working and, IMHO, actively harming the cluster is MUCH worse than not having it at all.

Please correct me if I'm missing something obvious here.

In any case, the product just doesn't feel ready to me. It's doing unexpected things, which is not what I want from a system that is actively changing and restarting deployments in my k8s clusters. The purpose of a tool like this is to decrease the work we admins have to do to keep workloads healthy; this feels like it's doing the opposite.

I'd love to know what the team is planning. From what I've read in other issues, it seems like there is very little bandwidth to improve VPA and it's kind of stagnating. Knowing what the future looks like would help us decide whether to put it on hold or rip it out of our automation completely (right now we have it turned off).

voelzmo commented 4 days ago

If the developers do not believe that VPA should be setting limits, and doing it correctly, then why have it?

Limits are not a VPA feature, they are a Kubernetes feature. The general consensus seems to be that most workload owners shouldn't be setting CPU limits. There are valid use cases for CPU limits, but if you don't have one of those very particular use cases, don't use them. When using CPU limits, you're basically sacrificing performance for predictability, which isn't what most people want. For memory limits, the consensus seems to be different: if you know your workload well enough, you can use them to detect memory leaks and other unwanted behavior.

Having it half way working and IMHO actively harming the cluster is MUCH worse than not having it at all.

Can you help me understand the harm that the current VPA behavior is causing? A CPU limit which is ridiculously high shouldn't cause any harm and should be very comparable to not having a CPU limit at all (if no limit is present, the effective limit is the Node's capacity). I agree that it isn't adding any value over not specifying the CPU limit, but that is then a configuration issue on the user's end.

Depending on your use case, this thread already contains the possible options to get the desired behavior.

So maybe the way to start here is: what's your motivation to set limits for CPU and memory? And: are the limits for your workload static, or should they change depending on the load on your workload? This should lead you to the correct way to configure your workload and its VPA settings.

Regarding this specific issue: this works as designed. I don't think there's anything to do here from the VPA side. There is no simple solution here, as Kubernetes and VPA are quite complex, because there are so many different use cases and scenarios out there. All these settings exist for a reason.

sarg3nt commented 4 days ago

@voelzmo Thank you for getting back to me. I just want to clarify the following two items.

  1. Setting a maxAllowed value in a VPA resource policy is NOT designed to cap the values that VPA sets for that resource. This is by design and not a bug.
  2. The VPA output indicating it will set the container to 2 CPU cores, and it then being set to 4, is also by design and not a bug.

Is the above correct? These are not bugs? Thank you.

adrianmoisey commented 4 days ago

Setting a VPA resource policy max allowed value is designed to NOT set the maximum allowed values for VPA to set that resource too. That this is by design and not a bug.

If the VPA sets the requests to the same value as the max, it is not a bug. If the VPA sets the limits to a value higher than the max, it is also not a bug.

The VPA output indicating it will set it to 2 CPU cores and then sets it to 4 is also by design and not a bug.

If the VPA sets the requests of the Pod to the same value as the recommendation, it is not a bug. If the VPA sets the limits of the Pod to a higher value than the recommendation and RequestsAndLimits is set (which is the default), it is also not a bug.

The primary use of the VPA is for it to set requests on Pods.

sarg3nt commented 4 days ago

@adrianmoisey Then I would recommend removing the ability to set the max values in the policy, as they are VERY confusing to an end user and, from what I gather you are saying, don't seem to do anything.

I would also recommend updating the docs to inform users, clearly and succinctly, what VPA is for as you see it, because that is not what the docs currently indicate, and we've wasted a LOT of time on a product that does not do what the docs led us to believe. Part of that was probably me going in with a preconceived notion of what something called "Vertical Pod Autoscaler" would do, and maybe that's my bad.

So from what you are saying, VPA is designed for setting requests, and therefore to help the scheduler know where to schedule workloads.

To answer one of your previous questions (which I didn't do in the last response because I didn't want to pollute the point of my question): yes, limits are very necessary in most workloads due to the noisy-neighbor issue. We have Jenkins build agents that will take every single CPU core you throw at them if you don't set a limit. If a workload has issues and gets into a loop, it can do the same thing. It's the same principle as setting a max on RAM, which should be a requirement for any sane k8s admin IMHO. We have a Kyverno policy for that.

I'll take this back to the team to figure out what we want to do. I'm not sure we will see the point in using it, as our goal was to have it adjust the request and limit values as app requirements grow, without us having to babysit workloads.