kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

VPA : "vpa-recommender-performance" not working as expected #7228

Closed: Rajpratik71 closed this issue 2 months ago

Rajpratik71 commented 2 months ago

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.2.1

What k8s version are you using (kubectl version)?:

kubectl version Output
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.13+048520e
WARNING: version difference between client (1.30) and server (1.27) exceeds the supported minor version skew of +/-1

What environment is this in?:

Openshift on Baremetal

What did you expect to happen?:

With the "vpa-recommender-performance" deployment, VPA objects should have their CPU, MEM, and PROVIDED columns populated after reconciliation:

NAMESPACE   NAME                                        MODE   CPU    MEM         PROVIDED   AGE
staging     vpa-common-web-ui                           Auto   25m    262144k     True       3m28s

What happened instead?:

The VPA object did not get CPU, MEM, or PROVIDED values after reconciliation:

NAMESPACE   NAME                                           MODE   CPU   MEM   PROVIDED   AGE
staging     vpa-common-web-ui                              Auto                          5m5s

How to reproduce it (as minimally and precisely as possible):

Changed recommender-deployment to recommender-deployment-performance and ran ./hack/vpa-process-yamls.sh create.

The install succeeded, and all pods are running:

kube-system                                        vpa-admission-controller-5f9b8c8db4-2tpk8                         1/1     Running             0                28s
kube-system                                        vpa-recommender-performance-9d6884d48-tgzq8                       1/1     Running             0                31s
kube-system                                        vpa-updater-76c7c55f56-vlwwq                                      1/1     Running             0                33s

Anything else we need to know?:

The following warnings appear in the "vpa-recommender-performance" pod log when deployed in "performance" mode:

pratikraj@Pratiks-MacBook-Pro vertical-pod-autoscaler % oc -n kube-system logs -f --tail=10 deploy/vpa-recommender-performance
W0901 17:50:46.390967       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-2} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-2} POD}
W0901 17:50:46.390971       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-2} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-2} POD}
W0901 17:50:46.390984       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-3} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-3} POD}
W0901 17:50:46.390988       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-3} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-3} POD}
W0901 17:50:46.390998       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-4} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-4} POD}
W0901 17:50:46.391002       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-4} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-4} POD}
W0901 17:50:46.391011       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-5} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-5} POD}
W0901 17:50:46.391015       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-5} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-5} POD}
W0901 17:50:46.391025       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-6} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-6} POD}
W0901 17:50:46.391029       1 cluster_feeder.go:443] Error adding metric sample for container {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-6} POD}: KeyError: {{staging stagingmd5dahkquxgk8876phdxdn7-universal-connector-6} POD}
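Every warning above names a container called "POD" rather than one of the workload's own containers. To check whether that holds across a longer log, a small parser can tally the keys. This is a sketch that assumes the exact warning format shown above; the function name `failed_sample_keys` is hypothetical and not part of any VPA tooling.

```python
import re
from collections import Counter

# Assumption: warnings follow the exact format shown above, e.g.
# "... Error adding metric sample for container {{NAMESPACE POD_NAME} CONTAINER}: KeyError: ..."
SAMPLE_RE = re.compile(
    r"Error adding metric sample for container "
    r"\{\{([^\s{}]+) ([^\s{}]+)\} ([^\s{}]+)\}:"
)


def failed_sample_keys(lines):
    """Count (namespace, pod, container) keys named in failed-sample warnings.

    `lines` can be any iterable of strings, e.g. a list or sys.stdin.
    Lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = SAMPLE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

For example, calling `failed_sample_keys(sys.stdin)` on the piped output of `oc -n kube-system logs deploy/vpa-recommender-performance` returns a Counter keyed by (namespace, pod, container). If the set of container names is just {"POD"}, the warnings concern only that entry, which appears to be a pod-level record rather than an application container.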
adrianmoisey commented 2 months ago

/area vertical-pod-autoscaler

adrianmoisey commented 2 months ago

As discussed on the sig-autoscaling call, can you provide the spec of the vpa-common-web-ui VPA?

Rajpratik71 commented 2 months ago

> As discussed on the sig-autoscaling call, can you provide the spec of the vpa-common-web-ui VPA?

Describe output of the VPA object:

pratikraj@Pratiks-MacBook-Pro vertical-pod-autoscaler % oc describe vpa -n gi-perf     vpa-common-web-ui
Name:         vpa-common-web-ui
Namespace:    gi-perf
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2024-09-02T14:48:01Z
  Generation:          1
  Resource Version:    45668866
  UID:                 1b1e8184-df12-445b-8fe8-255717e7b724
Spec:
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         common-web-ui
  Update Policy:
    Update Mode:  Auto
Events:           <none>

VPA object as YAML:

pratikraj@Pratiks-MacBook-Pro vertical-pod-autoscaler % oc get vpa -n gi-perf     vpa-common-web-ui -o yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"vpa-common-web-ui","namespace":"gi-perf"},"spec":{"targetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"common-web-ui"},"updatePolicy":{"updateMode":"Auto"}}}
  creationTimestamp: "2024-09-02T14:48:01Z"
  generation: 1
  name: vpa-common-web-ui
  namespace: gi-perf
  resourceVersion: "45668866"
  uid: 1b1e8184-df12-445b-8fe8-255717e7b724
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: common-web-ui
  updatePolicy:
    updateMode: Auto
adrianmoisey commented 2 months ago

Can you try setting spec.recommenders to the following:

spec:
  recommenders:
    - name: performance

The performance recommender has a name set, so it isn't the default recommender: https://github.com/kubernetes/autoscaler/blob/d3cbc10c6b021afcefee36b1a98a75a913852065/vertical-pod-autoscaler/deploy/recommender-deployment-high.yaml#L32
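The fix hinges on how a recommender decides which VPA objects it acts on. Below is a simplified Python model of that rule as described in this reply: a VPA with no spec.recommenders is left to the default recommender, and a named recommender only acts on VPAs that list its name. The function `recommender_handles`, the dict shape, and the assumption that the default recommender is literally named "default" are illustrative; this is not the recommender's actual code.

```python
# Hypothetical model of recommender selection; not the VPA source code.
DEFAULT_RECOMMENDER = "default"  # assumed name of the unnamed (default) recommender


def recommender_handles(recommender_name, vpa_spec):
    """Return True if a recommender started with `recommender_name`
    would produce recommendations for a VPA with this spec."""
    selectors = vpa_spec.get("recommenders") or []
    if not selectors:
        # No spec.recommenders: the VPA implicitly belongs to the default recommender.
        return recommender_name == DEFAULT_RECOMMENDER
    # Otherwise only recommenders explicitly listed by name pick it up.
    return any(sel.get("name") == recommender_name for sel in selectors)


# The original VPA spec (no recommenders) and the spec after the suggested change.
original_spec = {
    "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "common-web-ui"},
    "updatePolicy": {"updateMode": "Auto"},
}
fixed_spec = dict(original_spec, recommenders=[{"name": "performance"}])
```

Under this model, a recommender named "performance" ignores `original_spec` and handles `fixed_spec`, which is consistent with the empty columns in the original report and the populated ones after the change.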

Rajpratik71 commented 2 months ago

> Can you try setting spec.recommenders to the following:
>
> spec:
>   recommenders:
>     - name: performance
>
> The performance VPA has a name set, so it isn't the default: https://github.com/kubernetes/autoscaler/blob/d3cbc10c6b021afcefee36b1a98a75a913852065/vertical-pod-autoscaler/deploy/recommender-deployment-high.yaml#L32

OK, got it. After adding the recommenders field, it works and a recommendation is being provided.

pratikraj@Pratiks-MacBook-Pro vertical-pod-autoscaler % oc get vpa -n gi-perf     vpa-common-web-ui -o yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"vpa-common-web-ui","namespace":"gi-perf"},"spec":{"recommenders":[{"name":"performance"}],"targetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"common-web-ui"},"updatePolicy":{"updateMode":"Auto"}}}
  creationTimestamp: "2024-09-02T14:48:01Z"
  generation: 2
  name: vpa-common-web-ui
  namespace: gi-perf
  resourceVersion: "45784458"
  uid: 1b1e8184-df12-445b-8fe8-255717e7b724
spec:
  recommenders:
  - name: performance
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: common-web-ui
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2024-09-02T15:16:29Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: common-web-ui
      lowerBound:
        cpu: 25m
        memory: "280439064"
      target:
        cpu: 25m
        memory: "297164212"
      uncappedTarget:
        cpu: 25m
        memory: "297164212"
      upperBound:
        cpu: 334m
        memory: "9030153299"
pratikraj@Pratiks-MacBook-Pro vertical-pod-autoscaler % oc get vpa -n gi-perf     vpa-common-web-ui           
NAME                MODE   CPU   MEM         PROVIDED   AGE
vpa-common-web-ui   Auto   25m   297164212   True       45m
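The CPU, MEM, and PROVIDED columns are read from the VPA's status, so their being empty in the original report just means the object had no status yet. As a rough model (an assumption about the CRD's printer columns, not copied from the CRD), the columns take the first container's target and the RecommendationProvided condition. The sketch works on a VPA parsed into a dict, for example from `oc get vpa -o json`; `vpa_columns` is a hypothetical name.

```python
def vpa_columns(vpa):
    """Approximate the MODE, CPU, MEM and PROVIDED columns of `oc get vpa`.

    Assumption: CPU/MEM come from the first containerRecommendations entry's
    target, and PROVIDED from the RecommendationProvided condition.
    """
    spec = vpa.get("spec", {})
    status = vpa.get("status", {})
    mode = spec.get("updatePolicy", {}).get("updateMode", "")
    recs = status.get("recommendation", {}).get("containerRecommendations", [])
    target = recs[0].get("target", {}) if recs else {}
    provided = next(
        (c.get("status", "") for c in status.get("conditions", [])
         if c.get("type") == "RecommendationProvided"),
        "",
    )
    return mode, target.get("cpu", ""), target.get("memory", ""), provided


# Trimmed versions of the VPA object before and after adding spec.recommenders.
before_fix = {"spec": {"updatePolicy": {"updateMode": "Auto"}}}
after_fix = {
    "spec": {"recommenders": [{"name": "performance"}],
             "updatePolicy": {"updateMode": "Auto"}},
    "status": {
        "conditions": [{"type": "RecommendationProvided", "status": "True"}],
        "recommendation": {"containerRecommendations": [{
            "containerName": "common-web-ui",
            "target": {"cpu": "25m", "memory": "297164212"},
        }]},
    },
}
```

With these inputs, `vpa_columns(before_fix)` gives an Auto mode with three empty columns, and `vpa_columns(after_fix)` reproduces the 25m / 297164212 / True row shown above.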

But there is one other issue: even after the recommendation is provided, the original object's resource spec is not being updated, and the recommendation does not match, or come close to, the actual resource usage:

pratikraj@Pratiks-MacBook-Pro vertical-pod-autoscaler % oc adm top po -A | grep common-web-ui 
gi-perf                                            common-web-ui-d494596c5-5jsj2                                     0m           139Mi     

Resource spec from the Deployment:

  Containers:
   common-web-ui:
    Image:     xxxxxxxxxxxxxxxxxxxx
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1
      memory:  440Mi
    Requests:
      cpu:                130m
      ephemeral-storage:  256Mi
      memory:             256Mi
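Comparing the recommendation with the Deployment spec and the `oc adm top` reading is easier in a single unit. The sketch below converts the handful of quantity formats that appear in this thread to MiB. It is a hand-rolled subset of the Kubernetes quantity syntax (an assumption made for brevity), not a client-library parser, and the helper names are hypothetical.

```python
# Subset of Kubernetes quantity suffixes needed for the values in this thread.
_MULTIPLIERS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,  # binary suffixes (memory)
    "k": 10**3, "M": 10**6, "G": 10**9,     # decimal suffixes
}


def parse_quantity(q):
    """Convert a quantity such as "256Mi", "262144k", "25m" or "297164212"
    to a plain number (bytes for memory, cores for CPU)."""
    if q.endswith("m"):  # milli-units, used here for CPU (e.g. "25m")
        return float(q[:-1]) / 1000
    for suffix, mult in _MULTIPLIERS.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * mult
    return float(q)  # no suffix: plain bytes or whole cores


def mib(q):
    """Express a memory quantity in MiB."""
    return parse_quantity(q) / 2**20


memory = {
    "usage (oc adm top)": "139Mi",
    "request": "256Mi",
    "limit": "440Mi",
    "VPA lowerBound": "280439064",
    "VPA target": "297164212",
    "VPA upperBound": "9030153299",
    "first table MEM": "262144k",
}
for label, q in memory.items():
    print(f"{label:20} {mib(q):8.1f} MiB")
```

This prints roughly 139, 256 and 440 MiB for usage, request and limit, against about 267, 283 and 8612 MiB for the VPA lowerBound, target and upperBound, and shows that the 262144k in the first table is exactly 250 MiB. A single `oc adm top` sample is a point-in-time reading, while the recommender derives its target from usage history, so the two need not match. The 25m CPU and 250 MiB values may also be the recommender's minimum floors (assumed here to be the defaults of its `--pod-recommendation-min-cpu-millicores` and `--pod-recommendation-min-memory-mb` flags; verify against your recommender's arguments).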

Also, the following line appears in the "vpa-updater" pod log, complaining about the "global" replica count:

I0902 14:49:40.106221       1 pods_eviction_restriction.go:226] too few replicas for ReplicaSet gi-perf/common-web-ui-d494596c5. Found 1 live pods, needs 2 (global 2)

Where do we define this, and is there a flag to ignore it?

I think this may be why the resource spec is not being updated.

adrianmoisey commented 2 months ago

> where we do define this and any flag to ignore this ?

Yup, it's possible that you're hitting this: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md#i-get-recommendations-for-my-single-pod-replicaset-but-they-are-not-applied

Do note that if you set this to 1 (the vpa-updater --min-replicas setting covered in the FAQ), the workload may be left with no pods serving it while the recommendation is being applied.
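The updater message "Found 1 live pods, needs 2 (global 2)" reads as a simple gate: pods of a replica set are only evicted when enough of them are live. Here is a simplified model of that gate. It assumes the global value is the updater's `--min-replicas` setting from the FAQ above, and that "(global 2)" implies a per-VPA value can override it; the function `may_evict` and its parameters are hypothetical, not the updater's code.

```python
def may_evict(live_pods, global_min_replicas=2, vpa_min_replicas=None):
    """Simplified model of the vpa-updater replica gate (not the real code).

    global_min_replicas: stands in for the updater-wide minimum (assumed to be
        the --min-replicas setting; 2 matches "(global 2)" in the log).
    vpa_min_replicas: hypothetical per-VPA override; None means "use global".
    """
    required = global_min_replicas if vpa_min_replicas is None else vpa_min_replicas
    # Eviction (and therefore applying the recommendation) only proceeds
    # when at least `required` pods are live.
    return live_pods >= required
```

With a single replica, `may_evict(1)` is False under the default of 2 and becomes True only once the threshold is lowered to 1. Lowering it is also what makes the caveat above apply: the one live pod is evicted to receive the new resources.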

Rajpratik71 commented 2 months ago

Thanks for the reference and support, @adrianmoisey. The suggested config works.