kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

User "system:serviceaccount:kube-system:vpa-recommender" cannot patch resource "verticalpodautoscalers" in API group "autoscaling.k8s.io" #5982

Status: Closed (pmona closed 1 year ago)

pmona commented 1 year ago

VPA version: 0.14
Kubernetes version: 1.27.2

I followed the deployment method described in the README.md. Metrics server was already running on my cluster.

[user1@lab-app1 vertical-pod-autoscaler]# kubectl get apiservice | grep -i metrics
v1beta1.metrics.k8s.io                 k8s-mgmt/prometheus-adapter     True                      152d

[user1@lab-app1 vertical-pod-autoscaler]# kubectl top pods -n kube-system
NAME                                                  CPU(cores)   MEMORY(bytes)   
calico-kube-controllers-64458677cc-rvgjp              3m           63Mi            
calico-node-2mqg5                                     42m          157Mi           
calico-node-72r4k                                     36m          160Mi           
calico-node-857ft                                     29m          143Mi           
calico-node-9jzz6                                     42m          144Mi           
calico-node-b4vnd                                     27m          145Mi           
calico-node-pkh4w                                     30m          81Mi            
calico-node-t7chm                                     19m          145Mi           
coredns-7b6dc7894d-ntfw7                              2m           28Mi            
coredns-7b6dc7894d-tlh2c                              1m           25Mi            
etcd-dok8scontroller1prdsnv69lab                      70m          131Mi           
etcd-dok8scontroller2prdsnv69lab                      87m          145Mi           
etcd-dok8scontroller3prdsnv69lab                      95m          125Mi           
kube-apiserver-dok8scontroller1prdsnv69lab            99m          860Mi           
kube-apiserver-dok8scontroller2prdsnv69lab            238m         1192Mi          
kube-apiserver-dok8scontroller3prdsnv69lab            98m          1024Mi          
kube-controller-manager-dok8scontroller1prdsnv69lab   1m           19Mi            
kube-controller-manager-dok8scontroller2prdsnv69lab   24m          105Mi           
kube-controller-manager-dok8scontroller3prdsnv69lab   1m           19Mi            
kube-proxy-976w4                                      0m           22Mi            
kube-proxy-mdbqp                                      0m           18Mi            
kube-proxy-mgv7k                                      0m           27Mi            
kube-proxy-mphq2                                      0m           22Mi            
kube-proxy-nlh5g                                      0m           24Mi            
kube-proxy-nwdgf                                      0m           22Mi            
kube-proxy-rxggw                                      0m           16Mi            
kube-scheduler-dok8scontroller1prdsnv69lab            2m           21Mi            
kube-scheduler-dok8scontroller2prdsnv69lab            1m           26Mi            
kube-scheduler-dok8scontroller3prdsnv69lab            3m           22Mi            
metrics-server-5879964b97-bsvn2                       3m           22Mi            
vpa-admission-controller-9b8db6df-xhdjd               0m           13Mi            
vpa-recommender-6ff566f774-hgg8h                      0m           16Mi            
vpa-updater-599cfb6c8f-lx9rm                          0m           15Mi

Checking the pods for errors, I found this in the vpa-recommender pod logs:

E0725 20:43:20.874185       1 recommender.go:128] Cannot update VPA hamster-vpa object. Reason: verticalpodautoscalers.autoscaling.k8s.io "hamster-vpa" is forbidden: User "system:serviceaccount:kube-system:vpa-recommender" cannot patch resource "verticalpodautoscalers" in API group "autoscaling.k8s.io" in the namespace "default"

I have not been able to find any posts with this issue. Can anyone tell me what needs to be done to resolve this so VPA can provide recommendations?

[user1@lab-app1 examples]# k get vpa
NAME          MODE   CPU   MEM   PROVIDED   AGE
hamster-vpa   Auto                          21s

[user1@lab-app1 examples]# k describe vpa hamster-vpa
Name:         hamster-vpa
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2023-07-25T20:39:11Z
  Generation:          1
  Resource Version:    95483585
  UID:                 7da83c61-8e7b-49f8-ae71-d49bffa24763
Spec:
  Resource Policy:
    Container Policies:
      Container Name:  *
      Controlled Resources:
        cpu
        memory
      Max Allowed:
        Cpu:     1
        Memory:  500Mi
      Min Allowed:
        Cpu:     100m
        Memory:  50Mi
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         hamster
  Update Policy:
    Update Mode:  Auto
Events:           <none>

iNoahNothing commented 1 year ago

Caused by: https://github.com/kubernetes/autoscaler/pull/5911

I ran into the same issue and had to patch the vpa-status-actor ClusterRole to include verticalpodautoscalers in the resources allowed to be patched.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-status-actor
rules:
  - apiGroups:
      - autoscaling.k8s.io
    resources:
      - verticalpodautoscalers
      - verticalpodautoscalers/status
    verbs:
      - get
      - patch
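
To confirm whether the recommender's service account actually holds the permission the error complains about, `kubectl auth can-i` with impersonation can check both the VPA object itself and its /status subresource. This is a minimal sketch: the helper name `check_vpa_patch_access` is hypothetical, and impersonation (`--as`) only works if your own user is allowed to impersonate service accounts (a cluster-admin typically is).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether a service account may PATCH the whole
# VerticalPodAutoscaler object and its /status subresource in one namespace.
# `kubectl auth can-i --quiet` prints nothing and exits 0 for "yes", non-zero for "no".
check_vpa_patch_access() {
  local sa="${1:-system:serviceaccount:kube-system:vpa-recommender}"
  local ns="${2:-default}"
  local resource="verticalpodautoscalers.autoscaling.k8s.io"

  # PATCH on the main resource (what the failing recommender is attempting).
  if kubectl auth can-i patch "${resource}" --as="${sa}" -n "${ns}" --quiet; then
    echo "patch ${resource}: yes"
  else
    echo "patch ${resource}: no"
  fi

  # PATCH on the /status subresource only.
  if kubectl auth can-i patch "${resource}" --subresource=status --as="${sa}" -n "${ns}" --quiet; then
    echo "patch ${resource}/status: yes"
  else
    echo "patch ${resource}/status: no"
  fi
}
```

With an RBAC setup that grants PATCH only on `verticalpodautoscalers/status`, you would expect the first line to report `no`, which matches the "forbidden" log line above; with the patched ClusterRole above applied, both lines should report `yes`.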

Shubham82 commented 1 year ago

I think it is a bug.

/kind bug

Shubham82 commented 1 year ago

cc @jbartosik

voelzmo commented 1 year ago

@pmona @iNoahNothing That's interesting! VPA release v0.14.0 shouldn't contain the change adding the /status subresource. There was some back-and-forth before cutting the v0.14.0 release (see the list of PRs and the corresponding "revert" PRs), and the final commit adding it only made it into master after v0.14.0 had been released.

Could you help me understand how you're installing VPA (Helm chart, custom YAML, or the YAML files from the VPA repo, e.g. deploy/recommender-deployment.yaml)? Are you building the images yourselves from the repository? What is the value of the image property in your vpa-recommender deployment?

This will become relevant for the v0.15.0 release and is supposed to be covered in the release notes via https://github.com/kubernetes/autoscaler/issues/5921

EDIT: I could reproduce this with the exact steps you mentioned above:

This uses the v0.14.0 images, which don't contain the code for the /status subresource yet (see above). BUT: it also uses the RBAC and CRD definitions from master (./deploy/vpa-rbac.yaml), which were changed in the /status subresource PR: the permissions were switched from allowing PATCH on the entire verticalpodautoscalers resource to allowing PATCH only on the /status subresource. Since the v0.14.0 recommender still patches the VPA object itself, it runs into the "forbidden" error shown above.

TL;DR: In order to install a certain VPA version from the repo, you have to check out the corresponding tag first (git checkout vertical-pod-autoscaler-0.14.0). The repo contains files which are subject to change and we cannot guarantee that the current state on master works well with the images from a different tag.
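
The TL;DR amounts to making the manifests and the images come from the same release. Below is a minimal sketch of that sequence for an existing install, meant to be run from the vertical-pod-autoscaler directory of the clone. It uses only the hack/vpa-down.sh and hack/vpa-up.sh scripts and the tag naming mentioned in this thread; `reinstall_vpa_at_tag` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reinstall VPA so that RBAC, CRDs and deployments all come
# from one release tag instead of from whatever master currently contains.
reinstall_vpa_at_tag() {
  local tag="${1:?usage: reinstall_vpa_at_tag <git-tag>}"
  # Remove the components installed from the previous checkout.
  ./hack/vpa-down.sh || return
  # Switch the manifests (RBAC, CRDs, deployments) to the tagged release.
  git checkout "${tag}" || return
  # Reinstall from the tagged manifests.
  ./hack/vpa-up.sh
}
```

For example: `reinstall_vpa_at_tag vertical-pod-autoscaler-0.14.0`. Each step stops the sequence if it fails, so a failed checkout does not reinstall from the wrong manifests.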

pmona commented 1 year ago

To install I did the following:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

I did not check out any specific branch.

user1@dok8scontroller1:/usr/local/src/autoscaler/vertical-pod-autoscaler/deploy# grep image: recommender-deployment*.yaml 
recommender-deployment-high.yaml:        image: registry.k8s.io/autoscaling/vpa-recommender:0.14.0
recommender-deployment-low.yaml:        image: registry.k8s.io/autoscaling/vpa-recommender:0.14.0
recommender-deployment.yaml:        image: registry.k8s.io/autoscaling/vpa-recommender:0.14.0
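
The grep above shows the image tags in the manifests on disk; what is actually running can be read from the cluster. A minimal sketch, assuming the deployments use the names visible in the pod list above (vpa-recommender, vpa-updater, vpa-admission-controller) in the kube-system namespace; `show_vpa_images` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the container image(s) of each VPA deployment,
# as stored in the deployment's pod template on the cluster.
show_vpa_images() {
  local ns="${1:-kube-system}"
  local d
  for d in vpa-recommender vpa-updater vpa-admission-controller; do
    printf '%s: ' "${d}"
    kubectl get deployment "${d}" -n "${ns}" \
      -o jsonpath='{.spec.template.spec.containers[*].image}'
    echo
  done
}
```

Comparing this output with the checked-out tag is a quick way to spot a mismatch between images and RBAC manifests.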

user1@dok8scontroller1:/usr/local/src/autoscaler/vertical-pod-autoscaler# cat ./deploy/vpa-rbac.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-reader
rules:
  - apiGroups:
      - "metrics.k8s.io"
    resources:
      - pods
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-actor
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - nodes
      - limitranges
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - get
      - list
      - watch
      - create
  - apiGroups:
      - "poc.autoscaling.k8s.io"
    resources:
      - verticalpodautoscalers
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "autoscaling.k8s.io"
    resources:
      - verticalpodautoscalers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-status-actor
rules:
  - apiGroups:
      - "autoscaling.k8s.io"
    resources:
      - verticalpodautoscalers/status
    verbs:
      - get
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-checkpoint-actor
rules:
  - apiGroups:
      - "poc.autoscaling.k8s.io"
    resources:
      - verticalpodautoscalercheckpoints
    verbs:
      - get
      - list
      - watch
      - create
      - patch
      - delete
  - apiGroups:
      - "autoscaling.k8s.io"
    resources:
      - verticalpodautoscalercheckpoints
    verbs:
      - get
      - list
      - watch
      - create
      - patch
      - delete
  - apiGroups:
      - ""
    resources:
      - namespaces
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:evictioner
rules:
  - apiGroups:
      - "apps"
      - "extensions"
    resources:
      - replicasets
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - pods/eviction
    verbs:
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-reader
subjects:
  - kind: ServiceAccount
    name: vpa-recommender
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-actor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpa-actor
subjects:
  - kind: ServiceAccount
    name: vpa-recommender
    namespace: kube-system
  - kind: ServiceAccount
    name: vpa-updater
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-status-actor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpa-status-actor
subjects:
  - kind: ServiceAccount
    name: vpa-recommender
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-checkpoint-actor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpa-checkpoint-actor
subjects:
  - kind: ServiceAccount
    name: vpa-recommender
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-target-reader
rules:
  - apiGroups:
    - '*'
    resources:
    - '*/scale'
    verbs:
    - get
    - watch
  - apiGroups:
      - ""
    resources:
      - replicationcontrollers
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
      - cronjobs
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-target-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpa-target-reader
subjects:
  - kind: ServiceAccount
    name: vpa-recommender
    namespace: kube-system
  - kind: ServiceAccount
    name: vpa-admission-controller
    namespace: kube-system
  - kind: ServiceAccount
    name: vpa-updater
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-evictioner-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:evictioner
subjects:
  - kind: ServiceAccount
    name: vpa-updater
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vpa-admission-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vpa-recommender
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vpa-updater
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-admission-controller
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - configmaps
      - nodes
      - limitranges
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "admissionregistration.k8s.io"
    resources:
      - mutatingwebhookconfigurations
    verbs:
      - create
      - delete
      - get
      - list
  - apiGroups:
      - "poc.autoscaling.k8s.io"
    resources:
      - verticalpodautoscalers
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "autoscaling.k8s.io"
    resources:
      - verticalpodautoscalers
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - create
      - update
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-admission-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpa-admission-controller
subjects:
  - kind: ServiceAccount
    name: vpa-admission-controller
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:vpa-status-reader
rules:
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:vpa-status-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpa-status-reader
subjects:
  - kind: ServiceAccount
    name: vpa-updater
    namespace: kube-system

voelzmo commented 1 year ago

Yeah, thanks for the details! I think I figured this out (see my EDIT in the comment above): you are probably installing from the state of master, and this only works when you check out the corresponding tag (vertical-pod-autoscaler-0.14.0) first.

BUT: we noticed earlier that a mistake happened when tagging VPA 0.14.0: at that tag, the deploy/recommender-deployment.yaml file still points to the 0.13.0 image 🙈 So you would have to bump this version manually. This shouldn't be the case for upcoming releases.

pmona commented 1 year ago

Ran:

./hack/vpa-down.sh
git checkout vertical-pod-autoscaler-0.14.0
./hack/vpa-up.sh

All is working as expected now.
Thank you for the help!

voelzmo commented 1 year ago

Great to hear things are running well for you now! To reiterate: after the above steps you are running the 0.13.0 recommender image. You'll have to manually patch deploy/recommender-deployment.yaml to use 0.14.0 because of the issue mentioned above.
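
A minimal sketch of that manual patch, assuming the image line has the form `registry.k8s.io/autoscaling/vpa-recommender:<version>` as in the grep output earlier in the thread. `bump_recommender_image` is a hypothetical helper; it writes through a temporary file instead of using `sed -i`, because the `-i` flag behaves differently in GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the vpa-recommender image tag in a manifest.
# Note: the dots in the version are regex wildcards for sed; harmless here.
bump_recommender_image() {
  local file="${1:-deploy/recommender-deployment.yaml}"
  local from="${2:-0.13.0}"
  local to="${3:-0.14.0}"
  # Portable in-place edit: write the result to a temp file, then replace the original.
  sed "s|vpa-recommender:${from}|vpa-recommender:${to}|" "${file}" > "${file}.tmp" \
    && mv "${file}.tmp" "${file}"
}
```

For example, after checking out vertical-pod-autoscaler-0.14.0, run `bump_recommender_image deploy/recommender-deployment.yaml 0.13.0 0.14.0` from the vertical-pod-autoscaler directory, then re-apply the manifest (e.g. `kubectl apply -f deploy/recommender-deployment.yaml`).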

iNoahNothing commented 1 year ago

@voelzmo Yep! I came to that realization last night. I'm curious why building VPA from master still fails with the same error. Am I doing something wrong in the build process?

for component in admission-controller recommender updater; do REGISTRY=nkrause ALL_ARCHITECTURES=amd64 make docker-build --directory=pkg/${component}; done

voelzmo commented 1 year ago

@iNoahNothing In theory, what you're doing looks great! However, VPA doesn't use multi-stage builds in its Dockerfiles yet, so with that command you'll just re-package the existing vpa-recommender binary into a new Docker image. Instead of make docker-build, you need to call make release, which also builds the binary from the currently checked-out sources.
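
Applied to the build loop quoted above, that advice means swapping `docker-build` for `release`. A minimal sketch, reusing the REGISTRY and ALL_ARCHITECTURES variables from that loop; `build_vpa_images` is a hypothetical helper name. Depending on the Makefile, `make release` may do more than build (for example push the images), so check the target in pkg/<component>/Makefile before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild all three VPA components from the checked-out
# sources via `make release` (which, per the comment above, also builds the binary).
build_vpa_images() {
  local registry="${1:?usage: build_vpa_images <registry>}"
  local component
  for component in admission-controller recommender updater; do
    # Stop at the first component whose build fails.
    REGISTRY="${registry}" ALL_ARCHITECTURES=amd64 \
      make release --directory="pkg/${component}" || return
  done
}
```

For example, `build_vpa_images nkrause` corresponds to the original command with the build target changed.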

jbartosik commented 1 year ago

Looks like this is resolved, thanks @voelzmo for handling this.

Please reopen if more help is needed here.