rancher / system-upgrade-controller

In your Kubernetes, upgrading your nodes
Apache License 2.0

Error during upgrade: User "system:serviceaccount:system-upgrade:system-upgrade" cannot delete resource "pods" #319

Closed (damdo closed 2 weeks ago)

damdo commented 1 month ago

Version

v0.13.4

Platform/Architecture

linux-amd64

Describe the bug

During an upgrade from k3s v1.29.3 to v1.30.2 with system-upgrade-controller v0.13.4, the Pods created by the apply-os-upgrade upgrade job fail early with Init:Error.

To Reproduce

Upgrade from k3s v1.29.3 to v1.30.2 with system-upgrade-controller v0.13.4

Expected behavior

Successful upgrade

Actual behavior

Early failure during upgrade

Additional context

The logs of the drain init container show that the failure is caused by the following error:

User "system:serviceaccount:system-upgrade:system-upgrade" cannot delete resource "pods" in API group "" in the namespace "cert-manager"

This suggests a missing pods delete permission in the system-upgrade-controller-drainer ClusterRole.
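
For illustration, a rule along the following lines would grant the permission named in the error. This is only a sketch: the ClusterRole name is taken from the sentence above, the verb list beyond `delete` is an assumption, the role's existing rules are not shown, and the actual fix in the linked PR may look different.

```yaml
# Hypothetical fragment, not the shipped manifest.
# Only the added rule is shown; the existing rules of the ClusterRole would stay in place.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system-upgrade-controller-drainer
rules:
  # The core API group is the empty string, matching `API group ""` in the error.
  - apiGroups: [""]
    resources: ["pods"]
    # "delete" is the verb the error reports as denied; "get" and "list" are assumed here.
    verbs: ["get", "list", "delete"]
```

Because this is a ClusterRole, the rule would apply in every namespace it is bound for, including `cert-manager`.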

lexfrei commented 1 month ago

I have the same issue, except that I can't even start the pod. This is a freshly installed k3s v1.30.3.

Pod logs:

W0806 22:45:48.476270       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-08-06T22:45:48Z" level=fatal msg="Error starting: namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:system-upgrade:system-upgrade\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\""

damdo commented 1 month ago

We are in the same boat then, @lexfrei.

cc @brandond: could you please take a look at the linked PR? Thanks!

lexfrei commented 1 month ago

I don't know why, but it healed itself overnight. I just found that it's running now. My installation: https://github.com/lexfrei/k8s/blob/3e92bfca26ab1e3be39fd5ed17b3ab1a349f84f7/argocd/infra/system-upgrade-controller.yaml