rancher / system-upgrade-controller

In your Kubernetes, upgrading your nodes
Apache License 2.0

no matches for kind "Plan" in version "upgrade.cattle.io/v1" #298

Open StefanSa opened 4 months ago

StefanSa commented 4 months ago

Version v0.13.4

Platform/Architecture openSUSE MicroOS 20240221

Describe the bug When I create a new plan, I get this error message:

resource mapping not found for name: "k3s-server" namespace: "system-upgrade" from "k3s-upgrade.yaml": no matches for kind "Plan" in version "upgrade.cattle.io/v1"
ensure CRDs are installed first
resource mapping not found for name: "k3s-agent" namespace: "system-upgrade" from "k3s-upgrade.yaml": no matches for kind "Plan" in version "upgrade.cattle.io/v1"
ensure CRDs are installed first

To Reproduce

kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/v0.13.4/manifests/system-upgrade-controller.yaml

kubectl label node master-01 master-02 worker-01 worker-02 worker-03 k3s-upgrade=true

kubectl apply -f k3s-upgrade.yaml


apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1
  version: v1.29.2+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3s.io/hostname, operator: Exists}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
  labels:
    k3s-upgrade: agent
spec:
  concurrency: 2 # in general, this should be the number of workers - 1
  version: v1.29.2+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3s.io/hostname, operator: Exists}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/master, operator: NotIn, values: ["true"]}
  serviceAccountName: system-upgrade
  prepare:
    # Since v0.5.0-m1 SUC will use the resolved version of the plan for the tag on the prepare container.
    # image: rancher/k3s-upgrade:v1.17.4-k3s1
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  drain:
    force: true
    skipWaitForDeleteTimeout: 60 # set this to prevent upgrades from hanging on small clusters since k8s v1.18
  upgrade:
    image: rancher/k3s-upgrade

Expected behavior Upgrade plan without error message.

Actual behavior Error message: no matches for kind "Plan" in version "upgrade.cattle.io/v1"

Additional context log in pod:

E0307 13:10:47.489933       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:11:29.290633       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:11:34.471197       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0307 13:12:03.206769       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:12:10.178176       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0307 13:12:55.404339       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:12:59.802270       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
brandond commented 4 months ago

You should also apply the CRD manifest: https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml
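
A minimal sketch of the install order this implies: apply the CRD, wait until the API server serves it, and only then apply the plans. The URL is the one quoted above; the variable names and the helper function are illustrative assumptions, not part of the project.

```shell
# Release asset base URL, taken from the link posted in this thread.
SUC_VERSION="v0.13.4"
SUC_BASE="https://github.com/rancher/system-upgrade-controller/releases/download/${SUC_VERSION}"
CRD_URL="${SUC_BASE}/crd.yaml"

# Hypothetical helper: install the Plan CRD, wait until it is established,
# then apply the plan file passed as the first argument.
apply_crd_then_plans() {
  kubectl apply -f "$CRD_URL" &&
    kubectl wait --for=condition=Established crd/plans.upgrade.cattle.io --timeout=60s &&
    kubectl apply -f "$1"
}
```

With a working kubeconfig this would be invoked as `apply_crd_then_plans k3s-upgrade.yaml`. Waiting for the `Established` condition avoids the "ensure CRDs are installed first" error when both steps run back to back.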

StefanSa commented 4 months ago

@brandond Hi Brad, thanks for the hint; that at least fixed this error. But now I don't see any active jobs, and these error messages still appear in the pod log.

kubectl get jobs -n system-upgrade
No resources found in system-upgrade namespace.
E0308 08:11:43.584498       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
2024-03-08T08:12:36.630633052Z E0308 08:12:36.630500       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
2024-03-08T08:13:27.715830430Z E0308 08:13:27.715712       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:14:25.235647       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:15:21.877204       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:16:20.665483       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:16:57.158737       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:17:27.865155       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:18:03.792967       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:18:41.345443       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:19:23.898774       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:20:14.628402       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
SISheogorath commented 4 months ago

Do you see the system-upgrade-controller role in the namespace?
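
One way to answer this question from the command line, as a sketch: list the namespaced RBAC objects and ask the API server what the controller's service account may do with secrets. The service account identity is copied from the log lines above; the helper name is an assumption, and `--as` only works if the calling user is allowed to impersonate.

```shell
# Identity of the controller's service account, as it appears in the logs.
SUC_NS="system-upgrade"
SUC_SA="system:serviceaccount:${SUC_NS}:system-upgrade"

# Hypothetical helper: show Roles and RoleBindings in the namespace, then
# check get/list/watch on secrets for the service account.
check_suc_rbac() {
  kubectl get role,rolebinding -n "$SUC_NS"
  for verb in get list watch; do
    printf '%s secrets: ' "$verb"
    # Prints "yes" or "no"; a "no" also yields a non-zero exit status.
    kubectl auth can-i "$verb" secrets -n "$SUC_NS" --as "$SUC_SA"
  done
}
```

Running `check_suc_rbac` on the affected cluster would show whether the namespaced Role is missing and which verbs are denied.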

StefanSa commented 4 months ago

@SISheogorath There are two cluster roles here. One is system-upgrade-controller and the other is system-upgrade-controller-drainer.

StefanSa commented 4 months ago

@SISheogorath @brandond Error found. The role was incomplete: the permission to read secrets and all permissions for jobs were missing. With this role it works without a problem:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
    objectset.rio.cattle.io/applied: >-
    objectset.rio.cattle.io/id: eaba6099-a546-453f-b734-edf421bdb5c5
  creationTimestamp: '2024-03-07T10:42:28Z'
  finalizers:
    - wrangler.cattle.io/auth-prov-v2-crole
  labels:
    objectset.rio.cattle.io/hash: 6766606835033e32b12089b633df4cc4319dd6fa
  managedFields:
    - apiVersion: rbac.authorization.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: '2024-03-07T10:42:28Z'
    - apiVersion: rbac.authorization.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:objectset.rio.cattle.io/applied: {}
            f:objectset.rio.cattle.io/id: {}
          f:finalizers:
            .: {}
            v:"wrangler.cattle.io/auth-prov-v2-crole": {}
          f:labels:
            .: {}
            f:objectset.rio.cattle.io/hash: {}
        f:rules: {}
      manager: rancher
      operation: Update
      time: '2024-03-08T11:50:34Z'
  name: system-upgrade-controller
  resourceVersion: '23869128'
  uid: d3802361-ab53-4cbe-92a9-aaffcd61f325
rules:
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - get
      - list
      - watch
      - create
      - delete
      - patch
      - update
  - apiGroups:
      - ''
    resources:
      - namespaces
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - nodes
    verbs:
      - update
  - apiGroups:
      - upgrade.cattle.io
    resources:
      - plans
      - plans/status
    verbs:
      - get
      - list
      - watch
      - create
      - patch
      - update
      - delete
  - apiGroups:
      - ''
    resources:
      - secrets
    verbs:
      - list
SISheogorath commented 4 months ago

There should be two clusterroles and one role. When I adjusted the roles for the controller, I decided to limit secrets and job creation to the namespace of the controller.


Maybe this was too restrictive. I just double checked my setup, the controller is functional here with these roles.

The reason I asked for its existence is that it might be related to object ordering: https://github.com/rancher/system-upgrade-controller/pull/296
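
To illustrate the namespace-scoped restriction described in this comment, here is a hypothetical Role that grants job and secret access only inside `system-upgrade`. It is a sketch for discussion, not the project's actual manifest: the name `suc-example` and the exact verb lists are assumptions, and a matching RoleBinding to the service account would still be needed.

```shell
# Hypothetical manifest for illustration only; the real Role ships in the
# project's release manifest and may grant different verbs.
namespaced_role_manifest() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: suc-example            # hypothetical name
  namespace: system-upgrade
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete", "patch", "update"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
EOF
}
```

The manifest can be validated without changing the cluster via `namespaced_role_manifest | kubectl apply --dry-run=client -f -`. Unlike a ClusterRole bound with a ClusterRoleBinding, a Role like this gives no access to secrets or jobs in other namespaces.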

StefanSa commented 4 months ago

Here there are only clusterroles and no role

StefanSa commented 4 months ago

The watch verb is also missing for secrets:

Failed to watch *v1.Secret: unknown (get secrets.meta.k8s.io)
SISheogorath commented 4 months ago

If you apply the release manifest a second time (now that the namespace exists), does it fix the issue?

StefanSa commented 4 months ago

All objects are unchanged.

kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/v0.13.4/manifests/system-upgrade-controller.yaml
namespace/system-upgrade unchanged
serviceaccount/system-upgrade unchanged
configmap/default-controller-env unchanged
deployment.apps/system-upgrade-controller unchanged
SISheogorath commented 4 months ago

If you use the manifests directory from the tag in the git repository, you have to apply all manifests.

The release manifest I referred to is attached to the Release on GitHub: https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml
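
The two install routes contrasted here can be sketched as follows. The release URL is the one given above; the clone directory `suc-src`, the function names, and the assumption that `manifests/` at the tag contains only plain Kubernetes YAML are illustrative and unverified.

```shell
# Two install routes discussed in this thread, for one release tag.
SUC_VERSION="v0.13.4"
RELEASE_MANIFEST="https://github.com/rancher/system-upgrade-controller/releases/download/${SUC_VERSION}/system-upgrade-controller.yaml"
REPO_URL="https://github.com/rancher/system-upgrade-controller.git"

# Route 1: the single bundled manifest attached to the GitHub release.
install_from_release() {
  kubectl apply -f "$RELEASE_MANIFEST"
}

# Route 2: every file under manifests/ at the tag (directory layout assumed).
install_from_repo() {
  git clone --depth 1 --branch "$SUC_VERSION" "$REPO_URL" suc-src &&
    kubectl apply -f suc-src/manifests/
}
```

Applying only `manifests/system-upgrade-controller.yaml` from the raw tag URL, as in the original report, covers just the namespace, service account, config map, and deployment, which matches the four "unchanged" lines shown earlier.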

StefanSa commented 4 months ago

That looks good.

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml
clusterrole.rbac.authorization.k8s.io/system-upgrade-controller configured
role.rbac.authorization.k8s.io/system-upgrade-controller created
clusterrole.rbac.authorization.k8s.io/system-upgrade-controller-drainer unchanged
clusterrolebinding.rbac.authorization.k8s.io/system-upgrade-drainer unchanged
clusterrolebinding.rbac.authorization.k8s.io/system-upgrade unchanged
rolebinding.rbac.authorization.k8s.io/system-upgrade created
namespace/system-upgrade unchanged
serviceaccount/system-upgrade unchanged
configmap/default-controller-env unchanged
deployment.apps/system-upgrade-controller configured

And no more error messages, a role has also been created.

1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-03-08T13:00:42Z" level=info msg="No access to list CRDs, assuming CRDs are pre-created."
2024-03-08T13:00:43.076970550Z time="2024-03-08T13:00:43Z" level=info msg="Starting /v1, Kind=Node controller"
2024-03-08T13:00:43.077002976Z time="2024-03-08T13:00:43Z" level=info msg="Starting /v1, Kind=Secret controller"
2024-03-08T13:00:43.099580197Z time="2024-03-08T13:00:43Z" level=info msg="Starting batch/v1, Kind=Job controller"
2024-03-08T13:00:43.146037841Z time="2024-03-08T13:00:43Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"
SISheogorath commented 4 months ago

Well, now we know that #296 actually fixed a problem 🙌🏻

gravufo commented 2 months ago

The permissions are not correct: the upgrade pod spews errors saying it is not allowed to delete pods. I tried completely removing all SUC resources (including CRDs) and reinstalling from scratch, and the issue persists. Going back to version 0.13.2 makes it work. There is definitely something wrong in the roles or cluster roles. I haven't dug deeper yet.