rancher / system-upgrade-controller

In your Kubernetes, upgrading your nodes
Apache License 2.0

no matches for kind "Plan" in version "upgrade.cattle.io/v1" #298

Open StefanSa opened 4 months ago

StefanSa commented 4 months ago

Version: v0.13.4

Platform/Architecture: openSUSE MicroOS 20240221

Describe the bug: When I create a new plan, I get this error message:

resource mapping not found for name: "k3s-server" namespace: "system-upgrade" from "k3s-upgrade.yaml": no matches for kind "Plan" in version "upgrade.cattle.io/v1"
ensure CRDs are installed first
resource mapping not found for name: "k3s-agent" namespace: "system-upgrade" from "k3s-upgrade.yaml": no matches for kind "Plan" in version "upgrade.cattle.io/v1"
ensure CRDs are installed first

To Reproduce

kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/v0.13.4/manifests/system-upgrade-controller.yaml

kubectl label node master-01 master-02 worker-01 worker-02 worker-03 k3s-upgrade=true
kubectl apply -f k3s-upgrade.yaml

k3s-upgrade.yaml:

---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1
  version: v1.29.2+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3s.io/hostname, operator: Exists}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
  labels:
    k3s-upgrade: agent
spec:
  concurrency: 2 # in general, this should be the number of workers - 1
  version: v1.29.2+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3s.io/hostname, operator: Exists}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/master, operator: NotIn, values: ["true"]}
  serviceAccountName: system-upgrade
  prepare:
    # Since v0.5.0-m1 SUC will use the resolved version of the plan for the tag on the prepare container.
    # image: rancher/k3s-upgrade:v1.17.4-k3s1
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  drain:
    force: true
    skipWaitForDeleteTimeout: 60 # set this to prevent upgrades from hanging on small clusters since k8s v1.18
  upgrade:
    image: rancher/k3s-upgrade

Expected behavior: The upgrade plans are applied without error messages.

Actual behavior: no matches for kind "Plan" in version "upgrade.cattle.io/v1"

Additional context: Log from the controller pod:

E0307 13:10:47.489933       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:11:29.290633       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:11:34.471197       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0307 13:12:03.206769       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:12:10.178176       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0307 13:12:55.404339       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Plan: failed to list *v1.Plan: the server could not find the requested resource (get plans.meta.k8s.io)
E0307 13:12:59.802270       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
brandond commented 4 months ago

You should also apply the CRD manifest: https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml
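
For example, applying it and then checking that the Plan CRD is registered (adjust the tag to your release; the second command is just a quick sanity check):

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml
kubectl get crd plans.upgrade.cattle.io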

StefanSa commented 4 months ago

@brandond Hi Brad, thanks for the hint, that at least fixed this error. But now I don't get any active jobs displayed, and there are still these error messages in the pod.

kubectl get jobs -n system-upgrade
No resources found in system-upgrade namespace.
E0308 08:11:43.584498       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
2024-03-08T08:12:36.630633052Z E0308 08:12:36.630500       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
2024-03-08T08:13:27.715830430Z E0308 08:13:27.715712       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:14:25.235647       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:15:21.877204       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:16:20.665483       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:16:57.158737       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:17:27.865155       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:18:03.792967       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:18:41.345443       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:19:23.898774       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
E0308 08:20:14.628402       1 reflector.go:138] k8s.io/client-go@v1.21.14-k3s1/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot list resource "secrets" in API group "" in the namespace "system-upgrade"
SISheogorath commented 4 months ago

Do you see the system-upgrade-controller role in the namespace?
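
Something like this should show it (assuming the default system-upgrade namespace):

kubectl get clusterrole | grep system-upgrade
kubectl -n system-upgrade get role,rolebinding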

StefanSa commented 4 months ago

@SISheogorath There are two cluster roles here. One is system-upgrade-controller and the other is system-upgrade-controller-drainer.

StefanSa commented 4 months ago

@SISheogorath @brandond Error found. The role was incomplete: the permission to read secrets and full rights on jobs were missing. With this role it works without a problem:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system-upgrade-controller
rules:
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - get
      - list
      - watch
      - create
      - delete
      - patch
      - update
  - apiGroups:
      - ''
    resources:
      - namespaces
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - nodes
    verbs:
      - update
  - apiGroups:
      - upgrade.cattle.io
    resources:
      - plans
      - plans/status
    verbs:
      - get
      - list
      - watch
      - create
      - patch
      - update
      - delete
  - apiGroups:
      - ''
    resources:
      - secrets
    verbs:
      - list
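
As a quick sanity check (assuming the default service account name), these should now both report yes:

kubectl auth can-i list secrets -n system-upgrade --as=system:serviceaccount:system-upgrade:system-upgrade
kubectl auth can-i create jobs -n system-upgrade --as=system:serviceaccount:system-upgrade:system-upgrade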
SISheogorath commented 4 months ago

There should be two clusterroles and one role. When I adjusted the roles for the controller, I decided to limit secrets and job creation to the namespace of the controller.

https://github.com/rancher/system-upgrade-controller/blob/4a643535e6ea8f67ac00961ed1d242fce16ce748/manifests/clusterrole.yaml#L53-L79
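
Roughly, the intent is to pair the ClusterRole with a namespaced Role along these lines (a sketch of the idea, not the exact manifest; see the linked file for the authoritative rules):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: system-upgrade-controller
  namespace: system-upgrade
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]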

Maybe this was too restrictive. I just double-checked my setup; the controller is functional here with these roles.

The reason I asked for its existence is that it might be related to object ordering: https://github.com/rancher/system-upgrade-controller/pull/296

StefanSa commented 4 months ago

Here there are only the two ClusterRoles and no Role.

StefanSa commented 4 months ago

watch is also missing on secrets:

Failed to watch *v1.Secret: unknown (get secrets.meta.k8s.io)
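
The same kind of check makes that visible:

kubectl auth can-i watch secrets -n system-upgrade --as=system:serviceaccount:system-upgrade:system-upgrade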
SISheogorath commented 4 months ago

If you apply the release manifest a second time (now that the namespace exists), does it fix the issue?

StefanSa commented 4 months ago

All objects are unchanged:

kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/v0.13.4/manifests/system-upgrade-controller.yaml
namespace/system-upgrade unchanged
serviceaccount/system-upgrade unchanged
configmap/default-controller-env unchanged
deployment.apps/system-upgrade-controller unchanged
SISheogorath commented 4 months ago

If you use the manifests directory from the tag in the git repository, you have to apply all manifests.

The release manifest I referred to is attached to the Release on GitHub: https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml

StefanSa commented 4 months ago

That looks good.

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml
clusterrole.rbac.authorization.k8s.io/system-upgrade-controller configured
role.rbac.authorization.k8s.io/system-upgrade-controller created
clusterrole.rbac.authorization.k8s.io/system-upgrade-controller-drainer unchanged
clusterrolebinding.rbac.authorization.k8s.io/system-upgrade-drainer unchanged
clusterrolebinding.rbac.authorization.k8s.io/system-upgrade unchanged
rolebinding.rbac.authorization.k8s.io/system-upgrade created
namespace/system-upgrade unchanged
serviceaccount/system-upgrade unchanged
configmap/default-controller-env unchanged
deployment.apps/system-upgrade-controller configured

And no more error messages; a Role has also been created.

1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-03-08T13:00:42Z" level=info msg="No access to list CRDs, assuming CRDs are pre-created."
2024-03-08T13:00:43.076970550Z time="2024-03-08T13:00:43Z" level=info msg="Starting /v1, Kind=Node controller"
2024-03-08T13:00:43.077002976Z time="2024-03-08T13:00:43Z" level=info msg="Starting /v1, Kind=Secret controller"
2024-03-08T13:00:43.099580197Z time="2024-03-08T13:00:43Z" level=info msg="Starting batch/v1, Kind=Job controller"
2024-03-08T13:00:43.146037841Z time="2024-03-08T13:00:43Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"
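
From here the rollout can be followed with something like:

kubectl -n system-upgrade get plans,jobs,pods
kubectl get nodes   # the VERSION column should move to v1.29.2+k3s1 as nodes are cordoned and upgraded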
SISheogorath commented 4 months ago

Well, now we know that #296 actually fixed a problem 🙌🏻

gravufo commented 2 months ago

The permissions are not correct: the upgrade pod spews errors that it is not allowed to delete pods. I tried completely removing all SUC resources (including the CRDs) and reinstalling from scratch, and the issue persists. Going back to version 0.13.2 makes it work. There is definitely something wrong in the Roles or ClusterRoles. I haven't dug deeper yet.