rancher / webhook

Rancher webhook for Kubernetes
Apache License 2.0

unable to modify project's member or roles after upgrade to 0.4.6 #397

Open chrpinedo opened 1 month ago

chrpinedo commented 1 month ago

The rancher-webhook chart in my deployment was automatically updated to version 0.4.6 on the 12th of June, and since then I have been unable to add, modify, or remove user roles in one project. It seems to be a problem with the integration/validation of LDAP groups.

[Screenshot: Captura de pantalla_20240613_090201]

There are also errors in the rancher-webhook pod:

time="2024-06-13T07:03:08Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"
time="2024-06-13T07:03:09Z" level=error msg="failed to get rules from referenced roleTemplate 'project-member': failed to check externalRules feature flag: features.management.cattle.io \"external-rules\" not found"

I solved this issue by rolling back to the previous version, 0.4.5:

helm -n cattle-system history rancher-webhook
REVISION        UPDATED                         STATUS          CHART                           APP VERSION     DESCRIPTION     
3               Thu Apr 11 10:19:21 2024        superseded      rancher-webhook-103.0.2+up0.4.3 0.4.3           Upgrade complete
4               Thu May  9 23:55:01 2024        superseded      rancher-webhook-103.0.4+up0.4.5 0.4.5           Upgrade complete
5               Wed Jun 12 17:54:18 2024        superseded      rancher-webhook-103.0.5+up0.4.6 0.4.6           Upgrade complete
6               Thu Jun 13 09:05:04 2024        superseded      rancher-webhook-103.0.4+up0.4.5 0.4.5           Rollback to 4   
7               Thu Jun 13 07:06:52 2024        deployed        rancher-webhook-103.0.5+up0.4.6 0.4.6           Upgrade complete

However, a short time later Rancher forced the webhook upgrade again. Still, I can confirm that rolling back to 0.4.5 resolves this issue.
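The rollback described above can be sketched as a small shell helper. This is only a sketch: the function name `rollback_webhook` is hypothetical, and the release name, namespace, and revision number are taken from the history output in this comment.

```shell
#!/bin/sh
# Sketch only: roll the rancher-webhook Helm release back to a known-good
# revision. Release name and namespace come from the history output above;
# rollback_webhook is a hypothetical helper name.

# rollback_webhook REVISION
# Rolls back the release, then prints its history so the result can be checked.
rollback_webhook() {
    revision="$1"
    if [ -z "$revision" ]; then
        echo "usage: rollback_webhook REVISION" >&2
        return 1
    fi
    helm -n cattle-system rollback rancher-webhook "$revision" &&
        helm -n cattle-system history rancher-webhook
}

# Example: revision 4 is the 0.4.5 chart in the history above.
# rollback_webhook 4
```

As noted in this comment, Rancher may redeploy the newer chart afterwards, so the rollback is a temporary measure at best.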

phoenix-bjoern commented 1 month ago

We use MS Azure AD, and I can confirm that after the v0.4.6 update (which seems to happen automatically when Rancher restarts) we cannot modify project permissions:

Internal error occurred: failed calling webhook "rancher.cattle.io.projectroletemplatebindings.management.cattle.io": failed to call webhook: an error on the server ("{\"kind\":\"AdmissionReview\",\"apiVersion\":\"admission.k8s.io/v1\",....,\"kind\":\"ProjectRoleTemplateBinding\",\"metadata\":{\"annotations\":{\"field.catt") has prevented the request from succeeding

Rolling back the webhook deployment (NS cattle-system) to v0.4.5 in the "local" cluster seems to resolve the problem. Is there anything we need to tweak for v0.4.6? In the Rancher logs I can see the following error:

"status\":\"Failure\",\"message\":\"Internal error occurred: failed to get rules from referenced roleTemplate 'project-owner': failed to check externalRules feature flag: features.management.cattle.io \\\"external-rules\\\" not found\",\"reason\":\"InternalError\",\"details\":{\"causes\":[{\"message\":\"failed to get rules from referenced roleTemplate 'project-owner': failed to check externalRules feature flag: features.management.cattle.io \\\"external-rules\\\" not found\"}]},\"code\":500}}}") has prevented the request from succeeding, requeuing

And also this error:

2024/06/13 09:26:15 [ERROR] error syncing 'p-qdpq2/creator-project-owner': handler mgmt-auth-prtb-controller: failed to remove finalizer on controller.cattle.io/mgmt-auth-prtb-controller, handler cluster-prtb-sync: failed to remove finalizer on clusterscoped.controller.cattle.io/cluster-prtb-sync_c-nxkcl, requeuing

phoenix-bjoern commented 1 month ago

@pmatseykanets I saw that the latest commits were merged by you. Could you help?

phoenix-bjoern commented 1 month ago

This issue also seems to affect the removal of clusters in Rancher. As mentioned before, a downgrade to v0.4.5 in the "local" cluster resolves the problem, and clusters get removed as expected when external users/groups are configured in the cluster.

samjustus commented 1 month ago

@phoenix-bjoern @chrpinedo we are actively working on the fix for this - hoping to resolve it before EOD

nicholasSUSE commented 1 month ago

A new chart version for rancher-webhook with the fix is being released now.

samjustus commented 1 month ago

release complete

MbolotSuse commented 1 month ago

Apologies for the issue. This was a bug in v0.4.6 of the webhook. As @nicholasSUSE has noted, we have recently released a new version of the webhook (v0.4.7) which does not have this bug. The upgrade of the webhook (for affected versions that don't have a pinned webhook version) should be automatic.

However, users can attempt to manually initiate the upgrade of the webhook by refreshing the rancher-charts repo from the UI: go to the local cluster -> Apps -> Repositories -> click the three-dot menu to the right of the rancher-charts repo, and click Refresh. You may need to restart Rancher (through kubectl rollout restart deploy/rancher -n cattle-system) for the changes to take immediate effect.

You can also manually upgrade the webhook by selecting the application in the UI (local cluster -> Apps -> Installed Apps -> change the UI filter to "All Namespaces" -> rancher-webhook).
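The restart mentioned above, plus a quick check of which webhook image is running, can be sketched in shell. This is only a sketch: the function names are hypothetical, and it assumes the webhook runs as a Deployment named rancher-webhook in the cattle-system namespace with the webhook as its first container.

```shell
#!/bin/sh
# Sketch only. Assumptions: the webhook runs as a Deployment named
# rancher-webhook in cattle-system, and the webhook is its first container.
# Function names are hypothetical.

# webhook_image: print the container image of the webhook Deployment,
# whose tag is expected to show the running webhook version.
webhook_image() {
    kubectl -n cattle-system get deployment rancher-webhook \
        -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# restart_rancher: restart the Rancher Deployment and wait for the rollout.
restart_rancher() {
    kubectl -n cattle-system rollout restart deploy/rancher &&
        kubectl -n cattle-system rollout status deploy/rancher
}

# Example, run against the local cluster:
# restart_rancher && webhook_image
```

Checking the image before and after the repo refresh is one way to confirm whether the upgrade actually happened.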

Users who can't upgrade to v0.4.7 can work around the issue by creating this resource in the local cluster using kubectl:

apiVersion: management.cattle.io/v3
kind: Feature
metadata:
  name: external-rules
spec:
  value: false
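
One way to apply the resource above is to write it to a file and pass it to kubectl apply. This is only a sketch: the function names and the temporary file path are hypothetical, and it assumes the current kubectl context points at the local cluster.

```shell
#!/bin/sh
# Sketch only. Assumes the current kubectl context is the local (Rancher)
# cluster. Function names and the file path are hypothetical.

# write_feature_manifest FILE: write the Feature resource shown above to FILE.
write_feature_manifest() {
    cat > "$1" <<'EOF'
apiVersion: management.cattle.io/v3
kind: Feature
metadata:
  name: external-rules
spec:
  value: false
EOF
}

# apply_workaround: create the Feature, then read it back to confirm it exists.
apply_workaround() {
    manifest="${TMPDIR:-/tmp}/external-rules-feature.yaml"
    write_feature_manifest "$manifest" &&
        kubectl apply -f "$manifest" &&
        kubectl get features.management.cattle.io external-rules
}

# Example:
# apply_workaround
```

The final get uses the same resource name that appears in the "not found" error, so a successful read suggests the lookup the webhook was failing on should now succeed.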

phoenix-bjoern commented 1 month ago

Thanks @samjustus @nicholasSUSE and @MbolotSuse for the super fast response and resolution of the issue. Will check the new version tomorrow.

One additional question: Why does this (important) component in Rancher do unattended auto-updates? I didn't expect that any component would get updated unless we initiate a Rancher version update (and perform backups to roll back on such issues). Doesn't this behavior violate the principle of immutable software releases?

MbolotSuse commented 1 month ago

@phoenix-bjoern We agree that this specific component should not do unattended auto-updates. We consider this a defect/bug, and have filed an issue here: rancher/rancher#45418 to track the resolution.

phoenix-bjoern commented 1 month ago

@MbolotSuse I can confirm the functionality is working again after the v0.4.7 update (which again happened unattended). IMHO this issue can be closed.