chrpinedo opened 1 month ago
We use MS Azure AD, and I can confirm that after the v0.4.6 update (which seems to happen automatically when Rancher restarts), we cannot modify project permissions:
Internal error occurred: failed calling webhook "rancher.cattle.io.projectroletemplatebindings.management.cattle.io": failed to call webhook: an error on the server ("{\"kind\":\"AdmissionReview\",\"apiVersion\":\"admission.k8s.io/v1\",....,\"kind\":\"ProjectRoleTemplateBinding\",\"metadata\":{\"annotations\":{\"field.catt") has prevented the request from succeeding
Rolling back the webhook deployment (namespace cattle-system) in the "local" cluster to v0.4.5 seems to resolve the problem. Is there anything we need to change for v0.4.6? In the Rancher logs I can see the following error:
"status\":\"Failure\",\"message\":\"Internal error occurred: failed to get rules from referenced roleTemplate 'project-owner': failed to check externalRules feature flag: features.management.cattle.io \\\"external-rules\\\" not found\",\"reason\":\"InternalError\",\"details\":{\"causes\":[{\"message\":\"failed to get rules from referenced roleTemplate 'project-owner': failed to check externalRules feature flag: features.management.cattle.io \\\"external-rules\\\" not found\"}]},\"code\":500}}}") has prevented the request from succeeding, requeuing
And also this error:
2024/06/13 09:26:15 [ERROR] error syncing 'p-qdpq2/creator-project-owner': handler mgmt-auth-prtb-controller: failed to remove finalizer on controller.cattle.io/mgmt-auth-prtb-controller, handler cluster-prtb-sync: failed to remove finalizer on clusterscoped.controller.cattle.io/cluster-prtb-sync_c-nxkcl, requeuing
@pmatseykanets I saw that the latest commits were merged by you. Could you help?
This issue also seems to affect cluster removal in Rancher. As mentioned above, downgrading to v0.4.5 in the "local" cluster resolves the problem, and clusters are removed as expected when external users/groups are configured in the cluster.
@phoenix-bjoern @chrpinedo we are actively working on the fix for this - hoping to resolve it before EOD
A new chart version for rancher-webhook with the fix is being released now.
Release complete.
Apologies for the issue. This was a bug in v0.4.6 of the webhook. As @nicholasSUSE noted, we have released a new version of the webhook (v0.4.7) that does not have this bug. For affected versions that don't have a pinned webhook version, the webhook upgrade should happen automatically. However, you can also trigger the upgrade manually by refreshing the `rancher-charts` repo from the UI: go to the local cluster -> Apps -> Repositories, click the three-dot menu to the right of the `rancher-charts` repo, and click Refresh. You may need to restart Rancher (`kubectl rollout restart deploy/rancher -n cattle-system`) for the changes to take effect immediately. You can also upgrade the webhook manually by selecting the application in the UI (local cluster -> Apps -> Installed Apps -> change the UI filter to "All Namespaces" -> rancher-webhook).
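To confirm which webhook version is actually running (for example, after refreshing the repo, or after Rancher upgrades it on its own), one option is to read the image tag of the webhook deployment. The Python sketch below assumes the deployment is named `rancher-webhook` in `cattle-system` and that its image is tagged like `rancher/rancher-webhook:v0.4.6`; `webhook_version` and `is_affected` are hypothetical helper names. The only version reported as affected in this thread is v0.4.6.

```python
import re

# Versions reported in this thread: v0.4.6 is affected; v0.4.5 and v0.4.7 are reported to work.
AFFECTED_VERSIONS = {(0, 4, 6)}

# Assumed tag format "vMAJOR.MINOR.PATCH"; suffixed tags or digest references are rejected.
_TAG_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def webhook_version(image: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from an image reference such as
    'rancher/rancher-webhook:v0.4.6'. The image string could come from (assumed command):
    kubectl -n cattle-system get deploy rancher-webhook -o jsonpath='{.spec.template.spec.containers[0].image}'
    """
    # Take the text after the last ':' so a registry port (host:5000/...) is not mistaken for the tag.
    tag = image.rsplit(":", 1)[-1]
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognized webhook image tag: {tag!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def is_affected(image: str) -> bool:
    """True if the running webhook is one of the versions reported as affected."""
    return webhook_version(image) in AFFECTED_VERSIONS
```

A check like this only reports what is running right now; as reported later in this thread, Rancher may upgrade the webhook again on its own, so a manual rollback may not stick.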
For users who can't upgrade to v0.4.7, you can work around the issue by creating this resource in the local cluster using kubectl:
```yaml
apiVersion: management.cattle.io/v3
kind: Feature
metadata:
  name: external-rules
spec:
  value: false
```
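The log line in the original report suggests why this workaround helps: the v0.4.6 webhook tried to check the `external-rules` feature flag and failed with `features.management.cattle.io "external-rules" not found` when that object did not exist. The Python sketch below models that behavior, with a plain dict standing in for the cluster's Feature objects. `external_rules_enabled`, `FeatureNotFound`, and `external_rules_workaround` are hypothetical names; this illustrates the failure mode and is not the webhook's actual Go code.

```python
import json

# Hypothetical in-memory stand-in for features.management.cattle.io objects, keyed by metadata.name.
Features = dict[str, dict]


class FeatureNotFound(Exception):
    """Raised when a Feature object does not exist (analogous to a Kubernetes NotFound error)."""


def external_rules_enabled(features: Features) -> bool:
    """Mimics the failure mode in the log: a missing Feature object is treated as a hard
    error instead of falling back to a default value."""
    obj = features.get("external-rules")
    if obj is None:
        raise FeatureNotFound('features.management.cattle.io "external-rules" not found')
    return bool(obj["spec"]["value"])


def external_rules_workaround() -> dict:
    """The workaround Feature resource from this thread, as a dict (kubectl also accepts JSON)."""
    return {
        "apiVersion": "management.cattle.io/v3",
        "kind": "Feature",
        "metadata": {"name": "external-rules"},
        "spec": {"value": False},
    }


if __name__ == "__main__":
    cluster: Features = {}
    try:
        external_rules_enabled(cluster)
    except FeatureNotFound as err:
        print("before workaround:", err)

    # "Create" the Feature object, as the kubectl workaround does in a real cluster.
    manifest = external_rules_workaround()
    cluster[manifest["metadata"]["name"]] = manifest
    print("after workaround:", external_rules_enabled(cluster))
    # JSON form that could be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

With the object present and `spec.value` set to `false`, the lookup succeeds and returns `False` instead of erroring, which matches the maintainers' description of the workaround; the actual fix in v0.4.7 may work differently.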
Thanks @samjustus, @nicholasSUSE, and @MbolotSuse for the super-fast response and resolution of the issue. I will check the new version tomorrow.
One additional question: why does this (important) Rancher component perform unattended auto-updates? I didn't expect any component to be updated unless we initiate a Rancher version update (and take backups so we can roll back if issues like this occur). Doesn't this behavior violate the principle of immutable software releases?
@phoenix-bjoern We agree that this specific component should not do unattended auto-updates. We consider this a defect/bug, and have filed an issue here: rancher/rancher#45418 to track the resolution.
@MbolotSuse I can confirm the functionality is working again after the v0.4.7 update (which again happened unattended). IMHO this issue can be closed.
The rancher-webhook chart in my deployment was automatically updated to version 0.4.6 on 12 June, and since then I have been unable to add, modify, or remove user roles in one project. It seems to be a problem with the integration/validation of LDAP groups.
There are also errors in the rancher-webhook pod:
I solved this issue by rolling back to the previous version, 0.4.5.
However, a bit later Rancher forced the webhook upgrade again. Still, I can confirm that rolling back to 0.4.5 solves this issue.