rancher / webhook

Rancher webhook for Kubernetes
Apache License 2.0

automatic update to 0.3.4 broke namespace creation #245

Closed belfhi closed 1 year ago

belfhi commented 1 year ago

The upstream Rancher Cluster updated the downstream rancher-webhook chart to version rancher-webhook-2.0.4+up0.3.4 which broke the creation of namespaces for non-admin users. The error message was the following:

admission webhook "rancher.cattle.io.namespaces.create-non-kubesystem" denied the request: Unauthorized

rancher version: 2.7.3
kubernetes version: 1.24.13

The workaround was to manually set the image tag back to 0.3.3.

horihel commented 1 year ago

Thank you @belfhi - this took me a full day to find.

After turning on debug logging (which is undocumented, but thankfully the code is readable), it looks like every erroneously rejected request is handled twice: first allowed, then denied.

time="2023-05-31T13:03:09Z" level=debug msg="admit result: CREATE /v1, Kind=Namespace test/test user=u-8hjfn allowed=true err=<nil>"
time="2023-05-31T13:03:09Z" level=debug msg="admit result: CREATE /v1, Kind=Namespace test/test user=u-8hjfn allowed=false err=<nil>"
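To spot this pattern in a longer debug log, something like the following might help. It is only a sketch: it assumes the log lines keep the "admit result: ... allowed=..." shape shown above, and the function name find_conflicting_admits is made up for illustration. It groups lines by operation, kind, namespace/name and user, and prints every group that was both allowed and denied. It ignores timestamps, so two separate requests with different outcomes would also be reported.

```shell
#!/bin/sh
# find_conflicting_admits (hypothetical helper): read webhook debug log lines
# on stdin and print each request key that appears with both allowed=true
# and allowed=false.
find_conflicting_admits() {
  awk '
    /admit result:/ {
      # Key = the text between "admit result: " and " allowed=".
      key = $0
      sub(/.*admit result: /, "", key)
      sub(/ allowed=.*/, "", key)
      if ($0 ~ /allowed=true/)  allowed[key] = 1
      if ($0 ~ /allowed=false/) denied[key] = 1
    }
    END {
      for (k in allowed) if (k in denied) print k
    }
  '
}

# Sample input: the two log lines quoted in this comment.
find_conflicting_admits <<'EOF'
time="2023-05-31T13:03:09Z" level=debug msg="admit result: CREATE /v1, Kind=Namespace test/test user=u-8hjfn allowed=true err=<nil>"
time="2023-05-31T13:03:09Z" level=debug msg="admit result: CREATE /v1, Kind=Namespace test/test user=u-8hjfn allowed=false err=<nil>"
EOF
```

In practice the input would come from the webhook pod's logs rather than a here-document.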

wirwolf commented 1 year ago

I have the same problem on multiple clusters. Why does rancher-webhook install the latest version automatically? This update has disrupted work at my company! I run the following command in a pipeline to add a new namespace to a project:

kubectl annotate namespace ${KUBE_NAMESPACE} field.cattle.io/projectId="${KUBE_RANCHER_CLUSTER_ID}:${KUBE_RANCHER_PROJECT_ID}" --overwrite=true

frankbou commented 1 year ago

We are also affected by this issue on multiple clusters. Why does rancher-webhook install the latest version automatically?

We need to keep control over these updates.

rancher version: 2.7.3
kubernetes version: 1.24.10 (v1.24.10-rancher4-1)
downstream cluster: rke1

frankbou commented 1 year ago

@wirwolf: FYI, I opened a separate issue regarding the automatic update: https://github.com/rancher/webhook/issues/246

horihel commented 1 year ago

This update might be related to a recent security problem and yesterday's 2.7.4 release of Rancher. Is anyone able to check whether the same bug exists on Rancher 2.7.4 with rancher-webhook 0.3.4?

belfhi commented 1 year ago

Hi, yes, I upgraded my Rancher deployment to 2.7.4 this morning and can confirm that it works again! The question remains: why was the Helm chart automatically updated to a dysfunctional version in the first place?

MbolotSuse commented 1 year ago

I'm sorry that you ran into this.

This appears to be caused by how our agent image is built - unlike the core rancher image, it does not have the webhook version set, so it will pull in/deploy the latest version that it sees available in our charts repo. This coincided with our recent release of v2.7.4 to cause an upgrade for users on newer versions of Rancher.

As far as workarounds go, you can choose to manually downgrade the webhook version if you wish. If you do this, I would recommend updating the cattle-cluster-agent deployment and adding the CATTLE_RANCHER_WEBHOOK_MIN_VERSION env var set to the appropriate version (for 2.7.3 it would be 2.0.3+up0.3.3, for 2.7.2 it would be 2.0.2+up0.3.2). This will prevent the webhook from being upgraded to the 2.7.4 version (or re-upgraded if you previously downgraded this). However, this will not force a current webhook to be downgraded - you'll need to do that manually through helm or the rancher ui.

Keep in mind that if you apply this workaround you may need to remove this value after an upgrade to a future version containing a fix. If that turns out to be necessary, we will leave an update noting how to "reverse" the effects of this workaround.
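As a rough illustration of the workaround above, the sketch below maps a Rancher version to the matching webhook chart version and prints the command that would set the variable. It deliberately prints the command instead of running it. The helper names webhook_min_version and pin_command are invented for this example, the cattle-system namespace is an assumption taken from other comments in this thread, and only the two version pairs named above are covered. Note that kubectl set env applies to every container in the deployment unless a container is selected explicitly.

```shell
#!/bin/sh
# webhook_min_version (hypothetical helper): map a Rancher version to the
# webhook chart version named in the comment above. Only the two documented
# pairs are known here; anything else is reported as an error.
webhook_min_version() {
  case "$1" in
    2.7.3) echo "2.0.3+up0.3.3" ;;
    2.7.2) echo "2.0.2+up0.3.2" ;;
    *) echo "no documented webhook version for Rancher $1" >&2; return 1 ;;
  esac
}

# pin_command (hypothetical helper): print, but do not run, the kubectl
# command that would set the variable on the cattle-cluster-agent deployment.
# The cattle-system namespace is an assumption.
pin_command() {
  version=$(webhook_min_version "$1") || return 1
  echo "kubectl set env deployment/cattle-cluster-agent -n cattle-system CATTLE_RANCHER_WEBHOOK_MIN_VERSION=$version"
}

pin_command 2.7.3
```

Review the printed command before running it against a downstream cluster, and remember that it does not downgrade a webhook that has already been upgraded.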

niusmallnan commented 1 year ago

I experienced rancher-webhook being upgraded to 0.3.5 in RMS 2.7.4 and this is my workaround.

# pin rancher-webhook to 2.0.4+up0.3.4
kubectl patch -n cattle-system deployment cattle-cluster-agent -p '{"spec":{"template":{"spec":{"containers":[{"name":"cluster-register","env":[{"name":"CATTLE_RANCHER_WEBHOOK_MIN_VERSION", "value":"2.0.4+up0.3.4"}]}]}}}}'

# get the revision of rancher-webhook-2.0.4+up0.3.4
helm history rancher-webhook -n cattle-system
# rollback to 0.3.4
helm rollback rancher-webhook <REVISION-NUMBER> -n cattle-system

lindhe commented 1 year ago

The workaround didn't work for me. ☹️ I'm having the issue where the NetApp Trident operator fails with status message Failed to install Trident; err: failed to patch Trident installation namespace trident; admission webhook "rancher.cattle.io.namespaces" denied the request: Unauthorized. I was hoping this workaround would work in that case too, it seemed so similar...

I just want to check that it's not me missing something. Why would the CATTLE_RANCHER_WEBHOOK_MIN_VERSION variable prevent the webhook from upgrading to a newer version? There's clearly something I'm not getting here.

EDIT 1: I upgraded to Rancher v2.7.5 instead and that resolved the issue! :)

EDIT 2: No, upgrading to v2.7.5 did not fix it. 😞 I looked at the wrong cluster. Damn it.

KevinJoiner commented 1 year ago

There are two problems here.

  1. Downstream clusters were automatically upgraded to newer webhook versions that did not match the version of cattle-agent. This should be fixed in v2.7.5: https://github.com/rancher/rancher/issues/41730

  2. Users are unable to create or update some namespaces. This could be intentional, depending on the action performed on the namespace. We are working on adding more docs to this repo that will make what we actually validate more transparent; that work can be tracked at https://github.com/rancher/rancher/issues/41493. Below is the validation that the webhook currently performs on namespaces.

Validation Checks

Note: The kube-system namespace, unlike other namespaces, has a failurePolicy of Ignore on update calls.

Project annotation

Verifies that the annotation field.cattle.io/projectId value can only be updated by users with the manage-namespaces verb on the project specified in the annotation.

PSA Label Validation

Validates that users who create or edit a PSA enforcement label on a namespace have the updatepsa verb on projects in management.cattle.io/v3. See the upstream docs for more information on the effect of these labels.

The following labels are considered relevant for PSA enforcement:

lindhe commented 1 year ago

@KevinJoiner I believe so too. 🤣

KevinJoiner commented 1 year ago

@lindhe 😆 Didn't even notice

ekristen commented 11 months ago

~@KevinJoiner I hit this problem: flux cannot update a namespace that it previously could. flux has cluster-admin but is still being denied. I can't add a cluster role to a project, so what's the mechanism to allow roles the ability to set the project annotation on a namespace?~

Of course after looking at this for a while, found the problem about 10 seconds after I submitted this comment. Apologies. It was a misconfigured projectId that took a while to find that made it seem like a permissions issue.
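For anyone hitting the same thing, a quick shape check on the annotation value before applying it might save some time. This is only a sketch: it assumes the field.cattle.io/projectId value has the "<cluster-id>:<project-id>" form used in the kubectl annotate command earlier in this thread, the function name valid_project_id and the example IDs are made up, and it cannot tell whether the project actually exists or whether the user has permission on it.

```shell
#!/bin/sh
# valid_project_id (hypothetical helper): succeed only if the value looks like
# "<cluster-id>:<project-id>", i.e. exactly one colon with a non-empty part
# on each side. It does not contact the cluster.
valid_project_id() {
  case "$1" in
    *:*:*) return 1 ;;   # more than one colon
    ?*:?*) return 0 ;;   # non-empty cluster and project parts
    *)     return 1 ;;   # missing colon or an empty part
  esac
}

# Example values below are placeholders, not real IDs.
valid_project_id "c-example:p-example" && echo "looks well-formed"
valid_project_id "p-example" || echo "malformed: expected <cluster-id>:<project-id>"
```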

MbolotSuse commented 1 week ago

As an update, a script to update this (and remove the update later on) can be found here: https://github.com/rancherlabs/support-tools/tree/master/adjust-downstream-webhook. Apologies for the late reply.