Open sozercan opened 3 years ago
+1 for deprecation, however since deployment spec.selector
is immutable after creation, we need to consider documentation for the upgrade process.
What is the normal upgrade path for something like this? Deleting the deployment and recreating?
Could someone clarify why the deprecation of the control-plane
label is necessary?
It seems to me that the control-plane
label is an established (although not fully "standardized") way of marking system namespace, for which admission controllers should be excluded. At least AKS advocates that approach.
We are about to contribute a PR to kubespray to add said label to all clusters created with kubespray.
Would it be possible to keep the control-plane
label for webhook exclusion in Gatekeeper?
TBH I have no strong feelings about the existence (or lack thereof) of a control-plane
label on Gatekeeper resources.
I do think that the label being under-defined is a problem. What should the significance of control-plane
be? What is an appropriate value for that label to have? I think that, without an authoritative answer to this question, the notion of using this label to make meaningful decisions is potentially brittle as different projects may use it in different ways.
From a security standpoint, I'm not in love with the idea of using labels to control which resources are subject to a webhook and which are not. This makes the ability to set labels equivalent to the ability to bypass policy checks. It can make sense, but needs to be done carefully to avoid unintended consequences.
Historically, Gatekeeper had that label because it was auto-generated by Kubebuilder v1. I think it was then applied to the gatekeeper-system
namespace and used as a way to bypass the G8r webhook to avoid self-management until it was replaced by the (more locked-down and explicit) admission.gatekeeper.sh/ignore
label. Now it exists as a kind of vestigial appendage (or so I thought).
As I said, I am kind of "meh" about the label with a lean toward removing it just to avoid bikeshedding over label values, etc. If there is an emerging standard that should be something to consider.
Curious to hear others' opinions.
What is the normal upgrade path for something like this? Deleting the deployment and recreating? Yes, I believe that is the only way. That implies a temporary lapse in enforcement as that environment rolls over to the new deployment.
Would it be possible to keep the control-plane label for webhook exclusion in Gatekeeper?
I think we're talking about several different changes all in one.
control-plane
labels from the Deployment's pod selector. This is internal cleanup and reduces the chances of different kubebuilder-generated deployments in the same namespace from conflicting with each other (trying to steal each other's pods). Even if they were left in place, I think the generic value control-plane: controller-manager
on the GK webhook deployment is risky.control-plane
labels from the gatekeeper-system namespace - again this is auto-generated and its significance is not defined anywhere. Gatekeeper already defines a new label with well-defined semantics for avoiding self-managing and having both set on the namespace does not provide any value I can see.control-plane
as a default exclusion in the GK reference ValidatingWebhookConfig - if we do this, we should call it out in our release notes and give users clear guidance on a safe transition and how to keep the old behavior if they want.The control-plane
label does not follow best-practices as it is not prefixed in any way. As a reference see some well-known labels documented by Kubernetes here and here. My opinion is that it is better to phase it out than to perpetuate it into other projects like kubespray.
My 2c.
+1 on phased approach to deprecate this.
We can also introduce a new field in the helm chart to allow users to provide this label via helm values.yaml. It can initially default to control-plane
then after few releases with warnings in our release notes, we can remove it from the default values.yaml but provide steps to help guide users to set it in their own values.yaml.
I see your point. The control-plane
label is too generic and needs to be replaced with something more intuitive. I retracted the kubespray PR and we'll look for a better alternative for Compliant Kubernetes.
- Removal of the
control-plane
as a default exclusion in the GK reference ValidatingWebhookConfig - if we do this, we should call it out in our release notes and give users clear guidance on a safe transition and how to keep the old behavior if they want.
This has been removed starting with v3.1.0: https://github.com/open-policy-agent/gatekeeper/pull/758
Closed via #758
@ritazh we still have control-plane
references in our manifests. #758 only removed from vwh.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Hi, is it currently safe as an end user to remove control-plane
from the gatekeeper-system
namespace? I believe that label is contributing to https://github.com/open-policy-agent/gatekeeper/issues/2142 as per my comment on that issue. Please see comment for more detail
Edit: I believe this may also be causing https://issuetracker.google.com/issues/204835306?pli=1
Is it safe to remove the control-plane: controller-manager label from the gatekeeper-system namespace currently, if we have already applied the admission.gatekeeper.sh/ignore: no-self-managing label?
If you already have the admission.gatekeeper.sh/ignore: no-self-managing
label, then yes it's safe to remove the control-plane
label.
I think we have waited long enough to remove the control-plane
label everywhere. WDYT? @sozercan @maxsmythe I can open a PR to remove it.
I am having admission.gatekeeper.sh/ignore: no-self-managing and control-plane=controller-manager label in gatekeeper-system namespace. For testing, to fix the TLS handshake error, I tied to remove the control-plane label from gatekeeper-system namespace and its deployments. After the rolling restart of the gatekeeper deployments the control-plane labels got added automatically back to gatekeeper namespace and deployments. Do we need to update some where to make it permanent
gatekeeper version - v3.8.1
Hi @sri53 how do you deploy Gatekeeper? Is it by Helm, or by K8S manifests, or by some other method? Also what version of Gatekeeper and Kubernetes do you deploy to?
Hi @tspearconquest how do you deploy Gatekeeper? Deployed through Azure Policy Add-on for AKS what version of Gatekeeper - 3.8.1 Kubernetes version - AKS 1.23.12
I believe Azure Policy Add-on enforces the label in your case, so if that's correct then removing it for your gatekeeper instance won't be possible until it has been removed in upstream and Azure Policy upgrades to a version which has it removed.
Hi @ritazh - It seems my suspicion was not correct, and removing the control-plane
label did not help.
It's really interesting that this is only affecting Gatekeeper, as we do have other tools with MWH and VWH which do not see this problem, and the traffic causing the errors is 100% coming from the konnectivity-agent
pods in kube-system
I also took a look in konnectivity configmap and deployment manifest in one of our clusters to see if I could find a log format option, but I'm afraid I couldn't find any. My main concern is that these are not coming in json format, so it causes a lot of spam for our fluentd instance to try to parse non-json log outputs as json.
Thanks for the update @tspearconquest! I can dig into this a bit later and report back.
Describe the solution you'd like [A clear and concise description of what you want to happen.]
control-plane
labels are leftovers from kubebuilder 1.0. They can be deprecated but we should give users time to move off using these labels.Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
kubectl version
):