kubernetes-sigs / hierarchical-namespaces

Home of the Hierarchical Namespace Controller (HNC). Adds hierarchical policies and delegated creation to Kubernetes namespaces for improved in-cluster multitenancy.
Apache License 2.0

Allow RoleBindings to support AllowPropagate mode #341

Closed gbmeuk closed 4 months ago

gbmeuk commented 10 months ago

Our use case is one where another controller (Config Sync's namespace reconciler, RepoSync) creates role bindings in a namespace under a fixed, non-unique name (name: configsync.gke.io:ns-reconciler). The binding is then propagated to child namespaces, after which no RepoSyncs can be created in the descendants. The current approach of adding a propagate.hnc.x-k8s.io/none: "true" annotation is also challenging because we are not the ones creating the binding, so we have nowhere to attach the annotation.

Trying to set AllowPropagate like this:

spec:
  resources:
    - resource: rolebindings
      group: rbac.authorization.k8s.io
      mode: AllowPropagate

Returns:

failed to apply HNCConfiguration.hnc.x-k8s.io, /config: admission webhook "hncconfigurations.hnc.x-k8s.io" denied the request: HNCConfiguration.hnc.x-k8s.io "config" is invalid: spec.resources[1]: Invalid value: rolebindings.rbac.authorization.k8s.io: always uses the 'Propagate' mode and cannot be configured

It feels like, now that the AllowPropagate feature is available, this restriction could be relaxed?

NB. We have actually managed to work around the issue with RepoSync by pre-defining the role binding in a Helm chart, in exactly the same way as Config Sync does:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    propagate.hnc.x-k8s.io/none: "true"
  name: configsync.gke.io:ns-reconciler
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: configsync.gke.io:ns-reconciler
subjects:
- kind: ServiceAccount
  name: ns-reconciler-{{ .Release.Namespace }}-root-sync-9
  namespace: 

However, it's a fragile solution: if something changes in Config Sync's approach, it will break again. Also, Config Sync doesn't seem to mind that the binding already exists, but that may not hold for other controllers/scenarios.

On a side note, I can partially work around the constraint by dropping group from the resource entry:

spec:
  resources:
    - resource: rolebindings
      mode: AllowPropagate

So I suspect there may be a bug in the matching algorithm. It only partially works, though, and some other component of the system frequently reverts the change:

{"level":"info","ts":1698313059.863884,"logger":"hncconfig.reconcile","msg":"Changing sync mode of the object reconciler","gvk":"rbac.authorization.k8s.io/v1, Kind=RoleBinding","oldMode":"AllowPropagate","newMode":"Propagate"}
{"level":"info","ts":1698313059.8659315,"logger":"hncconfig.reconcile","msg":"Changing sync mode of the object reconciler","gvk":"rbac.authorization.k8s.io/v1, Kind=RoleBinding","oldMode":"Propagate","newMode":"AllowPropagate"}

There is also this. It's not exactly a problem, but it does suggest that the matching needs a review. Even when I set role bindings to Propagate, the value they are required to have anyway, there is still an admission webhook error:

spec:
  resources:
    - resource: rolebindings
      mode: Propagate

admission webhook "hncconfigurations.hnc.x-k8s.io" denied the request: HNCConfiguration.hnc.x-k8s.io "config" is invalid: spec.resources[1]: Invalid value: rolebindings.rbac.authorization.k8s.io: always uses the 'Propagate' mode and cannot be configured
adrianludwin commented 10 months ago

Yes, that "matching" thing is weird - can you please file a separate bug for that? Thanks!

The reason we've hardcoded Roles and RoleBindings as always being propagated is that the admission controllers assume that if you have an RB in a parent namespace, you also have it in the descendants. Switching to AllowPropagate would force someone to remember to add the annotation to every RB except the Config Sync one, which doesn't seem ideal.

Do the CS RBs have any kind of unique label on them? E.g. Rancher adds cattle.io/creator=norman to all its objects. If CS adds some kind of label, you can add --nopropagation-label as described here (it's at the end of the list), which is designed exactly for this purpose.
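
For example, here's a minimal sketch of passing that flag to the HNC manager, using the Rancher label above as a stand-in for whatever label CS might apply (the Deployment/container names below match a default HNC install, but check yours):

# Illustrative sketch: merged into the hnc-controller-manager Deployment
# in the hnc-system namespace. Objects carrying this exact label key=value
# are never propagated to descendant namespaces.
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            - --nopropagation-label=cattle.io/creator=norman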

gbmeuk commented 10 months ago

Thanks for the response. I've raised https://github.com/kubernetes-sigs/hierarchical-namespaces/issues/343

There are no labels or annotations added by config sync. I'll raise a case to them separately. Thanks.

I'm still not sure the use case is invalid. The default would remain Propagate, and having the option of setting it to AllowPropagate would simply hand the decision to the admins of the parent namespaces. Assuming the mode works the same as it does for other resources, and that most people drive this through some configuration tool (we're using a Helm chart), it gives more control back to the admins. Documentation would be "the default is Propagate, but check your local system by querying the HNCConfiguration", which people should be doing for other resources anyway. A sketch of the proposal is below.
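
To illustrate (assuming role bindings would then honour the same opt-in annotation that resources in AllowPropagate mode use today), the cluster-wide config would be:

spec:
  resources:
    - resource: rolebindings
      group: rbac.authorization.k8s.io
      mode: AllowPropagate

and an individual binding that an admin does want inherited would opt in explicitly:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-admins           # hypothetical binding the admin wants propagated
  namespace: parent-ns        # hypothetical parent namespace
  annotations:
    propagate.hnc.x-k8s.io/all: "true"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
  - kind: Group
    name: team-admins         # hypothetical group
    apiGroup: rbac.authorization.k8s.io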

adrianludwin commented 10 months ago

If we do this, I'd at least like an analysis of how the admission webhooks work with it (see https://github.com/kubernetes-sigs/hierarchical-namespaces/blob/master/internal/hierarchyconfig/validator.go#L308). I'd also make this off by default - e.g., default behaviour is that even admins can't disable this without a command-line flag - to emphasize that this is not a well-supported flow. That should be pretty easy to implement though. wdyt?
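
Roughly this shape (the flag name here is entirely hypothetical, just to show what the opt-in would look like):

# Hypothetical flag, does not exist today: without it, the webhook would
# keep rejecting any attempt to change the mode for roles/rolebindings.
args:
  - --allow-rbac-propagation-modes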

gbmeuk commented 10 months ago

> If we do this, I'd at least like an analysis of how the admission webhooks work with it (see https://github.com/kubernetes-sigs/hierarchical-namespaces/blob/master/internal/hierarchyconfig/validator.go#L308).

What exactly are you thinking here? From reading the notes there, I'm assuming you're referring to how access may be inadvertently lost or gained in a child namespace when it's moved?

> I'd also make this off by default - e.g., default behaviour is that even admins can't disable this without a command-line flag - to emphasize that this is not a well-supported flow. That should be pretty easy to implement though. wdyt?

Yes, makes sense.

adrianludwin commented 10 months ago

Yeah, the logic in those admission webhooks assumes that "access to parent" => "access to descendants". What will break once that's no longer true? Will anyone be able to do anything they previously couldn't (unlikely, but it would be good to think it through)? Will anyone lose the ability to restructure namespaces that they would otherwise have had (almost certain), and how do they fix that if it happens? All of this will need to go in the user docs.

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

gbmeuk commented 5 months ago

From our perspective, the Config Sync team added labels, so we can use the --nopropagation-label argument and resolve this issue that way. I'm not sure there needs to be any further conversation about the feature. For the record, the shape of the fix is sketched below.
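
A sketch of the resulting HNC manager args (the label key/value here is illustrative only; check which labels your Config Sync version actually applies to its role bindings):

# Illustrative: tells HNC never to propagate objects carrying this label.
# Substitute the real label Config Sync sets on its ns-reconciler bindings.
args:
  - --nopropagation-label=configsync.gke.io/managed=true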

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 4 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/hierarchical-namespaces/issues/341#issuecomment-2067742553):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.