operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0
1.7k stars 542 forks source link

The Correct way to add additional ClusterRoleBindings for Operand Service Accounts #2063

Open phantomjinx opened 3 years ago

phantomjinx commented 3 years ago

Type of question

Are you asking about community best practices, how to implement a specific feature, or about general context and help around the operator-sdk?

Question

What did you do? The syndesis operator is migrating to operator-sdk 1.0.0+ and as such is changing to the correct file layout and bundle generation. The operator binary has up until now generated a CSV manifest via custom golang code which we would like to replace with $(KUSTOMIZE) build $(KOPTIONS) manifests | operator-sdk generate bundle .... In short, we wanted to use the operator-sdk to get to this CSV.

What did you expect to see? Since the syndesis operator uses the service-account syndesis-operator and its operands use a couple of extra service-accounts syndesis-server and syndesis-public-oauthproxy, we have ClusterRoles and ClusterRoleBindings providing the relevant permissions for each one. Since the OLM installs the operator with ClusterAdmin privileges, it seemed sensible to add all of these service-accounts to the clusterpermissions section of our CSV.

What did you see instead? Under which circumstances? Having migrated to operator-sdk bundle generation, the ClusterRole permissions for syndesis-operator are included in the CSV but the ClusterRoles and ClusterRoleBindings for the other service-accounts are presented in their own resource files alongside the CSV. The problem is that the ClusterRoleBinding resource files specify a namespace in the subject portion, for the service-account, that is not appropriate for the user installing the operator.

To avoid this issue, I've refactored the ClusterRoleBindings for the additional service-accounts into the operator infrastructure, ie. the operator is now responsible for their installation and not the CSV. This has meant that the operator now requires the ClusterRoleBinding GET/UPDATE/DELETE permissions at the cluster-level.

This seems, architecturally, to be correct since:

  1. The operator is in charge of creating the operands and therefore their service-accounts;
  2. The namespace of the operands will only be known at the point of their installation and not when the operator is installed;

My reluctance (hence this question) concerns the necessary addition of privileges to the operator:

cdjohnson commented 3 years ago

We discussed this at the OLM Dev triage meeting this morning.

The CSV describes, the permissions (ClusterRole/Bindings and Role/Bindings) and ServiceAccount that the Operator requires. This is the Operator's Security Domain.

For Operands, you have two choices:

  1. Your Operator can dynamically create the ClusterRole/Bindings and Role/Bindings and ServiceAccounts for the Operand services when reconciling the Operand CR. With this option, the Operand's permissions must be a subset of the Operator's permissions, because Kubernetes will prevent privilege escalation.
  2. Your Operator can statically create the ClusterRole/Bindings and Role/Bindings and ServiceAccounts for the Operand services by including them in the bundle manifests directory. These will be created by OLM and can have a different Security Domain of the Operator (doesn't need to be a subset). However, these resources are only created in the Operator namespace. If you are supporting anything other than OwnNamespace mode, then these resources are not copied into the other namespaces in the OperatorGroup and you should consider option 1 instead.

Wearing my Security hat: Customers are increasingly wanting to have a narrow permissions granularity as possible. In my experience, they do not like AllNamespaces install mode because the Operator reaches into Namespaces (tenants). The only time this is acceptable is if the Operator is really designed to extend the entire cluster.

Customers also do not like ClusterRoleBindings since they also have cross-namespace or cluster-wide permissions.

That said:

  1. Create RoleBindings instead of ClusterRoleBindings wherever possible.
  2. The Operator's ServiceAccount and Operand ServiceAccount should be separate (different security domains)
  3. It's OK for each Operand instance (of the same Operator) in a namespace to share the same ServiceAccount.
  4. Create the Operand resources from the Operator dynamically for maximum flexibility and narrow security domain.
  5. Use the CSV ONLY for the Operator's permissions and serviceaccount. Not the Operand.
phantomjinx commented 3 years ago

Really useful information. Thanks very much.

durera commented 3 years ago

Your Operator can statically create the ClusterRole/Bindings and Role/Bindings and ServiceAccounts for the Operand services by including them in the bundle manifests directory. These will be created by OLM and can have a different Security Domain of the Operator (doesn't need to be a subset). However, these resources are only created in the Operator namespace.

How? What goes into the subjects[0].namespace field in bundle/manifests/xxx_v1_clusterrolebinding.yaml ... we've eliminated almost all our cluster role permissions as agree that goal should be everything stays in the namespace per your comments above, but still need a couple of cluster permissions.

Example: The source file in config/rbac/xxx_v1_clusterrolebinding.yaml currently has subjects[0].namespace: default, which kustomize will turn into subjects[0].namespace: whatever_i_set_namespace_to when bundle/manifests/xxx_v1_clusterrolebinding.yaml is created, so the resulting OLM bundle only works for installs to a single namespace called whatever_i_set_namespace_to.

Is there a specific value that this should be set to for when it's packaged into the olm bundle so that olm will replace it with the namespace the operator is being deployed to? Or is it a case of removing that field and olm will automatically add the missing subjects[0].namespace in?

cdjohnson commented 3 years ago

Good question. I personally never tried this, as I never recommend creating permissions this way.

From looking at the code: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/catalog/operator.go#L1778-L1791

It appears that the only way this would work is if the namespace and serviceaccount are well-known (hard-coded).

@dinhxuanvu do you have any background on the use case for creating a ClusterRoleBinding via the bundle manifests?

Looking at the public doc: https://olm.operatorframework.io/docs/tasks/creating-operator-manifests/

There is no official support for any RBAC manifests although the code applies them.

durera commented 3 years ago

From our view, we like the cleanliness of having the RBAC laid down up front by whatever does the deployment (be that OLM, or a traditional approach to deployment), rather than having to "roll up" all the permissions needed by the operands into the operator ontop of the permissions the operator actually needs.

By moving RBAC setup out of the operator we end up with an operator that has a much smaller set of permissions that are only what the operator itself needs, rather than a superset of everything needed by the operand(s). With the principle of assigning least permission everywhere, we think it makes sense to be able to restrict the operator in this way.

It appears that the only way this would work is if the namespace and serviceaccount are well-known (hard-coded).

Yeah, that was our conclusion too. We've ended up keeping namespace-scoped RBAC static/in OLM, and resorting to dynamic for the cluster-scoped permissions we couldn't lock down to the namespace.