RFC: multi-tenant control plane

grampelberg commented 6 years ago

Many k8s clusters are multi-tenant (for some definition of multi-tenant, probably requires more definition).

The list of functionality that should be isolated:

dashboard
stat
tap
inject

A user should only be able to see and interact with things that their roles allow them to.

What's the best way for us to work in these environments?

Restrict control plane credentials

Take advantage of (most) users isolating in kubernetes with namespaces. Install a control plane to a specific namespace and restrict its roles (maybe remove ClusterRoles?).

Delegate credentials

This is very similar to how the latest kubernetes-dashboard is working. By default, there is a base set of permissions that anyone can use (mostly nothing). When passed a user's token, the dashboard passes their permissions through and controls visibility that way.

briansmith commented 6 years ago

My understanding of this is:

We will fully support linkerd when installed in any non-system namespace. That is, it doesn't have to be installed in a "linkerd" namespace, but we won't support it being installed in the kube-system or kube-public namespace. We support users adding other objects to the same namespace (before and after installing linkerd) and we promise we won't create objects with names that would collide with their objects' names as long as their objects' names don't start (or end?) with "linkerd".
We will support a mode of operation where the user can only create objects within one namespace, where Conduit is installed in that namespace and all the injected pods are in that namespace. In particular, we'll support the case where the user can't create ClusterRoles and ClusterRoleBindings.
We will support a mode of operation where the controller is controlling pods outside of the namespace it is installed in. We will support such scenerios where somebody has to approve the use of linkerd on a namespace-by-namespace basis. For example, the linkerd CA needs to be able to create and update the CA bundle configmap in each namespace that has linkerd-injected pods. We will try to avoid needing to manage ClusterRoles and ClusterRoleBindings in this scenerio as well.
Multiple control planes can exist in the same cluster. We will provide ways to for control planes within a cluster to federate so that pods with different controllers can interop with each other. For example, we'd provide a way to configure control plane #1 to trust control plane #2 for all pods in a whitelist of namespaces A, B, C.

grampelberg commented 6 years ago

Sounds great to me for now, provides everything I've heard from users.

We will support a mode of operation where the controller is controlling pods outside of the namespace it is installed in. We will support such scenerios where somebody has to approve the use of linkerd on a namespace-by-namespace basis. For example, the linkerd CA needs to be able to create and update the CA bundle configmap in each namespace that has linkerd-injected pods. We will try to avoid needing to manage ClusterRoles and ClusterRoleBindings in this scenerio as well.

Is this to solve for an organization having a global control plane and wanting to force service owners to get access?

Multiple control planes can exist in the same cluster. We will provide ways to for control planes within a cluster to federate so that pods with different controllers can interop with each other. For example, we'd provide a way to configure control plane #1 to trust control plane #2 for all pods in a whitelist of namespaces A, B, C.

Is this because of how the CA works? Would it expand to multi-cluster "federation" as well?

briansmith commented 6 years ago

Is this to solve for an organization having a global control plane and wanting to force service owners to get access?

I expect that an organization will start with Conduit installed in one namespace to handle just that namespace's pods, then expand that one namespace at a time, while wanting to ensure that all other namespaces are unaffected by the change. So, in one sense, it enables that kind of gatekeeping, but I rather think about it as allowing incremental adoption while following the principle of least privilege.

Is this because of how the CA works? Would it expand to multi-cluster "federation" as well?

Right. If there are two independent control planes then they must have different private keys for their CAs; otherwise they're not really independent. Without some kind of federated trust mechanism, the pods controlled by controller A wouldn't trust the certificates used by pods controlled by controller B, since they wouldn't trust controller B's CA by default (if they did trust arbitrary CAs by default then that would defeat the purpose of using PKI in the first place).

briansmith commented 6 years ago

If we are going to assume that some users might be able to create objects only in one namespace, then any upgrade scenerio that might require them to run two controllers at once during the upgrade (e.g. due to a breaking API change) would require them to run two controllers in one namespace. That implies that we shouldn't use the controller namespace to identify a control plane, but instead we should use something else (e.g. something like the value we use for CONDUIT_PROXY_CONTROL_URL). This would cause us to need to change a lot of things.

grampelberg commented 6 years ago

Maybe not possible, but wouldn't we want to use the k8s native upgrade processes like patching deployments? (Or, dogfood our own upgrade logic).

briansmith commented 6 years ago

Maybe not possible, but wouldn't we want to use the k8s native upgrade processes like patching deployments?

It would be nice. We need to decide which upgrade processes we're going to support.

(Or, dogfood our own upgrade logic).

I don't know what this would mean.

grampelberg commented 6 years ago

re. our own upgrade logic, it would be nice to support upgrade patterns like blue/green. Using our built-in blue/green upgrade system for the control plane seems pretty nifty without knowing any details.

eikNamo commented 5 years ago

Any updates on this topic?

grampelberg commented 5 years ago

tap hardening has landed in the last couple edges. Metrics are on the roadmap but a low priority. As long as you don't mind folks having cluster wide visibility into metrics, it works multi-tenant today.

wmorgan commented 5 years ago

@eikNamo since multi-tenant is an ambiguous term, can you give us some more details about what you're looking for?

linkerd / linkerd2

RFC: multi-tenant control plane #1095

Restrict control plane credentials

Delegate credentials