projectcapsule / capsule

Multi-tenancy and policy-based framework for Kubernetes.
https://capsule.clastix.io
Apache License 2.0

Allow specifying multiple nodeSelectors on a tenant #520

Open slushysnowman opened 2 years ago

slushysnowman commented 2 years ago

Describe the feature

We have a use-case where tenants can use 2 different node pools.

  1. A shared node pool available for use by all tenants on a cluster
  2. A private node pool only available for use by a specific tenant or group of tenants

The current nodeSelector specification on a tenant does not allow this use case (apart from some very creative use of labels on nodes). We currently use Kyverno to enforce this, but Capsule seems like an ideal place for this sort of policy.

It would be good if there were a mechanism on a tenant to specify multiple node selectors, so that tenants could use any number of separate node pools.
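
For illustration, a Tenant manifest for this request could look roughly like the sketch below; the plural nodeSelectors field does not exist in Capsule today, and the pool label key and values are made up for the example.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: oil
spec:
  owners:
    - name: alice
      kind: User
  # Hypothetical field: a list of allowed node selectors instead of the single
  # spec.nodeSelector map. A Pod in one of the tenant's Namespaces would be
  # admitted if its nodeSelector matched any one of the entries.
  nodeSelectors:
    - pool: shared        # node pool available to all tenants
    - pool: oil-private   # node pool reserved for this tenant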

What would the new user story look like?

When a tenant is created, multiple nodeSelectors can be specified.

MaxFedotov commented 2 years ago

Hi @slushysnowman. As I remember, node selectors don't support an OR operation (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), so this will be impossible to achieve. What you can do is use taints and tolerations on these node pools, or allow the user to create 2 different tenants - one using the private pool, another using the shared one.
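
For completeness, a minimal sketch of that workaround, assuming the private pool's nodes carry an illustrative dedicated taint and a pool label: only workloads that add the matching toleration (plus a nodeSelector) can land there, while everything else stays on the untainted shared pool.

# Taint carried by every node in the private pool (applied via the cloud
# provider's node pool settings or kubectl taint).
apiVersion: v1
kind: Node
metadata:
  name: private-pool-node-1
  labels:
    pool: tenant-a-private
spec:
  taints:
    - key: dedicated
      value: tenant-a
      effect: NoSchedule
---
# Pod belonging to tenant-a: the toleration lets it be scheduled on the tainted
# nodes, and the nodeSelector makes sure it actually goes there.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  nodeSelector:
    pool: tenant-a-private
  tolerations:
    - key: dedicated
      operator: Equal
      value: tenant-a
      effect: NoSchedule
  containers:
    - name: app
      image: nginx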

slushysnowman commented 2 years ago

It depends on how Capsule checks, I guess?

For example, if Capsule verified the nodeSelector on the pod spec against the multiple nodeSelectors specified on the tenant, that should work, right?

For example:

  1. Pod going to be deployed with nodeSelector specified
  2. Capsule assesses nodeSelector on pod
  3. Pod nodeSelector matches one of the nodeSelectors on tenant spec
  4. Pod is deployed

Or am I looking at it too simplistically?
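
To make that concrete, under the hypothetical multi-selector tenant sketched above, a Pod like the following would pass step 3 because its nodeSelector matches one of the entries on the tenant, whereas a Pod selecting another tenant's pool would be rejected (namespace name and labels are illustrative).

apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: oil-production   # a Namespace owned by the tenant
spec:
  nodeSelector:
    pool: oil-private         # matches one of the selectors listed on the tenant
  containers:
    - name: app
      image: nginx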

MaxFedotov commented 2 years ago

Capsule uses the built-in PodNodeSelector admission controller (https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#configuration-annotation-format). So basically, whatever you add as nodeSelector to your TenantSpec is added to this annotation, and all the magic behind the scenes is done by Kubernetes itself.
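
For context, the annotation looks like the example below (tenant and namespace names are illustrative). Because it holds a single selector whose terms the PodNodeSelector plugin merges into every Pod (effectively ANDed), there is no way to express "pool A OR pool B" through it.

apiVersion: v1
kind: Namespace
metadata:
  name: oil-production
  annotations:
    # Populated by Capsule from the Tenant's spec.nodeSelector and enforced by
    # the PodNodeSelector admission plugin on every Pod created in the namespace.
    scheduler.alpha.kubernetes.io/node-selector: "pool=shared"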

slushysnowman commented 2 years ago

Yeah, it makes sense that continuing to do it this way would make this feature request impossible - but potentially there are other ways this functionality could be implemented.

If it's not desired to add this to Capsule, that's all good - we're currently using Kyverno for this because Capsule's nodeSelector option wasn't fit for our purposes. We'd potentially like to swap to Capsule for this, as it would probably be simpler to maintain and seems like something more users might run into in the long run, but I can understand if this is felt to be undesired/unneeded functionality.

prometherion commented 2 years ago

@slushysnowman please, could you share the Kyverno rule you're using to allow a Pod to run on one or more eligible node pools?

The idea is to evaluate its logic and check whether it's feasible to translate it to Capsule.

slushysnowman commented 2 years ago

@prometherion apologies for the delay, here it is:

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: node-restrictions
  annotations:
    policies.kyverno.io/title: Node restrictions
    policies.kyverno.io/category: xyz
    policies.kyverno.io/severity: high
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      We offer various nodes to deploy applications on; these are labeled, for example,
      `xyz: platform`, `xyz: shared`, and `xyz: <tenant>-<purpose>` for platform, shared,
      and tenant-specific nodes.
      `platform` nodes are designated for the platform services and are therefore not available to users,
      so the usage of the nodeSelector `xyz: platform` is disallowed.
      Subsequently, Tenant nodes `xyz: <tenant>-<purpose>` are only available to their designated tenant.
      By default, if no nodeSelector is specified, `shared` is implied.
spec:
  validationFailureAction: enforce
  background: true
  rules:
    - name: default-shared-nodeselector
      match:
        resources:
          kinds:
            - Pod
      mutate:
        patchStrategicMerge:
          spec:
            nodeSelector:
              +(xyz): shared
      exclude:
        #  Exempt infrastructure related namespaces in having a default node-selector assigned
        resources:
          namespaces:
            - ns1
            - ns2
    - name: disallow-non-tenant-nodes
      match:
        resources:
          kinds:
            - Pod
      preconditions:
        - key: "{{ request.object.spec.nodeSelector.xyz }}"
          operator: NotEquals
          value: "shared"
      context:
        - name: tenant # we derive our tenant from the namespace capsule created for our tenant
          apiCall:
            urlPath: "/api/v1/namespaces/{{ request.namespace }}"
            jmesPath: 'metadata.labels."capsule.clastix.io/tenant"'
      validate:
        message: "{{ tenant }} is not allowed to deploy using nodeSelector xyz: {{ request.object.spec.nodeSelector.xyz }} "
        deny:
          conditions:
            - key: "{{ request.object.spec.nodeSelector.xyz }}"
              operator: NotEquals
              value: "{{ tenant }}-*"
      exclude:
        #  Exempt infrastructure related namespaces from targeting tenant nodes
        resources:
          namespaces:
            - ns1
            - ns2
    - name: disallow-platform-nodes
      match:
        all:
          - resources:
              kinds:
                - Pod
      validate:
        message: "For hosting pods on xyz, a nodeSelectorLabel with the key `xyz` is required, but the value cannot be equal to `platform`. To host it on shared nodes use the value `shared`"
        pattern:
          spec:
            nodeSelector:
              xyz: "!platform"
      exclude:
        #  Exempt infrastructure related namespaces from targeting platform nodes
        resources:
          namespaces:
            - ns1
            - ns2

prometherion commented 2 years ago

I'm not a Kyverno expert, please correct me if I'm wrong.

Essentially, this policy is creating a validation pipeline made up of 3 steps.

  1. With the default-shared-nodeselector stage, you're providing a default value in case the Pod doesn't have one.
  2. With the disallow-non-tenant-nodes stage, you're validating only the Pods that have a value different from shared, retrieving the Tenant and checking whether the user-provided nodeSelector matches your criteria.
  3. The final disallow-platform-nodes stage ensures that Pods are not scheduled on the platform nodes - I'd say the ones used by the platform itself, hosting critical components such as the API Server, Capsule, or any other backing service.

prometherion commented 2 years ago

@MaxFedotov I'd like to offer this enhancement to Capsule, overcoming the limitations of PodNodeSelector, which is not enabled by default, so we can decrease the number of add-ons required to run Capsule properly.

My idea is to create a new handler in the Pods webhook, iterating over the conditions of the enhanced Tenant node selectors until a matching one is found by the scheduler: do you see any drawback to this approach?

In the end, we would mimic the same behavior offered by Kyverno.

oliverbaehler commented 9 months ago

@prometherion Do we still want to consider this implementation? If we want to add this, we could deliver it with 0.5.0. I can do the PR.

carpenterm commented 9 months ago

Removing the need for PodNodeSelector would be great. It's not available on EKS or other managed providers: https://github.com/aws/containers-roadmap/issues/304