kcl-lang / kcl-operator

Kubernetes KCL Operator and Webhook Server
Apache License 2.0

[Track] [Enhancement][WIP] KCL x Webhook Gatekeeper #35

Open Peefy opened 2 months ago

Peefy commented 2 months ago

Motivation

Kubernetes supports webhooks, RBAC, built-in CEL policies, and other mechanisms for permission control. However, using webhooks for resource management raises more complex permission requirements. For example, we want to control which specific resources a k8s webhook is allowed to mutate or validate. Kubernetes itself does not provide a permission mechanism for webhooks that constrains their scope, so a webhook may act on resources it was never meant to touch and thereby change cluster behavior.

User Story

As a Kubernetes cluster administrator, I want fine-grained permission control over k8s webhooks so that only specific webhooks can operate on specific resources, with permissions granular down to an individual mutation or validation. I need a way to define and enforce these rules so that webhooks do not accidentally affect resources they should not, which would lead to abnormal cluster behavior.

Goals

The problem breaks down into the following two scenarios, and the goal is to address both:

Goal 1: Panic in webhook

Goal 2: Webhook works fine, but there are bugs in its logic

Proposal

Goal 1: Panic in webhook

What should happen after a webhook fails depends on the specific requirements. Webhook resources should provide post-recovery policies that users can choose freely according to their usage scenario. Users can also write their own post-recovery policies.

apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: conditionally-add-annotations
spec:
  params:
    toMatch:
      config.kubernetes.io/local-config: "true"
    toAdd:
      configmanagement.gke.io/managed: disabled
    failureAction: "abort"  # or "warn", "skip", or a function for more actions based on needs
  source: < kcl code >
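
To make the intended behavior of failureAction concrete, here is a minimal Python sketch of how an operator could dispatch on it. This is not the kcl-operator implementation: the function name run_with_failure_action, the shape of items, and the signature of a user-supplied recovery function are all assumptions made for illustration.

```python
import logging
from typing import Any, Callable, Dict, List, Union

logger = logging.getLogger("kclrun")

Items = List[Dict[str, Any]]
# Hypothetical signature for a user-written policy: it receives the original
# items and the error, and returns the items that should be admitted.
RecoveryFn = Callable[[Items, Exception], Items]


def run_with_failure_action(
    webhook: Callable[[Items], Items],
    items: Items,
    failure_action: Union[str, RecoveryFn] = "abort",
) -> Items:
    """Run `webhook`; if it raises, apply the configured failure action."""
    try:
        return webhook(items)
    except Exception as err:
        if callable(failure_action):
            # User-defined post-recovery policy.
            return failure_action(items, err)
        if failure_action == "warn":
            logger.warning("webhook failed, admitting items unchanged: %s", err)
            return items
        if failure_action == "skip":
            # Silently admit the items as they were.
            return items
        if failure_action == "abort":
            raise  # propagate the error so the request is rejected
        raise ValueError(f"unknown failureAction: {failure_action!r}") from err


def broken_webhook(items: Items) -> Items:
    raise RuntimeError("panic in webhook")


pods = [{"kind": "Pod", "metadata": {"name": "my-pod"}}]
admitted = run_with_failure_action(broken_webhook, pods, "skip")
```

With "skip" or "warn" the items pass through unchanged, with "abort" the error reaches the caller, and a callable gives the user full control over what is admitted.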

Goal 2: Webhook works fine, but there are bugs in its logic

  1. RBAC permissions are inherited: the resources a webhook may mutate or validate are limited to those permitted by the RBAC permissions of the account it runs as.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: restricted-namespace # The specific namespace
  name: restricted-role
rules:
- apiGroups: [""]
  resources: ["pods", "services"] # The specific resources
  verbs: ["get", "list", "watch", "create", "update", "delete"] # The specific action
  2. On top of RBAC, a label selector is used to further narrow the scope in which the webhook takes effect.
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: conditionally-add-annotations
spec:
  selector: # select the object
    matchLabels:
      app: my-app
    namespace: my-namespace
    resourceKind: Pod
    resourceName: my-pod
  params:
    toMatch:
      config.kubernetes.io/local-config: "true"
    toAdd:
      configmanagement.gke.io/managed: disabled
  source: < kcl code >
  3. The bidirectional selection mechanism

The webhook resource describes the object it wants to work on

apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: conditionally-add-annotations
spec:
  selector:
    matchLabels:
      app: my-app
    namespace: my-namespace
    resourceKind: deployment
    resourceName: my-deployment
    strictmode: true
  params:
    toMatch:
      config.kubernetes.io/local-config: "true"
    toAdd:
      configmanagement.gke.io/managed: disabled
    failureAction: "abort"  # or "warn", "skip", or more actions based on needs
  source: < kcl code >

The resource, in turn, declares which webhooks are allowed to act on it.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: nginx
  annotations:
    # Annotation values must be strings, so each list is JSON-encoded.
    canMutate: '["conditionally-add-annotations"]'
    canValidate: '["conditionally-validate-annotations"]'
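
The bidirectional check could look roughly like the Python sketch below: a webhook applies only if its selector matches the resource and the resource lists the webhook in its canMutate or canValidate annotation. The function names are hypothetical, the annotation is read either as a JSON-encoded string or as a plain list, and strictmode is left out because its semantics are not defined yet.

```python
import json
from typing import Any, Dict


def selector_matches(selector: Dict[str, Any], resource: Dict[str, Any]) -> bool:
    """Webhook -> resource: every field set in the selector must match."""
    meta = resource.get("metadata", {})
    labels = meta.get("labels", {})
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    if "namespace" in selector and meta.get("namespace") != selector["namespace"]:
        return False
    # Assumption: kinds are compared case-insensitively ("deployment" == "Deployment").
    if "resourceKind" in selector and (
        resource.get("kind", "").lower() != selector["resourceKind"].lower()
    ):
        return False
    if "resourceName" in selector and meta.get("name") != selector["resourceName"]:
        return False
    return True


def resource_allows(resource: Dict[str, Any], webhook_name: str, action: str) -> bool:
    """Resource -> webhook: the webhook must be listed under the given annotation."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    raw = annotations.get(action, "[]")
    allowed = json.loads(raw) if isinstance(raw, str) else raw
    return webhook_name in allowed


def may_apply(webhook_name: str, selector: Dict[str, Any],
              resource: Dict[str, Any], action: str = "canMutate") -> bool:
    """Both directions must agree before the webhook touches the resource."""
    return selector_matches(selector, resource) and resource_allows(
        resource, webhook_name, action
    )


deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "my-deployment",
        "namespace": "my-namespace",
        "labels": {"app": "my-app"},
        "annotations": {"canMutate": '["conditionally-add-annotations"]'},
    },
}
selector = {
    "matchLabels": {"app": "my-app"},
    "namespace": "my-namespace",
    "resourceKind": "deployment",
    "resourceName": "my-deployment",
}
allowed = may_apply("conditionally-add-annotations", selector, deployment)
```

Note that the labels in this usage example are chosen to match the selector; a resource labeled app: nginx would not be selected by matchLabels app: my-app.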

Further detailed design is required:

  1. The correspondence between RBAC verbs and validation webhooks, mutation webhooks, or custom verbs.
  2. An appropriate selector that lets the webhook select resources accurately.
  3. An appropriate selector that lets the resource select webhooks accurately.

[WIP] Design Details

  1. Write Webhooks through KCL
params = option("params") or {} # hidden from the user
set_func = lambda params {
    annotations: {str:str} = {k = v for k, v in params.annotations or {}}
    # The last expression is the value returned by the lambda.
    [item | {
        metadata.annotations: annotations
    } for item in option("items")]
}
items = set_func(params) # hidden from the user
  2. RBAC binding to webhook
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mutater
  namespace: default 

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mutater-role
  namespace: default 
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: set-annotations
spec:
  serviceAccountName: mutater
  params:
    annotations:
      config.kubernetes.io/local-config: "true"
  source: oci://ghcr.io/kcl-lang/set-annotations
  3. Debug/Test with KCL
test_set_func = lambda {
    ...
    item = set_func(param)
    assert item.annotation == 'kcl2'
}
  4. Extend Print or provide a builtin log

e.g.

Print("aaa", io.stdout)
log.SetLevel(Debug)
log.Print("aaa", io.stdout)
  5. Error Recovery

    apiVersion: krm.kcl.dev/v1alpha1
    kind: KCLRun
    metadata:
      name: set-annotations
    spec:
      serviceAccountName: mutater
      recoveryPolicy: panic # or skip
      params:
        annotations:
          config.kubernetes.io/local-config: "true"
      source: oci://ghcr.io/kcl-lang/set-annotations
  6. KCL as glue

req --> KCL-operator --> KCLRun --> KCL --> kclplugin --> go/py/rust...
               |--- Filters out the list of resources that 
               |--- the webhook can access based on RBAC
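
The filtering step in the diagram could be sketched in Python as follows: given the RBAC rules of the service account bound to the KCLRun, only the items that some rule permits are passed on to the KCL code. This is only an illustration under assumptions: the helper names are hypothetical, the kind-to-resource mapping is a naive pluralization (a real implementation would use API discovery), and namespace scoping of the Role is ignored.

```python
from typing import Any, Dict, List


def _covers(allowed: List[str], value: str) -> bool:
    """RBAC-style match: "*" covers everything, otherwise exact match."""
    return "*" in allowed or value in allowed


def rule_permits(rule: Dict[str, Any], api_group: str, resource: str, verb: str) -> bool:
    return (
        _covers(rule.get("apiGroups", []), api_group)
        and _covers(rule.get("resources", []), resource)
        and _covers(rule.get("verbs", []), verb)
    )


def filter_by_rbac(items: List[Dict[str, Any]], rules: List[Dict[str, Any]],
                   verb: str) -> List[Dict[str, Any]]:
    """Keep only the items on which at least one rule permits `verb`."""
    visible = []
    for item in items:
        # apiVersion is "group/version", or just "version" for the core group.
        api_version = item.get("apiVersion", "")
        group = api_version.split("/")[0] if "/" in api_version else ""
        # Assumption: naive pluralization of the kind, e.g. Pod -> pods.
        resource = item.get("kind", "").lower() + "s"
        if any(rule_permits(rule, group, resource, verb) for rule in rules):
            visible.append(item)
    return visible


restricted_rules = [
    {"apiGroups": [""], "resources": ["pods", "services"],
     "verbs": ["get", "list", "watch", "create", "update", "delete"]},
]
incoming = [
    {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "my-pod"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "my-deployment"}},
]
mutable = filter_by_rbac(incoming, restricted_rules, "update")
```

With the restricted-role rules from Goal 2, only the Pod reaches the webhook; with the wildcard mutater-role above, every item would pass.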

Community Tech

https://www.likakuli.com/posts/kinitiras-all/

Kubernetes Webhook

A k8s webhook can be scoped when it is registered, through rules and namespaceSelector:

webhooks:
  - name: webhook-example.github.com
    clientConfig:
      service:
        name: webhook-example
        namespace: default
        path: "/mutate"                    
      caBundle: ${CA_BUNDLE}
    admissionReviewVersions: [ "v1beta1" ]
    sideEffects: None
    rules:                                  
      - operations: [ "CREATE" ]
        apiGroups: ["apps", ""]
        apiVersions: ["v1"]
        resources: ["deployments"] # Here !
    namespaceSelector:                      
      matchLabels:
        webhook-example: enabled # Here !
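
For comparison with the proposal, the scoping that this registration expresses can be approximated in Python as below. This is a simplified model rather than the API server's actual matching logic: subresource patterns, scope, objectSelector and matchExpressions are left out, and the function names are hypothetical. The request uses the operation and resource (group, version, resource) fields of an AdmissionReview request.

```python
from typing import Any, Dict, List


def _covers(allowed: List[str], value: str) -> bool:
    """"*" covers everything, otherwise the value must be listed."""
    return "*" in allowed or value in allowed


def rule_matches(rule: Dict[str, Any], request: Dict[str, Any]) -> bool:
    res = request["resource"]
    return (
        _covers(rule.get("operations", []), request["operation"])
        and _covers(rule.get("apiGroups", []), res["group"])
        and _covers(rule.get("apiVersions", []), res["version"])
        and _covers(rule.get("resources", []), res["resource"])
    )


def webhook_in_scope(webhook: Dict[str, Any], request: Dict[str, Any],
                     namespace_labels: Dict[str, str]) -> bool:
    """The namespace must carry the selected labels and one rule must match."""
    wanted = webhook.get("namespaceSelector", {}).get("matchLabels", {})
    if any(namespace_labels.get(key) != value for key, value in wanted.items()):
        return False
    return any(rule_matches(rule, request) for rule in webhook.get("rules", []))


webhook = {
    "name": "webhook-example.github.com",
    "rules": [{
        "operations": ["CREATE"],
        "apiGroups": ["apps", ""],
        "apiVersions": ["v1"],
        "resources": ["deployments"],
    }],
    "namespaceSelector": {"matchLabels": {"webhook-example": "enabled"}},
}
request = {
    "operation": "CREATE",
    "resource": {"group": "apps", "version": "v1", "resource": "deployments"},
}
called = webhook_in_scope(webhook, request, {"webhook-example": "enabled"})
```

This scoping is set by whoever registers the webhook, which is why the proposal above looks for a way to constrain it from the cluster administrator's side as well.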

Kubernetes CEL Policy

A k8s CEL policy specifies the objects for which its rules take effect:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"] # Here !
  validations:
    - expression: "object.spec.replicas <= 5"

Specify a namespace using Binding

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "demo-binding-test.example.com"
spec:
  policyName: "demo-policy.example.com"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test  # Here !

OPA Gatekeeper

OPA can define rules that prevent a user from modifying objects in a namespace:

package kubernetes.admission

operations = {"CREATE", "UPDATE", "DELETE"}

deny[msg] {
    username := input.request.userInfo.username
    username == "user1"
    operations[input.request.operation]
    namespace := input.request.object.metadata.namespace
    namespace == "ns1"
    msg := sprintf("Unauthorized: %v is not permitted to modify objects in namespace %v", [username, namespace])
}

https://support.tools/post/opa-gatekeeper-require-labels/
https://stackoverflow.com/questions/71547292/opa-rego-policy-to-block-access-to-kubernetes-namespace

Kyverno

Kyverno can define rules that prevent users from creating resources in a namespace:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-create-in-forbidden-namespace
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: disallow-create-in-forbidden-namespace
    match:
      any:
      - resources:
          kinds:
          - '*'
          namespaces:
          - forbidden-namespace
    validate:
      message: "Creating resources in the forbidden-namespace is not allowed."
      # An empty deny block rejects every request matched by the rule.
      deny: {}

Chainsaw

chainsaw: An end-to-end, declarative testing tool anyone can use to test Kubernetes operators.

apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: example
spec:
  steps:
  - try:
    - assert:
        resource:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: coredns
            namespace: kube-system
          spec:
            replicas: 2

When asking Chainsaw to execute the assertion above, it will look for a deployment named coredns in the kube-system namespace and will compare the existing resource with the (partial) resource definition contained in the assertion. In this specific case, if the field spec.replicas is set to 2 in the existing resource, the assertion will be considered valid. If it is not equal to 2 the assertion will be considered failed.
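
The partial comparison described above can be illustrated with a small Python sketch: every field present in the expected definition must exist in the live resource with the same value, while extra fields in the live resource are ignored. This only illustrates the idea and is not Chainsaw's actual matching code; in particular, comparing lists element by element is an assumption of this sketch.

```python
from typing import Any


def partial_match(expected: Any, actual: Any) -> bool:
    """Return True if `actual` contains everything described by `expected`."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and partial_match(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        # Assumption: lists must have the same length and match position by position.
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(partial_match(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


assertion = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "coredns", "namespace": "kube-system"},
    "spec": {"replicas": 2},
}
existing = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "coredns", "namespace": "kube-system",
                 "labels": {"k8s-app": "kube-dns"}},
    "spec": {"replicas": 2, "paused": False},
}
assertion_holds = partial_match(assertion, existing)
```

The same style of partial assertion could be a convenient way to test the output of a KCL webhook against the fields it is supposed to set.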

[WIP] FluxCD Multi Tenancy

Flux defers to Kubernetes’ native RBAC to specify which operations are authorised when processing its custom resources. By default, this means operations are constrained by the service account under which the controllers run, which has the cluster-admin role bound to it. This is convenient for a deployment in which all users are trusted.

In a multi-tenant deployment, each tenant needs to be restricted in the operations that can be done on their behalf. Since tenants control Flux via its API objects, this becomes a matter of attaching RBAC rules to Flux API objects.

To give users control over the authorisation, the Flux controllers can impersonate (assume the identity of) a service account mentioned in the apply specification (e.g., the field .spec.serviceAccountName in a Kustomization object or in a HelmRelease object) for both accessing resources and applying configuration. This lets a user constrain the operations performed by the Flux controllers with RBAC.

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
 name: podinfo
 namespace: webapp
spec:
 serviceAccountName: webapp-reconciler
 interval: 5m
 chart:
   spec:
     chart: podinfo
     sourceRef:
       kind: HelmRepository
       name: podinfo

https://fluxcd.io/flux/components/helm/helmreleases/#role-based-access-control
https://fluxcd.io/flux/installation/configuration/multitenancy/

KusionStack Controller Mesh

apiVersion: ctrlmesh.kusionstack.io/v1alpha1
kind: ShardingConfig
metadata:
  name: sharding-demo
  namespace: operator-demo
spec:
  controller:
    leaderElectionName: operator-leader
  webhook:
    certDir: /tmp/webhook-certs
    port: 9443
  selector:
    matchExpressions:
    - key: statefulset.kubernetes.io/pod-name
      operator: In
      values:
      - operator-demo-0

Reference

zong-zhe commented 2 months ago

July 17

  1. A webhook plugin manager
  2. Fluxcd multi-tenancy.
  3. After Controller Mesh panic.
  4. More examples

zong-zhe commented 1 month ago

How to provide a high-quality webhook deployment?

zong-zhe commented 1 month ago

Discussion User Story + More details kcl plugin + ref k8s prop