nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code
https://kopf.readthedocs.io/

Example deployment configs for webhooks #785

Open sambhav opened 3 years ago

sambhav commented 3 years ago

Question

Currently there is very little documentation on how to deploy an admission controller written in kopf. It would be nice to update https://kopf.readthedocs.io/en/stable/deployment/ with this information.

Currently, I am struggling to figure out which pieces I need to set up for an admission controller to be properly deployed as a K8s-native app.

Keywords

webhook, admission controller, deployment

sambhav commented 3 years ago

@nolar I would be happy to help add this to the docs, but I don't quite understand which parts kopf handles when deployed as a k8s native app.

nolar commented 3 years ago

I would be happy to help add this to the docs, …

I would be happy to review and comment on that! Though, I have a feeling that it will be a huge lengthy text worth a blog post or a series of posts or maybe a small book.

More than that, some "convenient" practices can be opinionated and considered less secure than necessary. In this regard, I am not sure Kopf's documentation is a good place to give recommendations on security-related topics (this is out of the scope of the framework).

But at least I would be happy to add a link (or a few links) to such manuals or guides.

… which parts kopf handles when deployed as a k8s native app.

Can you please clarify the question?

Essentially, you need to expose a port where Kopf listens from the pod, then create a service pointing to that pod, and then create (manually) [Validating|Mutating]WebhookConfiguration objects pointing to that service. I thought of Kopf generating them (codenamed "Kopf SDK"), but didn't prioritise this feature yet.
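For illustration, a minimal sketch of such a manually created configuration; every name here (the webhook, service, namespace, path) is a hypothetical placeholder:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example.my-operator.example.com   # hypothetical
webhooks:
  - name: example.my-operator.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      caBundle: "..."        # base64-encoded CA certificate
      service:
        name: my-operator    # hypothetical Service in front of the pod
        namespace: my-namespace
        port: 443
        path: /validate-pods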

Besides, generate SSL certificates one way or another (e.g. via cert-manager, openssl, or some other CA, depending on the company policies). Put the CA into caBundle. Put the SSL cert + private key into the operator (I guess, as Secrets mounted as a volume).
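If cert-manager is used, the certificate request could be sketched like this (the issuer and all names are hypothetical, assuming an Issuer already exists):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-operator-cert        # hypothetical
  namespace: my-namespace
spec:
  secretName: my-operator-tls   # the Secret to mount into the operator pod
  dnsNames:
    - my-operator.my-namespace.svc
    - my-operator.my-namespace.svc.cluster.local
  issuerRef:
    name: my-issuer             # hypothetical Issuer/ClusterIssuer
    kind: Issuer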

Create an on-startup handler that sets the webhook server with all these certs & ports — as per the existing documentation.
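A rough sketch of such a handler, assuming the cert and key from the previous step are mounted at hypothetical paths under /etc/webhook-certs:

import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Serve HTTPS with the externally issued certificate (mounted from a Secret).
    settings.admission.server = kopf.WebhookServer(
        addr='0.0.0.0',
        port=9443,
        certfile='/etc/webhook-certs/tls.crt',   # hypothetical mount paths
        pkeyfile='/etc/webhook-certs/tls.key',
    )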

Optionally, add some health checks (liveness/readiness probes).
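For example, Kopf exposes a liveness endpoint when started as kopf run --liveness=http://0.0.0.0:8080/healthz; a probe in the pod spec can then point at it (the port and path below simply mirror that flag):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10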

In some environments and companies, cluster-local Ingress objects can be used for HTTPS endpoints instead of self-served HTTPS servers in the pod (in that case, the pod's server will be simple HTTP).

In some other extremely secure environments, client-side authentication must be added (client=K8s, server=webhook), so that the webhook server knows it is called by the proper K8s and not by something else. This can only be done in the cluster configs, i.e. it requires admin privileges and is not doable from the user side. This is optional, luckily.

I guess, something like this.

The same scenario applies to any webhook server, not necessarily Kopf-based. This is why I think it can be a good blog post — with Kopf used only as an example. Technically, you can make webhook servers even with Flask or aiohttp or (maybe) Django or whatever serves HTTP.
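To illustrate that framework-agnostic point, a bare-bones validating endpoint with aiohttp could look roughly like this (the path and port are arbitrary; the TLS context is omitted):

from aiohttp import web

async def validate(request: web.Request) -> web.Response:
    # Reply with an AdmissionReview response that allows everything.
    review = await request.json()
    return web.json_response({
        'apiVersion': 'admission.k8s.io/v1',
        'kind': 'AdmissionReview',
        'response': {'uid': review['request']['uid'], 'allowed': True},
    })

app = web.Application()
app.add_routes([web.post('/validate', validate)])
# web.run_app(app, port=9443, ssl_context=...)  # K8s requires HTTPS for webhooks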

Does it automatically generate the certs and update the webhook config?

Kopf does automatically generate self-signed SSL certs (which are also CAs for themselves) — but that is supposed to be used in the dev mode only. Self-signed SSL certs/CAs are not secure enough for production (regardless of the framework). Kopf supports arbitrary CA/SSL too — when provided via settings.
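In dev mode, that can be as small as this sketch (no cert files are given, so a self-signed one is generated):

import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # No certfile/pkeyfile given: Kopf generates a self-signed cert & CA on the fly.
    settings.admission.server = kopf.WebhookServer(port=9443)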

Kopf also manages the *WebhookConfiguration objects at runtime — but I'm not sure this is a good idea for production (there are some caveats). Though, it is less critical than proper SSL.

Does it need a service to be created?

If the webhook server runs inside of K8s, then a service is needed.

If outside, then any URL can work. Magic URLs of K8s pointing to ....svc.cluster.local can also work.
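In that case, the clientConfig carries a url instead of a service reference; a hypothetical example:

clientConfig:
  url: https://my-operator.example.com:9443/validate-pods   # hypothetical host
  caBundle: "..."   # base64-encoded CA certificate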

As a starting point, see: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/

What config/settings should my webhook itself use? I wasn't sure which of the webhook servers were appropriate in this case.

Sorry, I didn't get this question.

sambhav commented 3 years ago

In the end, this is what I created for a minimal webhook config:

import os
from typing import AsyncIterator

import kopf

class ServiceTunnel:
    async def __call__(
        self, fn: kopf.WebhookFn
    ) -> AsyncIterator[kopf.WebhookClientConfig]:
        # Service coordinates come from the environment (see the Deployment below).
        namespace = os.environ.get("NAMESPACE")
        name = os.environ.get("SERVICE_NAME")
        service_port = int(os.environ.get("SERVICE_PORT", 443))
        container_port = int(os.environ.get("CONTAINER_PORT", 9443))
        # Serve locally, but advertise the in-cluster service DNS name.
        server = kopf.WebhookServer(port=container_port, host=f"{name}.{namespace}.svc")
        async for client_config in server(fn):
            # Switch the registration from a URL to a service reference.
            client_config["url"] = None
            client_config["service"] = kopf.WebhookClientConfigService(
                name=name, namespace=namespace, port=service_port
            )
            yield client_config

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.admission.server = ServiceTunnel()
    settings.admission.managed = os.environ.get("WEBHOOK_NAME")

and the corresponding deployment config is:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: kopf
  name: defaulter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: defaulter-role-cluster
rules:
  # Framework: admission webhook configuration management.
  - apiGroups: [admissionregistration.k8s.io]
    resources: [validatingwebhookconfigurations, mutatingwebhookconfigurations]
    verbs: [create, patch]

  # Application: access to the resources being defaulted cluster-wide.
  # Put in the resources you want to default.
  - apiGroups: [""]
    resources: [pods]
    verbs: [list, watch, patch, create, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: defaulter-rolebinding-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: defaulter-role-cluster
subjects:
  - kind: ServiceAccount
    name: defaulter
    namespace: kopf
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: defaulter-webhook
  namespace: kopf
  labels:
    application: defaulter-webhook
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      application: defaulter-webhook
  template:
    metadata:
      labels:
        application: defaulter-webhook
    spec:
      serviceAccountName: defaulter
      containers:
      - name: controller
        image: webhook
        imagePullPolicy: Always
        ports:
        - name: https-webhook
          containerPort: 9443
        env:
        - name: NAMESPACE
          value: kopf
        - name: SERVICE_NAME
          value: defaulter-webhook
        - name: SERVICE_PORT
          value: "443"
        - name: CONTAINER_PORT
          value: "9443"
        - name: WEBHOOK_NAME
          value: "defaults.kopf.io"
---
apiVersion: v1
kind: Service
metadata:
  name: defaulter-webhook
  namespace: kopf
spec:
  ports:
  - port: 443
    targetPort: 9443
  selector:
    application: defaulter-webhook

One other bug I discovered: kopf currently uses the Python function name to create the webhook paths. Function names may contain underscores, which are not valid in service paths:

'message': 'Invalid value: "/defaults_items": segment[0]: a DNS-1123 subdomain must consist of lower case alphanumeric characters, \'-\' or \'.\', and must start and end with an alphanumeric character (e.g. \'example.com\', regex used for validation is \'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*\')',
sambhav commented 3 years ago

Would others find that Tunnel I wrote useful as a default inside kopf?

nolar commented 3 years ago

which is not a valid path value for service paths… a DNS-1123 subdomain must

But this is not a subdomain, this is a URL path, isn't it? Can you please create a separate issue with an example where this happens? It looks like a bug for me — either Kopf's or K8s's bug.

Would others find that Tunnel I wrote useful as a default inside kopf?

As a default — no — because there are no defaults. The webhook configuration is highly specific to environments and non-technical policies of the companies where the operators run. So, Kopf provides tools and features, the developers configure them as they need.

As an option — maybe, not sure. I see no reason why not. If added, it should match the usage style of other servers & tunnels nearby — specifically, configured via constructor args, not via env vars. And I suggest that you do del client_config["url"] instead of assigning None to it. And use the full DNS: ....svc.cluster.local — i.e. no shortcuts (they create unnecessary load on DNS). And some unit-tests are needed. And brief docs. — In that form, it can be added as a feature.
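For illustration, a sketch of how those suggestions might combine (the class name WebhookService anticipates the naming discussion below; extra_sans covers the full DNS name, since the serving certificate is generated for the host):

import kopf
from typing import AsyncIterator

class WebhookService:
    """Serve webhooks locally, but register them via a Service reference."""

    def __init__(self, *, name: str, namespace: str,
                 service_port: int = 443, container_port: int = 9443) -> None:
        self.name = name
        self.namespace = namespace
        self.service_port = service_port
        self.container_port = container_port

    async def __call__(
        self, fn: kopf.WebhookFn,
    ) -> AsyncIterator[kopf.WebhookClientConfig]:
        host = f"{self.name}.{self.namespace}.svc"
        server = kopf.WebhookServer(
            port=self.container_port,
            host=host,
            extra_sans=[f"{host}.cluster.local"],  # also accept the full DNS name
        )
        async for client_config in server(fn):
            # Drop the URL key entirely instead of assigning None to it.
            client_config.pop("url", None)
            client_config["service"] = kopf.WebhookClientConfigService(
                name=self.name, namespace=self.namespace, port=self.service_port,
            )
            yield client_config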

Alternatively — and as a quick solution — we can start a new page in the docs in the "Recipes" section, call it "Admission webhooks", put this example as a first recipe. Then, it does not need tests and docs and aligning the usage style, it can go "as is" — enough to outline the idea. It would be definitely valuable for Kopf users.

Regarding the name: I am not sure this is a "tunnel", as it does not forward the traffic through anywhere. It runs a local server, but registers it differently (not via a URL). I believe, "WebhookService" or "WebhookServiceServer" or something of that kind could be better names (sorry, I'm too distracted by the good summer weather now and my mind refuses to think about it ;-) ).

sambhav commented 3 years ago

which is not a valid path value for service paths… a DNS-1123 subdomain must

But this is not a subdomain, this is a URL path, isn't it? Can you please create a separate issue with an example where this happens? It looks like a bug for me — either Kopf's or K8s's bug.

Would others find that Tunnel I wrote useful as a default inside kopf?

As a default — no — because there are no defaults. The webhook configuration is highly specific to environments and non-technical policies of the companies where the operators run. So, Kopf provides tools and features, the developers configure them as they need.

Yup :) - I meant providing it as an option for users to use out of the box rather than having to write it on their own.

As an option — maybe, not sure. I see no reason why not. If added, it should match the usage style of other servers & tunnels nearby — specifically, configured via constructor args, not via env vars. And I suggest that you do del client_config["url"] instead of assigning None to it. And use the full DNS: ....svc.cluster.local — i.e. no shortcuts (they create unnecessary load on DNS). And some unit-tests are needed. And brief docs. — In that form, it can be added as a feature.

Definitely, was planning on moving the arguments to the constructor if this was being added to kopf :)

Also, the full DNS name doesn't really work: currently, the host is used to create the self-signed certificate, and if I put in the full DNS name, it complains when the webhook is hit with -

x509: certificate is valid for localhost, defaulter-webhook.kopf.svc.cluster.local, not defaulter-webhook.kopf.svc

I could add it to extra_sans, but the host would have to be the svc address. Also, since the webhook is triggered via the service section of the [mutating|validating]webhook config instead of url, I would imagine it uses the appropriate and efficient way of calling it?

Alternatively — and as a quick solution — we can start a new page in the docs in the "Recipes" section, call it "Admission webhooks", put this example as a first recipe. Then, it does not need tests and docs and aligning the usage style, it can go "as is" — enough to outline the idea. It would be definitely valuable for Kopf users.

Regarding the name: I am not sure this is a "tunnel", as it does not forward the traffic through anywhere. It runs a local server, but registers it differently (not via a URL). I believe, "WebhookService" or "WebhookServiceServer" or something of that kind could be better names (sorry, I'm too distracted by the good summer weather now and my mind refuses to think about it ;-) ).

Yup WebhookService sounds better :)

sambhav commented 3 years ago

But this is not a subdomain, this is a URL path, isn't it? Can you please create a separate issue with an example where this happens? It looks like a bug for me — either Kopf's or K8s's bug.

It might be a K8s convention and a kopf bug.

See https://github.com/kubernetes/kubernetes/blob/52eea971c57580c6b1b74f0a12bf9cc6083a4d6b/staging/src/k8s.io/apiserver/pkg/util/webhook/validation.go#L98

each part of the subpath must also be a DNS1123 Subdomain.

andreyhristov commented 3 years ago

Hi, is there a chance that this code gets into Kopf? I have a very similar problem. WebhookServer is unusable (or maybe I can't grok it) in the situation where the operator lives inside the K8s cluster and should handle the review requests. It uses a URL in the client config when registering the webhook configuration, but as the K8s docs state: "url gives the location of the webhook, in standard URL form (scheme://host:port/path). The host should not refer to a service running in the cluster; use a service reference by specifying the service field instead. The host might be resolved via external DNS in some apiservers (e.g., kube-apiserver cannot resolve in-cluster DNS as that would be a layering violation). host may also be an IP address." So this is unusable if one has a Service, which should be passed as namespace/service/path in the clientConfig of a ValidatingWebhookConfiguration, as can be seen here: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#service-reference. I have tried samj1912's code, but then I get errors for an invalid "path":

kopf.clients.errors.APIError: ('ValidatingWebhookConfiguration.admissionregistration.k8s.io "my.example.com" is invalid: webhooks[0].clientConfig.service.path: Invalid value: "None/onicvalidate/spec": must start with a \'/\'', {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'ValidatingWebhookConfiguration.admissionregistration.k8s.io "my.example.com" is invalid: webhooks[0].clientConfig.service.path: Invalid value: "None/onicvalidate/spec": must start with a \'/\'', 'reason': 'Invalid', 'details': {'name': 'my.example.com', 'group': 'admissionregistration.k8s.io', 'kind': 'ValidatingWebhookConfiguration', 'causes': [{'reason': 'FieldValueInvalid', 'message': 'Invalid value: "None/onicvalidate/spec": must start with a \'/\'', 'field': 'webhooks[0].clientConfig.service.path'}]}, 'code': 422})

I have named my validating handler onicvalidate because if I use on_ic_validate, the K8s API complains about the underscores in the path. This has already been mentioned in this thread. Unfortunately, I cannot find where the None in the path expansion comes from. Thanks!

mehrdad-khojastefar commented 1 year ago

@samj1912 Thanks, that was very helpful. @nolar I guess kopf has a lot of potential for supporting admission webhooks better. I was struggling with the same issue, where our operator and admission controller are the same and need to be in our cluster. I think it would be very good to at least have @samj1912's solution in the official documentation, because frankly, kopf's docs for admission controllers are not that comprehensive. I would be pleased to help develop some more common functionality to enrich kopf's admission controller support, if there is anything in the project's roadmap.

nolar commented 1 year ago

@mehrdad-khojastefar Hi. Thanks. Any help with writing the documentation with realistic (but not excessive) examples is appreciated. Please mention a PR in this ticket.

There is no specific roadmap for developing Kopf. Well, no timeline at least. There is a roadmap, but it is not about new and better features; rather, it is about internal improvements.

Then, I will be happy to consider further steps, such as improving docs, popularizing the framework more, etc.

I have no idea how to improve it "big" & "for real" — it is rather feature-complete now. All suggestions are welcome (as separate discussions, issues, or feature requests)!

The "no timeline" part is because I am busy with my regular paid work, and we do not use Kopf there, so it now goes as a fancy side hobby project — whenever my spare time & energy permits. The summer was filled with friends & parties & travels & fun. The winter is coming — let's see how it goes.

mehrdad-khojastefar commented 1 year ago

Thanks for your response @nolar. Actually, I think I'm not the best person to fix the things that you've mentioned. But I've had some difficulties trying to set up an admission webhook, and some of them concerned features that Kubernetes supports: for example, I noticed that we cannot assign multiple operations to one endpoint, and connecting the webhook to a service was not that straightforward. These two things are bugging me, and I can fix them if you'd like. I am open to any suggestions.

nolar commented 1 year ago

@mehrdad-khojastefar Can you please elaborate with an example? I didn't fully get what you meant.

mehrdad-khojastefar commented 1 year ago

I will create an issue for it with examples and more explanation.

mehdi-kbj commented 7 months ago

Hello @mehrdad-khojastefar, I'm stuck with the same issue. I'm trying to understand how kopf works with webhooks, and here is what I gathered: 1) You can create your own tunnel, prepare a webhook configuration, and initiate the operator to bind to it. 2) Managed mode: the operator will generate the webhook instances for you (like the example above).

Unfortunately, it didn't work for me with either method.

Here my code:

import logging
import os
import sys
from typing import AsyncIterator

import kopf

# Set up logging to output to the console
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# First method
# @kopf.on.startup()
# def configure(settings: kopf.OperatorSettings, **_):
#     settings.log_format = logging.Formatter(fmt="%(asctime)s [%(levelname)s] %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
#     settings.log_level = logging.DEBUG
#     settings.admission.server = kopf.WebhookServer(
#         cafile='/etc/ca/ca.cert',        
#         certfile='/etc/webhook-certs/tls.crt',
#         pkeyfile='/etc/webhook-certs/tls.key',
#         addr='0.0.0.0', port=8443)

# Second method
class ServiceTunnel:
    async def __call__(
        self, fn: kopf.WebhookFn
    ) -> AsyncIterator[kopf.WebhookClientConfig]:
        namespace = os.environ.get("NAMESPACE")
        name = os.environ.get("SERVICE_NAME")
        service_port = int(os.environ.get("SERVICE_PORT", 443))
        container_port = int(os.environ.get("CONTAINER_PORT", 8443))
        server = kopf.WebhookServer(port=container_port, host=f"{name}.{namespace}.svc")
        async for client_config in server(fn):
            client_config["url"] = None
            client_config["service"] = kopf.WebhookClientConfigService(
                name=name, namespace=namespace, port=service_port
            )
            yield client_config

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.admission.server = ServiceTunnel()
    settings.admission.managed = os.environ.get("WEBHOOK_NAME")

Any insights on this subject would be helpful; the documentation is poor here.