nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code
https://kopf.readthedocs.io/
MIT License
2.12k stars 161 forks source link

Excessive reconfiguring webhooks #889

Open joberdick-rcd opened 2 years ago

joberdick-rcd commented 2 years ago

Long story short

Activity in the cluster triggers the webbook to be reconfigured when it does not seem necessary. We observe many entries in the logs, even when the operator is not active working on anything. We are trying to find root cause.

Kopf version

1.35.3

Kubernetes version

1.21

Python version

3.9

Code

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.watching.client_timeout = 60
    settings.persistence.finalizer = '{redacted}/kopf-finalizer'
    settings.admission.server = kopf.WebhookServer(certfile=f'{download_location}/client-cert.pem', pkeyfile=f'{download_location}/client-key.pem',port=8080,host='localhost')

    settings.admission.managed = {redacted}

Logs

[2022-01-28 22:04:33,146] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,155] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,202] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,211] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,229] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,237] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,251] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,267] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,276] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,284] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,295] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,303] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,317] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,332] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,341] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,350] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,357] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,368] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,377] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,388] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,429] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,437] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,460] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,471] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.
[2022-01-28 22:04:33,478] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook istio.redacted.
[2022-01-28 22:04:33,486] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook istio.redacted.

Additional information

The initial startup configures as expected. Webhooks are working as expected other than the excessive reconfiguring.

nolar commented 2 years ago

Hm. That is strange. Thanks for reporting.

The only suspicious place in this regard is WebhookContextManager & WebhookContextManagerMeta, which are the parents of kopf.WebhookServer. It is possible that they cause multiple results yielded instead of max one — and thus consumed in admission_webhook_server(). I never saw this happening before — but I will take a look.

joberdick-rcd commented 2 years ago

@nolar Thanks. If you can't repro, I can try to provide setup/steps.

nolar commented 2 years ago

@joberdick-rcd Yes, I was not able to reproduce it. Everything works fine for me.

I used K3s 1.21.9+k3s1, Python 3.9.7, Kopf 1.35.3. The virtualenv was originally clean and populated via pip install certbuilder -e ..

Can you please run kopf run with -v and share the logs? It might add some insights.

Also, can you clarify how are the SSL certs generated? Maybe something is wrong with the certs. I tried both ways: with the self-signed certs and with the same certs dumped and loaded from the file system. But if the certs are generated elsewhere, this can have an effect on something.

In particular, either that cert must be CA-capable, or the cadata=/cafile= must be specified to WebhookServer() explicitly. Kopf's generated certs are CA-capable (link). (I am not sure about the certs issued by the system-trusted CA, but that should work too.)

joberdick-rcd commented 2 years ago

@nolar I will work on getting you more info

joberdick-rcd commented 2 years ago

@nolar here are steps to reproduce:

  1. run this file(I installed cert-builder)
    
    import kopf

@kopf.on.startup() def configure(settings: kopf.OperatorSettings, **_): settings.admission.server = kopf.WebhookServer() settings.admission.managed = 'something.something'

@kopf.on.validate('something.something',persistent=True) def inputvalidate(spec, **_): if 'something' not in spec: raise kopf.AdmissionError("Missing required value: spec.something")

kopf run test.py -v

[2022-04-28 10:20:26,498] kopf._core.reactor.r [DEBUG   ] Starting Kopf 1.35.3.
[2022-04-28 10:20:26,498] kopf.activities.star [DEBUG   ] Activity 'configure' is invoked.
[2022-04-28 10:20:26,499] kopf.activities.star [INFO    ] Activity 'configure' succeeded.
[2022-04-28 10:20:26,499] kopf._core.engines.a [INFO    ] Initial authentication has been initiated.
[2022-04-28 10:20:26,500] kopf.activities.auth [DEBUG   ] Activity 'login_via_client' is invoked.
[2022-04-28 10:20:26,506] kopf.activities.auth [DEBUG   ] Client is configured via kubeconfig file.
[2022-04-28 10:20:26,506] kopf.activities.auth [INFO    ] Activity 'login_via_client' succeeded.
[2022-04-28 10:20:26,507] kopf._core.engines.a [INFO    ] Initial authentication has finished.
[2022-04-28 10:20:26,520] kopf._cogs.clients.w [DEBUG   ] Starting the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
[2022-04-28 10:20:26,522] kopf._kits.webhooks  [DEBUG   ] Generating a self-signed certificate for HTTPS.
[2022-04-28 10:20:26,662] kopf._kits.webhooks  [DEBUG   ] Listening for webhooks at https://*:42677
[2022-04-28 10:20:26,662] kopf._kits.webhooks  [DEBUG   ] Accessing the webhooks at https://localhost:42677
[2022-04-28 10:20:26,662] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:20:26,665] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.

2.  In another terminal download istioctl and install istio to same cluster

curl -L https://istio.io/downloadIstio | sh - export PATH=$PWD/bin:$PATH istioctl install -y



Note the extra logs as Istio installs:
[2022-04-28 10:26:09,439] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,447] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,494] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,502] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,509] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,514] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,520] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,524] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,541] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,549] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,555] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,558] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,562] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,565] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,598] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,602] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,605] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,608] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,619] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,623] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.
[2022-04-28 10:26:09,637] kopf._core.engines.a [INFO    ] Reconfiguring the mutating webhook something.something.
[2022-04-28 10:26:09,642] kopf._core.engines.a [INFO    ] Reconfiguring the validating webhook something.something.

I get the issue in k8s v1.21 in both EKS and Rancher Desktop. This issue exists if I create a server tunnel with my own certs also.
moznuy commented 2 years ago

I've got similar behavior in GKE Standard Cluster and managed to trace the source of my "Reconfiguring" every 30 seconds: (not 100% sure)

This is the place where admission server is setup once: https://github.com/nolar/kopf/blob/ab53ace82e62a6fa709bf5a580007eac1273ac34/kopf/_core/engines/admission.py#L272

And is waiting for changes here: https://github.com/nolar/kopf/blob/ab53ace82e62a6fa709bf5a580007eac1273ac34/kopf/_core/engines/admission.py#L363-L364

I looked at what else might call container.set() to disrupt this wait(): https://github.com/nolar/kopf/blob/ab53ace82e62a6fa709bf5a580007eac1273ac34/kopf/_cogs/aiokits/aiovalues.py#L38-L45

And for me it was this notify_all(): https://github.com/nolar/kopf/blob/ab53ace82e62a6fa709bf5a580007eac1273ac34/kopf/_core/reactor/observation.py#L184-L187

due to this: https://github.com/nolar/kopf/blob/ab53ace82e62a6fa709bf5a580007eac1273ac34/kopf/_core/reactor/running.py#L282-L285

and the events that trigger insights are modification of google CRDs with names *.gke.io. I saw at least three different names:

kubectl get crd -w confirmed the changes are happening.

I don't have a slightest idea why they are modified every 20 to 40 seconds or what the diffs are.

P.S. I got the diff and the only field that's changing is resourceVersion ... (at least for networkloggings.networking.gke.io)