nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code
https://kopf.readthedocs.io/
MIT License

Liveness probe failed in k8s v1.28 and 1.29 using kopf 1.37.3 #1140

Open: opened by ozlerhakan 1 week ago

ozlerhakan commented 1 week ago

Long story short

Hi @nolar,

After we upgraded to kopf 1.37.3 along with Python 3.13, our operator, which runs on both k8s v1.28 and 1.29, started failing the liveness probe at startup. Switching back to 1.37.2 works as expected. There might be an incompatibility with these Kubernetes versions. I couldn't find any relevant log entries in the output. I also tried the latest aiohttp (3.11.7), but it didn't help.

Kopf version

1.37.3

Kubernetes version

1.28.13, 1.29.10

Python version

3.13

Code

No response

Logs

81s         Normal    Created             pod/operator
81s         Normal    Started             pod/operator
12s         Warning   Unhealthy           pod/operator-868f677cf5-7mscx    Liveness probe failed: Get "http://10.0.0.160:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
12s         Normal    Killing             pod/operator-868f677cf5-7mscx    Container operator failed liveness probe, will be restarted

Additional information

No response

ozlerhakan commented 5 days ago

Just a quick update on this issue: I tested Kopf 1.37.3 on Kubernetes 1.28, 1.29, and 1.30, and the liveness probe works as expected, but only when the cluster contains no CRD objects. When upgrading Kopf in a cluster that already contains objects of the corresponding CRDs, the liveness endpoint never responds, so the probe fails and the operator is restarted. It looks as if something blocks the process at startup when such objects exist. By contrast, upgrading in an empty cluster and then applying the CRD objects one by one works well, but that is not practical for a cluster with many CRD objects already in place.
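One way the symptom above could arise, offered only as a hypothetical illustration and not as the confirmed cause in Kopf 1.37.3: assuming the health endpoint is served from the same asyncio event loop as the startup work, any synchronous blocking call during startup (for instance, while processing the pre-existing objects) delays the endpoint until that call returns. The sketch below uses stand-in coroutines `health_responder` and `startup_step` (both hypothetical, not Kopf code) to compare a blocking `time.sleep` with an awaited `asyncio.sleep`.

```python
import asyncio
import time


async def health_responder(started: float, latencies: list[float]) -> None:
    """Hypothetical stand-in for a /healthz handler; records when it got to run."""
    await asyncio.sleep(0.05)  # would normally answer after ~50 ms
    latencies.append(time.monotonic() - started)


async def startup_step(blocking: bool, duration: float) -> None:
    """Hypothetical stand-in for startup work on pre-existing objects."""
    if blocking:
        time.sleep(duration)  # holds the event loop: no other task can run
    else:
        await asyncio.sleep(duration)  # yields the loop to other tasks


async def measure(blocking: bool, duration: float = 0.5) -> float:
    """Return how long the health responder waited before it could answer."""
    latencies: list[float] = []
    started = time.monotonic()
    await asyncio.gather(
        health_responder(started, latencies),
        startup_step(blocking, duration),
    )
    return latencies[0]


if __name__ == "__main__":
    print("awaited startup :", round(asyncio.run(measure(blocking=False)), 2))
    print("blocking startup:", round(asyncio.run(measure(blocking=True)), 2))
```

With `blocking=True` the responder can only answer after the whole startup step finishes; if such a step ran longer than the probe's `timeoutSeconds`, the kubelet would report the same kind of header timeout seen in the logs.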