mittwald / kubernetes-secret-generator

Kubernetes controller for automatically generating and updating secrets
Apache License 2.0

Unable to perform a rolling upgrade #53

Closed lchdev closed 3 years ago

lchdev commented 3 years ago

Describe the bug

Unable to perform a rolling upgrade: new pods are never marked ready because the liveness and readiness probes fail continuously.

To Reproduce

  1. Install with helm:
     $ helm upgrade --install kubernetes-secret-generator mittwald/kubernetes-secret-generator --version 3.3.2
  2. Try to perform any upgrade, e.g. by changing a value:
     $ helm upgrade --install kubernetes-secret-generator mittwald/kubernetes-secret-generator --version 3.3.2 --set secretLength=50

The deployment will try to create a new pod, but the container will enter a crash loop and never become ready:

$ kubectl get po
NAME                                           READY   STATUS             RESTARTS   AGE
kubernetes-secret-generator-6f79f56667-v8n7l   0/1     CrashLoopBackOff   6          4m41s
kubernetes-secret-generator-b55758744-hts96    1/1     Running            0          6m32s
$ kubectl describe po kubernetes-secret-generator-6f79f56667-v8n7l
Name:         kubernetes-secret-generator-6f79f56667-v8n7l
(...)
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  55s                default-scheduler  Successfully assigned default/kubernetes-secret-generator-6f79f56667-v8n7l to docker-desktop
  Normal   Pulled     52s                kubelet            Successfully pulled image "quay.io/mittwald/kubernetes-secret-generator:v3.3.2" in 1.3045537s
  Normal   Started    41s (x2 over 52s)  kubelet            Started container kubernetes-secret-generator
  Normal   Pulled     41s                kubelet            Successfully pulled image "quay.io/mittwald/kubernetes-secret-generator:v3.3.2" in 1.3209008s
  Normal   Pulling    31s (x3 over 54s)  kubelet            Pulling image "quay.io/mittwald/kubernetes-secret-generator:v3.3.2"
  Warning  Unhealthy  31s (x6 over 49s)  kubelet            Liveness probe failed: Get "http://10.1.0.11:8080/healthz": dial tcp 10.1.0.11:8080: connect: connection refused
  Warning  Unhealthy  31s (x6 over 49s)  kubelet            Readiness probe failed: Get "http://10.1.0.11:8080/readyz": dial tcp 10.1.0.11:8080: connect: connection refused
  Normal   Killing    31s (x2 over 43s)  kubelet            Container kubernetes-secret-generator failed liveness probe, will be restarted
  Normal   Pulled     30s                kubelet            Successfully pulled image "quay.io/mittwald/kubernetes-secret-generator:v3.3.2" in 1.2851037s
  Normal   Created    29s (x3 over 52s)  kubelet            Created container kubernetes-secret-generator

Container logs:

$ kubectl logs kubernetes-secret-generator-6f79f56667-v8n7l
{"level":"info","ts":1629819473.2746468,"logger":"cmd","msg":"Operator Version: 0.0.1"}
{"level":"info","ts":1629819473.2746885,"logger":"cmd","msg":"Go Version: go1.15.14"}
{"level":"info","ts":1629819473.2746935,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1629819473.2746956,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"}
{"level":"info","ts":1629819473.276408,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1629819473.8457088,"logger":"leader","msg":"Found existing lock","LockOwner":"kubernetes-secret-generator-b55758744-hts96"}
{"level":"info","ts":1629819473.8563178,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1629819474.994521,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1629819477.3851807,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1629819481.9319456,"logger":"leader","msg":"Not the leader. Waiting."}

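If I manually kill the old instance, the new pod becomes the leader and starts successfully. Judging from the logs and events above, the new pod seems to wait for the leader lock before it serves the health endpoints on port 8080, while the rolling update keeps the old pod (the current lock owner) running until the new one is ready.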

elenz97 commented 3 years ago

Hey @lchdev, the upstream helm chart now uses "Recreate" as its deployment strategy. With this strategy the old pod is terminated before the new one is created, so upgraded deployments should start as expected.

I successfully tested this on a local cluster (using the upstream master codebase) by running:

$ helm install kubernetes-secret-generator deploy/helm-chart/kubernetes-secret-generator/.
$ helm upgrade --install kubernetes-secret-generator deploy/helm-chart/kubernetes-secret-generator --set secretLength=50
$ kubectl get po
NAME                                           READY   STATUS    RESTARTS   AGE
kubernetes-secret-generator-84946bf455-q8wjn   1/1     Running   0          30s

The deployment strategy can also be set explicitly via helm install [...] --set deploymentStrategy="Recreate".
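
As a rough sketch, the same setting could be kept in a values file instead of being passed on the command line. The `deploymentStrategy` key is the one used in the `--set` flag above. The file name `values.yaml` is hypothetical, and the mapping of this value to the Deployment's `spec.strategy.type` is an assumption about the chart's template that is not confirmed in this thread.

```yaml
# values.yaml (hypothetical file name)
# Key taken from the --set flag shown above.
deploymentStrategy: "Recreate"

# Assumed effect on the rendered Deployment (standard Kubernetes field):
#
# spec:
#   strategy:
#     type: Recreate   # the old pod is terminated before the new one is created
```

Such a file would then be passed with helm's `-f` flag, e.g. `helm upgrade --install kubernetes-secret-generator mittwald/kubernetes-secret-generator -f values.yaml`. Note that "Recreate" trades a short downtime during upgrades for avoiding the stuck rollout described in this issue.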

I hope this fixes your issue! If there's anything else we can help you with, please let us know.