mittwald / kubernetes-secret-generator

Kubernetes controller for automatically generating and updating secrets
Apache License 2.0

Pod startup failure #84

Open RomanOrlovskiy opened 1 year ago

RomanOrlovskiy commented 1 year ago

Describe the bug
The pod is not able to start up during the initial deployment using the latest v3.4.0 Helm chart. Is it possible this is related to the Kubernetes version?

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
.....
  Normal   Created    3m7s (x2 over 3m19s)   kubelet            Created container kubernetes-secret-generator
  Normal   Started    3m7s (x2 over 3m19s)   kubelet            Started container kubernetes-secret-generator
  Normal   Pulled     3m7s                   kubelet            Successfully pulled image "quay.io/mittwald/kubernetes-secret-generator:latest" in 74.165988ms (74.182482ms including waiting)
  Warning  Unhealthy  2m55s (x8 over 3m13s)  kubelet            Readiness probe failed: Get "http://10.8.11.223:8080/readyz": dial tcp 10.8.11.223:8080: connect: connection refused
  Warning  Unhealthy  2m55s (x6 over 3m13s)  kubelet            Liveness probe failed: Get "http://10.8.11.223:8080/healthz": dial tcp 10.8.11.223:8080: connect: connection refused
  Normal   Killing    2m55s (x2 over 3m7s)   kubelet            Container kubernetes-secret-generator failed liveness probe, will be restarted
  Normal   Pulling    2m54s (x3 over 3m19s)  kubelet            Pulling image "quay.io/mittwald/kubernetes-secret-generator:latest"

These are the only logs available in the pod:

{"level":"info","ts":1681322039.958661,"logger":"cmd","msg":"Operator Version: 0.0.1"}
{"level":"info","ts":1681322039.9587452,"logger":"cmd","msg":"Go Version: go1.15.15"}
{"level":"info","ts":1681322039.9587672,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1681322039.9587784,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"}
{"level":"info","ts":1681322039.9592156,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1681322049.27793,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1681322049.2779632,"logger":"leader","msg":"Continuing as the leader."}

To Reproduce
Just a basic installation using Helm with the values.yaml below; a sketch of the install command follows the values.

values.yaml:

installCRDs: true
useMetricsService: true
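
For context, an install along these lines should reproduce the setup. This is a minimal sketch: the repository URL, chart name, namespace, and release name are assumptions based on the project's usual Helm distribution, not a confirmed recipe.

# Assumed chart repository and chart name; adjust to match how the chart is actually published.
helm repo add mittwald https://helm.mittwald.de
helm repo update
helm upgrade --install secret-generator mittwald/kubernetes-secret-generator \
  --namespace secret-generator --create-namespace \
  --version 3.4.0 \
  -f values.yaml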

Environment:

kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:36:43Z", GoVersion:"go1.19", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.6-eks-48e63af", GitCommit:"9f22d4ae876173884749c0701f01340879ab3f95", GitTreeState:"clean", BuildDate:"2023-01-24T19:19:02Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
jan-kantert commented 10 months ago

We have seen this startup failure too, but only intermittently. It made no sense, since "we did not change anything" (tm).

It later turned out that this happened because one of our APIServices became unavailable (in our case linkerd-tap, because its pods ran into an issue). You can check with kubectl get apiservices.apiregistration.k8s.io. The broken APIService did not affect any other workload on the cluster, and I honestly do not understand why it causes secret-generator to hang during startup; it definitely should not.
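
As a quick check along those lines (a minimal sketch; the linkerd-tap APIService name is just the example from our cluster):

# List aggregated APIServices; anything whose AVAILABLE column is not "True"
# (e.g. "False (FailedDiscoveryCheck)") breaks API discovery for the whole cluster.
kubectl get apiservices.apiregistration.k8s.io

# Show only the broken ones.
kubectl get apiservices.apiregistration.k8s.io | grep -v True

# Inspect a failing one for details (the name here is an example from our linkerd-tap case).
kubectl describe apiservice v1alpha1.tap.linkerd.io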

Ideas?

jan-kantert commented 9 months ago

We looked into this issue some more. It seems to be a bug in the (old) version of operator-sdk the operator is built on (v0.16.0, per the startup logs); an update would probably fix it.

vmartino commented 6 months ago

Is there a workaround for this issue?

jan-kantert commented 6 months ago

Workaround: fix all of your webhooks ;-). For us, this only happens when other webhooks (or aggregated APIServices, see above) in the cluster are broken.
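
As a rough checklist (a sketch of the kind of checks we run, not an official diagnostic; resource names are placeholders), make sure no admission webhook or aggregated API in the cluster points at a dead backend before restarting the operator:

# Admission webhooks: list them and verify the Services they point at still exist and have endpoints.
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations

# Aggregated APIs: anything not "True" here can stall clients that rely on API discovery.
kubectl get apiservices.apiregistration.k8s.io | grep -v True

# Once the broken backend is fixed (or its registration removed), restart the operator.
# The deployment name depends on your Helm release name; adjust accordingly.
kubectl -n <namespace> rollout restart deployment/<release-name>-kubernetes-secret-generator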