jsafrane closed this 9 months ago
Tested with a CSI driver that's crashlooping. The livenessprobe sidecar keeps running, and only the driver container gets restarted. "connection refused" is not a great error message, but it fails the probe just fine:
Warning ProbeError 89s (x5 over 2m9s) kubelet Liveness probe error: Get "http://10.0.2.31:10300/healthz": dial tcp 10.0.2.31:10300: connect: connection refused
body:
Warning Unhealthy 89s (x5 over 2m9s) kubelet Liveness probe failed: Get "http://10.0.2.31:10300/healthz": dial tcp 10.0.2.31:10300: connect: connection refused
Normal Killing 89s kubelet Container csi-driver failed liveness probe, will be restarted
I basically copied https://github.com/kubernetes-csi/livenessprobe/pull/237 and updated it with a 0 timeout; it looks much better now.
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ejweber, jsafrane, xing-yang
What type of PR is this? /kind bug
What this PR does / why we need it: Do not exit the liveness probe process when it cannot connect to the CSI driver. The driver could be crashlooping, and the liveness probe process should not crashloop along with it.
The process should only fail all probes to the /healthz endpoint. Since the HTTP server is not yet running while the process connects to the driver for the first time, "connection refused" has to be a good enough failure. This PR is heavily inspired by / copied from https://github.com/kubernetes-csi/livenessprobe/pull/237
Which issue(s) this PR fixes:
Fixes #236
Special notes for your reviewer:
Does this PR introduce a user-facing change?: