kubernetes-csi / livenessprobe

A sidecar container that can be included in a CSI plugin pod to enable integration with Kubernetes Liveness Probe.
Apache License 2.0

Don't exit the probe on connection issues #240

Closed jsafrane closed 9 months ago

jsafrane commented 9 months ago

What type of PR is this?

/kind bug

What this PR does / why we need it:

Do not exit the liveness probe process when it cannot connect to the CSI driver. The driver could be crashlooping, and we should not crashloop the liveness probe process too.

Instead, the process should only fail all probes to the /healthz endpoint. The HTTP server is not yet running while the probe is still connecting to the driver for the first time, so kubelet gets "connection refused", which has to be a good enough failure.

This PR is heavily inspired / copied from https://github.com/kubernetes-csi/livenessprobe/pull/237

Which issue(s) this PR fixes:

Fixes #236

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Liveness probe process does not crash when it cannot access the associated CSI driver. It only fails all kubelet probes, most probably with "connection refused".

jsafrane commented 9 months ago

Tested with a CSI driver that's crashlooping. The liveness probe sidecar keeps running, and only the driver container gets restarted by kubelet. "connection refused" is not great, but it fails the probe just fine:

  Warning  ProbeError  89s (x5 over 2m9s)    kubelet            Liveness probe error: Get "http://10.0.2.31:10300/healthz": dial tcp 10.0.2.31:10300: connect: connection refused
body:
  Warning  Unhealthy  89s (x5 over 2m9s)   kubelet  Liveness probe failed: Get "http://10.0.2.31:10300/healthz": dial tcp 10.0.2.31:10300: connect: connection refused
  Normal   Killing    89s                  kubelet  Container csi-driver failed liveness probe, will be restarted

jsafrane commented 9 months ago

I basically copied https://github.com/kubernetes-csi/livenessprobe/pull/237 and updated it to use a timeout of 0; it looks much better now.

xing-yang commented 9 months ago

/lgtm
/approve

k8s-ci-robot commented 9 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ejweber, jsafrane, xing-yang

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-csi/livenessprobe/blob/master/OWNERS)~~ [jsafrane,xing-yang]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment