k8snetworkplumbingwg / multus-cni

A CNI meta-plugin for multi-homed pods in Kubernetes
Apache License 2.0

Thick plugin graceful termination #1338

Open dougbtv opened 2 months ago

dougbtv commented 2 months ago

This PR adds graceful shutdown to the Multus daemon by introducing a /readyz endpoint alongside the existing /healthz. Once the daemon receives a SIGTERM, /readyz starts returning HTTP 500 to signal that the daemon is shutting down. CNI requests can still be processed for a short window during this time. The daemonset configs now set terminationGracePeriodSeconds to 30 seconds (up from 10), giving the daemon more time to shut down cleanly.
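For readers who want to see the shape of the idea, here is a minimal sketch of a readiness handler that flips to HTTP 500 once shutdown begins while the health handler keeps succeeding. This is not the code from this PR: the `readiness` type, `beginShutdown`, and `probe` are hypothetical names, and the handlers are exercised in-process with `net/http/httptest` rather than through the real daemon.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness is a hypothetical helper (not a Multus type) that records
// whether the process has started shutting down.
type readiness struct {
	terminating atomic.Bool
}

// beginShutdown would be called from a SIGTERM handler in a real daemon.
func (r *readiness) beginShutdown() { r.terminating.Store(true) }

// readyz returns 200 while serving normally and 500 once shutdown has begun.
func (r *readiness) readyz(w http.ResponseWriter, _ *http.Request) {
	if r.terminating.Load() {
		http.Error(w, "shutting down", http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(w, "ok")
}

// healthz stays healthy during shutdown: the process is still alive.
func healthz(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprintln(w, "ok")
}

// probe calls a handler in-process and returns the HTTP status code.
func probe(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	r := &readiness{}
	fmt.Println(probe(r.readyz), probe(healthz)) // 200 200
	r.beginShutdown()
	fmt.Println(probe(r.readyz), probe(healthz)) // 500 200
}
```

Keeping the two endpoints separate lets a liveness check keep passing while a readiness check reports that new work should stop arriving.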

This addresses a race condition during pod transitions where the readiness check might return true, but a subsequent CNI request could fail if the daemon shuts down too quickly. By introducing the /readyz endpoint and delaying the shutdown, we can handle ongoing CNI requests more gracefully, reducing the risk of disruptions during critical transitions.

Major thanks to @deads2k for the find, identification, fix, and of course, the explanations. Appreciate it.

coveralls commented 2 months ago


coverage: 63.822% (-0.04%) from 63.857% when pulling 531dec1c916d746aabf3ad800803ee0a82c8a11b on dougbtv:thickplugin_graceful_term2 into f1e887e2396c98e9aee6417723f2c5cd433a1cd2 on k8snetworkplumbingwg:master.