kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Inhibitor support for graceful node shutdown #3648

Closed by tanvp112 2 weeks ago

tanvp112 commented 3 weeks ago

Hi,

I wanted to run some tests for graceful node shutdown and noticed that systemd-inhibit is not included in the kind image. The command systemd-inhibit --list returns an error because no systemd --user instance is running.

Is this currently not supported or am I missing any configuration here?

Thanks.

stmcginnis commented 3 weeks ago

I'm not sure how that would work with kind.

Graceful shutdown happens when the host is shutting down. IIRC, kubelet gets registered with systemd-inhibitor so that when shutdown is happening, systemd will wait a given period of time for any registered processes to exit before continuing with the shutdown.
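To make the timing concrete, here is a minimal sketch of how the shutdown delay is divided between pods. It assumes the documented kubelet settings shutdownGracePeriod and shutdownGracePeriodCriticalPods, where the second value is carved out of the first rather than added to it. The function name and the 30s/10s values are illustrative only and do not come from this thread.

```python
from datetime import timedelta


def split_shutdown_budget(grace_period, critical_pods):
    """Return (time for regular pods, time for critical pods).

    grace_period  ~ kubelet shutdownGracePeriod (total delay requested)
    critical_pods ~ kubelet shutdownGracePeriodCriticalPods (part of the total)
    """
    if critical_pods > grace_period:
        raise ValueError("critical pod period cannot exceed the total grace period")
    # Regular pods get whatever is left after reserving time for critical pods.
    return grace_period - critical_pods, critical_pods


# Illustrative values: 30s total, 10s reserved for critical pods.
regular, critical = split_shutdown_budget(timedelta(seconds=30), timedelta(seconds=10))
print(regular.total_seconds(), critical.total_seconds())
```

With these example values, regular pods would get the first 20 seconds and critical pods the last 10.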

Since kind nodes are containers, they do not have their own systemd process, and shutting down is done by stopping the container. So it wouldn't ever hit this condition.

Can you explain a little more about what you are trying to test here?

tanvp112 commented 3 weeks ago

They do have a systemd process, but it has been slimmed down considerably. I reckon systemd inhibitor locks could be added similarly? https://github.com/kubernetes-sigs/kind/blob/main/images/base/Dockerfile#L49

I am guessing the system shutdown event can be handled? https://github.com/kubernetes-sigs/kind/blob/main/images/base/Dockerfile#L241

For example, I have a long-running pod on a spot node that is about to be reclaimed. I want to test that the pod can stop within the time limit, including its attached volume (not the default local storage).

stmcginnis commented 3 weeks ago

Ah, sorry, I should have done a little more research before commenting. You are totally right.

It may be possible to add systemd-logind to the base image. That should pull in systemd-inhibit. I'm not sure when I can get to it, but I will try later. You could also try building your own base image with that modification, then create your own node image.

aojea commented 3 weeks ago

I don't know if kind is the right project to test node shutdown. @BenTheElder, you are much more knowledgeable in this area, PTAL.

tanvp112 commented 3 weeks ago

Graceful or non-graceful shutdown of a node can be simulated with docker container stop kind-<name> --time <duration>, but it needs an inhibitor lock to take advantage of the time allowance. Given that the default KIND image already has systemd running, what's the reason KIND is not the right project to test node shutdown?
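As a rough sketch of the simulation described above, the stop command could be assembled and run from a test harness like this. The helper names are hypothetical, and "kind-worker" is an assumed node container name; substitute whatever docker ps shows for your cluster. The option is placed before the container name, which is the canonical docker CLI ordering.

```python
import subprocess


def stop_node_command(container, timeout_seconds):
    """Build the argv for `docker container stop --time <seconds> <container>`."""
    if timeout_seconds < 0:
        raise ValueError("timeout must be non-negative")
    return ["docker", "container", "stop", "--time", str(timeout_seconds), container]


def stop_node(container, timeout_seconds):
    """Run the stop command. Requires a local Docker daemon and an existing container."""
    subprocess.run(stop_node_command(container, timeout_seconds), check=True)


# Only print the command here; call stop_node(...) to actually stop the node.
print(" ".join(stop_node_command("kind-worker", 120)))
```

Docker sends the container's stop signal first and only kills it once the timeout expires, so the duration acts as an upper bound on how long a shutdown could be delayed.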

aojea commented 3 weeks ago

kind does a lot of hacks to emulate a VM. Are you sure all the behaviors triggered by a VM shutdown can be simulated with docker container stop? ... kind does not have dbus, for example

tanvp112 commented 3 weeks ago

Yes. But I think the bigger issue here is DBus is needed for kubelet to obtain the lock inside the node.

aojea commented 3 weeks ago

So, is it worth adding it to kind, or should this just be tested in a VM? The latter seems more appropriate to me.

tanvp112 commented 3 weeks ago

Why not? Note that it is not impossible to add dbus to a container; rke2 has had this working for some time already. Whether it is worth it, I guess, depends on how difficult it is to maintain in the long run, certainly not on dbus or systemd themselves.

Feel free to close this request if graceful shutdown is too much trouble for KIND to see value in it. You are certainly right that there are many other options out there.

aojea commented 2 weeks ago

We don't have lifecycle on the container, and the VM emulation fails in areas such as the abstraction of host resources... Adding something incomplete that will not provide full coverage seems like too much cost for a small ROI ...

BenTheElder commented 2 weeks ago

As a general statement: System integration / node level isolation is going to be problematic with kind, because fundamentally we're sharing a host kernel. It's pretty OK for testing distributed behavior, manifests, controller logic, etc, but when we get into kubelet/node stuff it may or may not be appropriate. We try to enable these but it's sometimes pretty messy and it hasn't been the main focus. (see e.g. #1963 and the problems there)

> Why not? Note that it is not impossible to add dbus to a container; rke2 has had this working for some time already. Whether it is worth it, I guess, depends on how difficult it is to maintain in the long run, certainly not on dbus or systemd themselves.

I'm not that familiar with dbus, but this looks plausible to me. We'd have to investigate this more, it needs to be isolated from the other containers / host.

> We don't have lifecycle on the container,

I think this is the bigger problem, we usually have users simulate node addition/removal using taints which is sufficient for testing most applications. For testing Kubernetes with disruptive node behaviors we typically still use cloud VMs where we have better isolation and control.
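For reference, the taint-based simulation mentioned above might look roughly like the following. This is a sketch, not an official kind workflow: the helper name, the node name "kind-worker", and the taint key "example.com/simulated-removal" are all made up for illustration. It relies only on the standard kubectl taint syntax, where a trailing "-" removes the taint.

```python
import subprocess

VALID_EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


def taint_command(node, key, value="true", effect="NoExecute", remove=False):
    """Build the argv for `kubectl taint nodes <node> <key>=<value>:<effect>[-]`."""
    if effect not in VALID_EFFECTS:
        raise ValueError("unknown taint effect: " + effect)
    spec = "{}={}:{}".format(key, value, effect)
    if remove:
        spec += "-"  # a trailing dash asks kubectl to remove the taint
    return ["kubectl", "taint", "nodes", node, spec]


def apply_taint(node, key, **kwargs):
    """Run the command against the current kubeconfig context."""
    subprocess.run(taint_command(node, key, **kwargs), check=True)


# Hypothetical node name and taint key, printed rather than executed.
print(" ".join(taint_command("kind-worker", "example.com/simulated-removal")))
print(" ".join(taint_command("kind-worker", "example.com/simulated-removal", remove=True)))
```

A NoExecute taint evicts pods that do not tolerate it, which approximates a node going away from the application's point of view without touching the node container itself.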

kind only has "create cluster" and "delete cluster" currently (and ... introducing other mutations would get pretty complex)

Inhibiting shutdown here would be somewhat at odds with any container deletion behavior (e.g. this thread https://github.com/kubernetes-sigs/kind/issues/2272#issuecomment-2142867867), but that's not necessarily a deal-breaker for tests if we inhibit for less time than the timeout, or simulate shutdown signal without actually calling kind delete cluster (maybe we signal systemd instead?)
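The "inhibit for less time than the timeout" constraint could be checked in a test setup along these lines. This is only a sketch: the function name and the safety margin are assumptions, and the numbers are illustrative.

```python
def fits_within_stop_timeout(inhibit_seconds, stop_timeout_seconds, margin_seconds=5):
    """Return True if the inhibited shutdown should finish before the container is killed.

    inhibit_seconds      ~ how long shutdown may be delayed inside the node
    stop_timeout_seconds ~ how long the container runtime waits before killing
    margin_seconds       ~ assumed slack for the rest of the shutdown sequence
    """
    return inhibit_seconds + margin_seconds <= stop_timeout_seconds


print(fits_within_stop_timeout(30, 60))  # enough headroom
print(fits_within_stop_timeout(58, 60))  # too close to the kill deadline
```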

More importantly: If you're writing tests for graceful shutdown in github.com/kubernetes/kubernetes, I'd say these should probably be under node_e2e and SIG node pretty much only supports node_e2e with kube-up.sh currently, to my knowledge.

I'd love to enable these someday, but I'm not sure if that's even something the SIG node maintainers are interested in, overall, and if not then we should probably continue to focus on cluster e2e (test/e2e).

If I've misread and you have another use case please elaborate 😅 .

BenTheElder commented 2 weeks ago

(also this week is KEP freeze amongst other things so please bear with response times ...)