alvinsw opened this issue 3 years ago
podman doesn't handle restarts by design, it needs to use systemd files for managing containers on restarts.
https://github.com/containers/podman/blob/master/docs/source/markdown/podman-generate-systemd.1.md
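For example, something along these lines could keep the node container managed across reboots (a rough sketch; kind-control-plane is the default node container name, and unit installation details depend on the host and on whether podman is rootful or rootless):

```
# Generate a systemd unit for the existing kind node container (run as root
# for a rootful cluster); the unit file is written to the current directory.
podman generate systemd --files --name kind-control-plane
mv container-kind-control-plane.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable container-kind-control-plane.service
```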
Bear in mind that KIND wraps these container technologies: if docker supports something out of the box and podman doesn't, it is not likely that KIND will work around it; that is far out of scope for the project. However, we work closely with and have a good relationship with both projects, collaborating and opening bugs when necessary.
Are you running podman as rootless? If podman support is "experimental", rootless is even "more experimental", so all the "advanced" features may have bugs or simply not be supported at all ...
Podman also lacks a stable container network identifier, which makes managing Kubernetes nodes across restarts problematic.
I don't think anyone is planning to work on this feature or has a plan for how it might be possible.
No, I am running podman as root; that is, kind create cluster is run by the root user.
Minikube supports podman and it can still do cluster start and stop using podman.
What makes kind different in this case?
After executing podman start kind-control-plane, can we just manually run a script on the running kind-control-plane container to start everything all over again?
Or would it be easier to add a feature where all user data on the kind-control-plane container is persisted on the host machine? This means that if you delete and create the cluster again, the new cluster will still have all the k8s objects from the previous cluster.
Minikube supports podman and it can still do cluster start and stop using podman.
Minikube supports podman and docker using a fork of the kind image, yes.
What makes kind different in this case?
We don't work on that project. I don't work on podman support either. I can't tell you.
But I can tell you that podman lacks automatic restart for containers and lacks sufficient networking features to design robust restart support. Node addresses will be random, and restart support will be a roll of the dice. Stop and start is not what we mean when we say docker has restart support; it has a different tracking issue that nobody has contributed to investigating thus far: #1867
After executing podman start kind-control-plane, can we just manually run a script on the running kind-control-plane container to start everything all over again?
You're welcome to try but we have no such script.
Or would it be easier to add a feature where all user data on the kind-control-plane container is persisted on the host machine? This means that if you delete and create the cluster again, the new cluster will still have all the k8s objects from the previous cluster.
Kubeadm doesn't support this AIUI. You can't just persist all data and then start a new cluster with it.
When stopping and starting (or, with docker, restarting), the data is already persisted on an anonymous volume. But not across clusters.
We are focused on making starting clusters cheap and quick so tests can be run from a clean state. We don't recommend keeping clusters permanently.
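If it helps to see what that means in practice, a quick way to look at the anonymous volume behind a node is something like the following (an illustrative command, assuming the default node container name):

```
# List the mounts of the node container; the anonymous volume shown here is
# what keeps data across stop/start of the same container.
podman inspect --format '{{ json .Mounts }}' kind-control-plane
```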
Thank you @BenTheElder for explaining things in the earlier post!
Do you think it will be (or maybe it already is) possible to declare the required parameters in the config YAML file? Say I want to restart a multi-node cluster running on podman: in addition to the number of nodes, I could declare static IP addresses per node... and so on. In other words, if podman doesn't provide this functionality, is there any way to allow users to make further configuration changes to compensate?
You have to use podman restart kind-control-plane. podman start does not reattach the port forwarding.
Interestingly, after an implicit stop, like rebooting, you have to start it and then restart it to make it work.
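In other words, something like the following after a reboot (a sketch, assuming the default cluster name):

```
# Starting alone brings the container up but port forwarding is not reattached;
# restarting the already-running container makes the cluster reachable again.
podman start kind-control-plane
podman restart kind-control-plane
```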
Hi @BenTheElder, could you explain the "Node addresses will be random and restart support will be a roll of the dice" part?
I created an issue in Podman repository to be able to handle kind requirements but it's not clear what Kind is expecting from Podman side. https://github.com/containers/podman/issues/16797
Podman networking has changed a lot over the past few years but historically container IPs are random on startup and podman lacked an equivalent mechanism to docker's embedded DNS resolver with resolvable container names.
I don't think it's appropriate to file a bug against podman for kind unless there's a specific bug.
As you saw in #2998, the other reason we haven't had a restart policy for podman is that podman didn't support them meaningfully. That has changed a bit.
Hi @alvinsw
I'm also facing the same error: after creating a kind cluster using podman, when restarting it with podman stop and start, the kind cluster is not reachable at the target endpoint. We have migrated from docker to podman on around 1000 developer machines, so this is high priority for us. Please let me know if you find any workaround.
this is my support ticket - https://github.com/kubernetes-sigs/kind/issues/3473
We have migrated from docker to podman on around 1000 developer machines, so this is high priority for us.
Unfortunately podman and docker are NOT direct substitutes and we don't have the bandwidth to spend on this ourselves currently.
In your issue, the containers are failing to start outright, at which point no kind code is even running, only podman/crun.
We'll continue to review suggested approaches to improving the podman implementation in kind, and the subsequent PRs.
Related: I think podman has had optional support for resolving container names for a while now; we could consider making this a prerequisite and matching the docker behavior more closely.
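As a rough illustration of what "resolving container names" means here (a sketch, assuming a netavark/aardvark-dns setup where user-defined networks get DNS; all names below are arbitrary examples):

```
# Containers on a user-defined podman network can resolve each other by name,
# similar to docker's embedded DNS on user-defined networks.
podman network create name-test
podman run -d --name web --network name-test docker.io/library/nginx
podman run --rm --network name-test docker.io/library/busybox nslookup web
podman rm -f web && podman network rm name-test
```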
I noticed that on a current podman setup the stop command escalates to a SIGKILL of the container. The systemd inside the control-plane container waits by itself for a process (in my case containerd) that does not stop within 1m30s, but the podman stop command above sends SIGKILL after 10s. It's obvious what that means.
The args used when creating the cluster/container could change the default of 10s, for instance to 120s, with the argument --stop-timeout=120. This would allow the cluster to shut down gracefully ...
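For what it's worth, the grace period can also be overridden per invocation rather than baked into container creation (a hedged aside; -t/--time is podman stop's timeout flag):

```
# Give systemd in the node container up to 120s to shut down before SIGKILL.
podman stop --time 120 kind-control-plane
```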
Better would be to check the cause of containerd not returning immediately when stopped.
but the podman stop command above sends SIGKILL after 10s. It's obvious what that means.
That's not obvious to me, SIGKILL is not even the right signal to tell systemd to exit. https://systemd.io/CONTAINER_INTERFACE/
The args used when creating the cluster/container could change the default of 10s, for instance to 120s, with the argument --stop-timeout=120. This would allow the cluster to shut down gracefully ...
We could do that, it seems like a behavioral gap versus docker and we should investigate what the actual behavior difference is and try to align them.
Help would be welcome identifying what is happening with docker nodes that isn't happening with podman nodes (or perhaps you're running a workload that inhibits shutdown?)
Just to clarify, podman stop sends the signal that the container has configured (StopSignal) or the default SIGTERM. After the default timeout of 10s it sends the SIGKILL.
You are right, systemd/init containers should receive a different signal (37/SIGRTMIN+3). Therefore the container creation (e.g. control-plane) should have a --stop-signal= argument. Looking into my control-plane container, it looks like kind (v0.23.0) does not set the right signal (--stop-signal=37) to stop systemd. But the systemd process does shut down with the SIGTERM signal as well, so far. Not sure if it would make a difference; a quick test with podman kill --signal=37 control-plane does not show one.
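A quick way to check what stop signal the container is actually configured with (an illustrative command; SIGRTMIN+3 is signal 37 on Linux, and the container name assumes the default cluster):

```
# Print the stop signal recorded in the container's config.
podman inspect --format '{{ .Config.StopSignal }}' kind-control-plane
```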
My current problem is that the shutdown hangs here, and continues after the systemd internal timeout (1min 30s):
```
...
[ OK ] Removed slice kubelet-kubepods-burstable-pod2954d591_64df_47ec_ac40_236a…ntainer kubelet-kubepods-burstable-pod2954d591_64df_47ec_ac40_236a244177b6.slice.
[ OK ] Removed slice kubelet-kubepods-burstable-pod50dc3cdf_24ed_44a0_9d5d_9881…ntainer kubelet-kubepods-burstable-pod50dc3cdf_24ed_44a0_9d5d_988129d2591e.slice.
[ *** ] (2 of 2) Job cri-containerd-3f1ea75a93823c1ffaece11518a124ec8950fcbc7cf9cdaac6fd00c2a415e8dd.scope/stop running (47s / 1min 30s)
```
And this is just a kind test cluster (single node) with a deployment of httpd:latest (replicas: 2), that's all.
To sum up: the --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view. The --stop-signal=37 would help comply with systemd. The missing part is the cause of the shutdown delay ...
Therefore the container creation (e.g. control-plane) should have a --stop-signal= argument.
We set this in the image.
To sum up: the --stop-timeout wouldn't hurt and would provide a better experience from the user's point of view. The --stop-signal=37 would help comply with systemd. The missing part is the cause of the shutdown delay ...
It might not, but we should not set different flags for podman versus docker without understanding whether we're working around a difference in functionality. On the surface they're supposed to be compatible, and kind is less useful as a test tool when the behavior isn't consistent.
... so before doing that, we want to understand if this is an expected difference in behavior, or if we're only working around a podman bug, or if it affects both and we're only mitigating podman but not docker.
So far, I have not seen clusters fail to terminate, which suggests a difference in behavior that is possibly a bug OR it's because of something you're running in the cluster (or something different with your host).
Ideally we'd reproduce and isolate which aspect (your config, your host, your workload, docker vs podman) is causing the nodes to not exit and deal with the root issue instead of changing the behavior of kind podman nodes to work around an issue we don't understand and haven't seen before.
Therefore the container creation (e.g. control-plane) should have a --stop-signal= argument.
We set this in the image.
Ooh, I don't know where I looked, definitely not at the right place ... it's set :-)
What happened: The cluster does not work anymore after the podman container is restarted (e.g. after a host OS reboot). The issue is fixed for docker (https://github.com/kubernetes-sigs/kind/issues/148). Is there a plan to support restart for podman in the near future?
What you expected to happen: The cluster should run again after restarting the podman container
How to reproduce it (as minimally and precisely as possible):
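A minimal sketch of the reproduction, assuming podman as the experimental provider and the default cluster name:

```
# Create a cluster with the podman provider, then stop/start the node container
# (or reboot the host); the cluster is no longer reachable afterwards.
KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster
podman stop kind-control-plane
podman start kind-control-plane
kubectl cluster-info
```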
Anything else we need to know?:
Environment:
- kind version: 0.11.0
- kubectl version: kindest/node:v1.21.1
- docker info: podman version 3.1.2
- /etc/os-release: Latest ArchLinux