Closed: vincepri closed this issue 4 years ago.
Isn't this standard behavior with docker containers? (remaining stopped?)
kind does not run any daemon to manage the cluster; the commands create / delete "nodes" (containers) and run some tasks in them (like kubeadm init), so they're effectively "unmanaged".
docker restart is not going to work, because creating a container is not just docker run; we need to take a few actions after creating the container.
What is the use case for this? These are meant to be transient test-clusters and it's probably not a good idea to restart the host daemon during testing.
"Restarting" a cluster is probably going to just look like delete + create of the cluster.
I'm not sure I'd consider this so much a bug as a feature request; "node" restarts are not really intended functionality currently.
What is the use case for this?
+1 to this question.
docker restart in this case will act like a power grid restart on a bunch of bare metal machines. so while those bare metal machines might come back up, not sure if we want to support this for kind. for that to work i think some sort of state has to be stored somewhere...
I've been using kind locally (using Docker for Mac) and when docker reboots or stops, the cluster has to be deleted and recreated. I'm perfectly fine with it, just thought this might be something we should look into.
The use case was to keep the cluster around even after I reboot or shut down my machine / docker.
Thanks for clarifying - this is certainly a hole in the usability but I'd hoped that clusters would be cheap enough to [create, use, delete] regularly.
This might be a little non-trivial to resolve but is probably doable.
/priority backlog
/help
@BenTheElder: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
I think I know how we can do this effectively, but I have no idea what to call the command that will fit with the rest of the CLI 🙃
cc @munnerz
Something like kind restart cluster maybe?
restart seems to fit well with the other create/delete cluster commands. What's the idea you had? Wondering if it actually fits the word restart, or if it's something more.
It should roughly be:
- --wait for the control-plane, like create
- It'll look similar to create but skip a lot of steps and swap creating the containers for list & {re}start
We can also eventually have a very similar command like kind restart node
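A purely hypothetical invocation, just to sketch the shape (neither command exists yet; the flags mirror create's --name and --wait and are illustrative only):
kind restart cluster --name kind --wait 5m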
I like that approach, and the node restart also sounds nice and could cover other use cases.
/remove-kind bug
/kind feature
Something like kind restart cluster maybe?
@BenTheElder I want to try it.
/assign
@tao12345666333: GitHub didn't allow me to assign the following users: tao12345666333.
Note that only kubernetes-sigs members and repo collaborators can be assigned. For more information please see the contributor guide
/lifecycle active
Thanks @tao12345666333
for the impatient, this seems to work for now after docker restarts:
docker start kind-1-control-plane && docker exec kind-1-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1'
FixMounts has a few mount --make-shared calls, not sure if they are really required.
The make shared may not be required anymore, those are related to mount propagation functionality in kubelet / storage. It looks like with a tweak to how docker runs on the nodes we might not need those.
We should check with hack/local-up-cluster.sh (i.e. @dims) on this as well; they still have it:
https://github.com/kubernetes/kubernetes/blob/07a5488b2a8f67add543da72e8819407d8314204/hack/local-up-cluster.sh#L1039-L1040
# configure shared mounts to prevent failure in DIND scenarios
mount --make-rshared /
I've also been thinking about ways we can make things like docker start just work. The /sys remount is especially unfortunate, but I don't think we can do much about it easily because specifying a /sys mount clashes with --privileged (and we still need the latter).
:+1: for the new restart cluster command!
The restart cluster command would put kind at the top of its class. Without it, it's painful to build test environments on top of kind, since restarting the whole process means re-downloading all the Docker images from scratch, a lengthy process.
I will send a PR next week.
Looking forward! Is there any ticket for that, for tracking purposes?
Not yet, I will update the progress here.
tentatively tracking for 0.3
I have sent a PR: #408.
/subscribe
docker start should ~work for single-node clusters, multi-node will require an updated #408 :sweat_smile:
/subscribe
/subscribe
/subscribe
I created a cluster with kind create cluster but docker stop kind-control-plane && docker start kind-control-plane results in:
Initializing machine ID from random generator.
Failed to find module 'autofs4'
systemd 240 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization docker.
Detected architecture x86-64.
Failed to create symlink /sys/fs/cgroup/net_prio: File exists
Failed to create symlink /sys/fs/cgroup/net_cls: File exists
Failed to create symlink /sys/fs/cgroup/cpuacct: File exists
Failed to create symlink /sys/fs/cgroup/cpu: File exists
Welcome to Ubuntu Disco Dingo (development branch)!
Set hostname to <kind-control-plane>.
Failed to attach 1 to compat systemd cgroup /docker/f4818db97d67b00668cf91e203f2ebc0697210dd1bf6dddc82c866553bb3994c/init.scope: No such file or directory
Failed to open pin file: No such file or directory
Failed to allocate manager object: No such file or directory
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
Thanks for the data point @janwillies. This is definitely not actually supported properly (yet?) and would/will require a number of fixes, some of which are in progress. In the meantime we've continued to push to make it cheaper to create / delete and test with clean clusters. When 0.4 releases we expect Kubernetes 1.14.X to start in ~20s if the image is warm locally.
I would like to add two things:
Running docker start minio-demo-control-plane && docker exec minio-demo-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1'
worked for me 👍
Please add restart.
Running docker start minio-demo-control-plane && docker exec minio-demo-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1' worked for me 👍
On a recent version (>= 0.3.0) it should just be docker start <node-name>. The rest is handled in the entrypoint.
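A sketch for the default single-node cluster, using kind get nodes to look up the container name (per the discussion above, multi-node clusters may still not recover because of IP reassignment):
kind get nodes | xargs docker start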
Please add restart.
We'd like to, but it's not quite this simple to do correctly. 🙃 That snippet doesn't work for multi-node clusters (see the previous discussion around IP allocation etc.). For single-node clusters it would currently just be an alias to docker start $NODE_NAME. It's being worked on, but is a bit lower priority than some Kubernetes testing concerns; ephemeral clusters are still recommended.
@carlisia As Ben said, we still recommend ephemeral clusters.
I am still trying to find out if there is a better way to improve #484 :cat:
@tao12345666333 I think ephemeral clusters are good, but not in 100% of use cases. If you organise a workshop or a meetup, for example, you would like to prepare everything in advance (some days before) and, at the moment of the event, just spin up the cluster and that's it. Like I did many times with minikube. Another example would be doing experiments. If I'm working with Calico, Cilium, Istio or similar, I don't want to deploy them every time I need to run a simple test. It would be way easier to have many clusters at a time, spin up the one you need, and then stop it again. Do my examples make sense?
@bygui86 Yes, I understand this requirement very well.
In fact, I have done some work in #408 and #484. It worked at the time, but it does not seem to be the best solution (and it's a bit out of date now). I am still focusing my attention on the Docker network to find the optimal solution.
Thanks for the effort guys!!
As a partial workaround to speed up pod creation in a re-created cluster, I mount containerd's directory as a volume from the host machine; it thus survives cluster recreation, and images are not downloaded again after every restart. E.g. I use the following config for cluster creation:
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/containerd
    hostPath: /home/me/.kind/cache/containerd
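A sketch of creating the cluster with that config (the filename is illustrative):
kind create cluster --config kind-containerd-cache.yaml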
Kind 0.5.1: when I restart my computer, a running cluster seems to survive, it just has to be started manually:
# ... lets say we just booted our machine here ...
15:08:06 ~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ffbccdcd61a kindest/node:v1.15.3 "/usr/local/bin/entr…" 5 minutes ago Exited (130) 2 minutes ago kind-control-plane
15:08:11 ~$ docker start 5ffbccdcd61a
5ffbccdcd61a
15:08:20 ~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ffbccdcd61a kindest/node:v1.15.3 "/usr/local/bin/entr…" 5 minutes ago Up 4 seconds 45987/tcp, 127.0.0.1:45987->6443/tcp kind-control-plane
15:08:39 ~$ kubectl get namespaces
NAME STATUS AGE
default Active 5m
foo Active 3m36s <------------ a namespace created prior to reboot
kube-node-lease Active 5m3s
kube-public Active 5m3s
kube-system Active 5m3s
So it looks like the container had received a SIGINT signal (130 - 128 = 2) before the machine shut down.
When I restart docker, or manually stop/start the node container, or send SIGINT to the node, it never recovers: it reports Exited (129) or Exited (130) before I try to start the container, and Exited (255) immediately after.
5:10:24 ~$ docker kill -s INT 5ffbccdcd61a
5ffbccdcd61a
15:10:52 ~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ffbccdcd61a kindest/node:v1.15.3 "/usr/local/bin/entr…" 8 minutes ago Exited (129) 3 seconds ago kind-control-plane
15:14:33 ~$ docker start -a 5ffbccdcd61a
INFO: ensuring we can execute /bin/mount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
systemd 240 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization docker.
Detected architecture x86-64.
Welcome to Ubuntu Disco Dingo (development branch)!
Set hostname to <kind-control-plane>.
Failed to bump fs.file-max, ignoring: Invalid argument
Failed to attach 1 to compat systemd cgroup /docker/5ffbccdcd61ab1271cc7f237cfb04fe529e2d08d211440e486f998a755882e43/init.scope: No such file or directory
Failed to open pin file: No such file or directory
Failed to allocate manager object: No such file or directory
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
15:14:35 ~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ffbccdcd61a kindest/node:v1.15.3 "/usr/local/bin/entr…" 12 minutes ago Exited (255) 1 second ago
Is there a way to manually stop/start the container, so that it would persist? Thanks
The main problem is that the container is not guaranteed to take the same IP that was assigned before the reboot, and that will break the cluster.
However, one user reported a working method in the slack channel https://kubernetes.slack.com/archives/CEKK1KTN2/p1565109268365000
cscetbon 6:34 PM
@Gustavo Sousa what I use :
alias kpause='kind get nodes|xargs docker pause'
alias kunpause='kind get nodes|xargs docker unpause'
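If you run more than one cluster, the same trick can be scoped to a single cluster by name (a sketch; "my-cluster" is illustrative):
alias kpause-my='kind get nodes --name my-cluster | xargs docker pause'
alias kunpause-my='kind get nodes --name my-cluster | xargs docker unpause'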
Thanks, I think I'm getting why this is not easy.
P.S. Pausing/unpausing seems to work until you stop/start the docker service; then it is the same problem again.
This is on my radar, we've just had some other pressing changes to tackle (mostly around testing kubernetes, the stated #1 priority and original reason for the project) and nobody has proposed a maintainable solution to the network issues yet. I'll look at this more this cycle.
I could be wrong... feel free to delete/ignore/flame if I miss something or if I'm completely disconnected from reality...
I don't know the inner workings of kind. I've also gone over the #484 details, which seem to focus on a DNS feature to solve the issue... but I'm unsure if this track has been investigated:
Would creating a custom bridge network with a defined IP range, and assigning static IPs (outside of that range) to the containers, not solve the IP persistence issue? Also, using a network name format would enable removing the network (when deleting a cluster) without keeping track of its creation...
In the following example I keep the first 31 IPs for Docker's auto-assignment/DNS and use the remaining IPs [32-254] for manual assignment. Since the IP address is manually assigned and outside the auto-assign range, it would never be hijacked by another container, so the IP address would survive reboots, container restarts, etc.
docker network create --subnet 10.66.60.0/24 --ip-range 10.66.60.0/27 Kind-Net-[Clustername]
docker run --net Kind-Net-[Clustername] --ip 10.66.60.[32-254] ... NodeName
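And, as mentioned above, deleting a cluster could then drop its per-cluster network without extra bookkeeping, e.g.:
docker network rm Kind-Net-[Clustername]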
The good news with that is that multiple clusters would be network-isolated (one network per cluster)... It's also possible to logically subset that range (x.x.x.32-64 -> ingress/load balancer, x.x.x.65-100 -> control plane, x.x.x.101+ -> worker plane).
It's also possible to use only one bridge and put all nodes in the 200+ remaining IPs in the selected scope... but for that it would be required to keep track of all currently deployed kind cluster node IPs...
Source: https://docs.docker.com/engine/reference/commandline/network_connect/
Would creating a custom bridge network with a defined IP range, and assigning static IPs (outside of that range) to the containers, not solve the IP persistence issue? Also, using a network name format would enable removing the network (when deleting a cluster) without keeping track of its creation...
Thanks for sharing your thoughts. The problem with this approach is that it requires keeping state and implementing an IPAM that persists after reboots :/
to me it seems tolerable if kind stores such state on disk.
k8s' IPAM is not part of k/utils though: https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/ipallocator/allocator.go
Unsubscribing as I'm getting so many pings from this thread; however, I'm looking forward to 1.0, where this feature is scheduled to land. 👍
Local storage is fixed, working on this one again.
/assign
/lifecycle active
There is a network error after I restart the kind node.
kind version: 0.6.2. Action: run the cmd "reboot" in a container located in the kind cluster.
The IP of the node is 172.17.0.2 after I restart the node, but the kubelet cmd param is (--node-ip=172.17.0.3).
Also invoking reboot in a kind node is a BAD idea, please don't do this.
Edit: to elaborate a bit ... Kind nodes share the kernel with your host. They are NOT virtual machines, they are privileged containers. Reboot is a kernel / machine level operation.
- This isn't supported yet.
- 0.6.2 is not a valid kind version??
Sorry, it's 0.6.1.
But what happened was that the kind node stopped and my system was not rebooted.
When docker restarts or stops/starts (for any reason), the kind node containers remain stopped and aren't restarted properly. When I tried to run docker restart <node container id>, the cluster didn't start either. The only solution at this point seems to be recreating the cluster.
/kind bug