Closed riker09 closed 4 years ago
> I'm not sure if that is an issue with Maesh or K3S, but I was expecting that a Kubernetes cluster would survive a restart of the underlying OS.
Kubernetes is not designed to be able to handle that sort of situation, and in production it would be a complete rebuild if quorum was lost.
In this case, is the resolution broken on restart? A clean install works fine?
> In this case, is the resolution broken on restart? A clean install works fine?
Yes, clean install works fine.
> Kubernetes is not designed to be able to handle that sort of situation
How are people developing on Kubernetes clusters if I cannot restart the cluster node? I was under the impression that losing a node should be handled gracefully by Kubernetes. Can I work around this by having at least two nodes where one is always reachable (e.g. cloud provider) and the other is my local workstation (as you've mentioned quorum loss)? I apologize for asking all these newbie questions, but that's what I currently am, a complete Kubernetes noob. I'm learning as I'm going and I really appreciate all the answers I'm getting here. Thank you!
Hello @riker09
> How are people developing on Kubernetes clusters if I cannot restart the cluster node?
Most of us use development cluster tools such as k3s for just this reason. K3s allows new clusters to be scaffolded quickly. There is another tool, k3d, that is used exclusively to build k3s clusters quickly, and we use it in our own integration tests!
https://github.com/containous/maesh/blob/master/Makefile#L43
> I was under the impression that losing a node should be handled gracefully by Kubernetes.
Correct, Kubernetes can tolerate up to (n-1)/2 failures (rounded down), where n is the number of nodes, counting data and control plane nodes separately. If your control plane loses more nodes than that, quorum is lost and the cluster is unrecoverable. Data nodes don't have the same quorum requirement, but if you lose so many data nodes that the remaining ones can't fit your workload, pods will be unschedulable.
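To make the (n-1)/2 rule concrete, here is a minimal sketch of the arithmetic, assuming a simple majority quorum among control plane members as described above. The function names are illustrative and not part of any Kubernetes or K3s API.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of `members` voting control plane nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one control plane member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains: (n-1)/2, rounded down."""
    if members < 1:
        raise ValueError("a cluster needs at least one control plane member")
    return (members - 1) // 2


def has_quorum(members: int, healthy: int) -> bool:
    """True if the healthy members still form a majority."""
    return healthy >= quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} member(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this assumption a single-node cluster tolerates zero failures, and a two-node cluster is no better, because losing either node leaves one of two members, which is not a majority. That is why adding one always-reachable cloud node next to a workstation does not buy any extra fault tolerance.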
> Can I work around this by having at least two nodes where one is always reachable (e.g. cloud provider) and the other is my local workstation (as you've mentioned quorum loss)?
It's not really that feasible for development. I would look at using another scaffolding tool like k3d to quickly rebuild your cluster if you break it. I personally use the k8s installation in docker-for-mac for development, but that too has to be rebuilt when I break stuff.
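The "rebuild when it breaks" workflow can be scripted. The sketch below assumes k3d v3 or later, where clusters are managed with `k3d cluster delete <name>` and `k3d cluster create <name>` (older releases used a different command layout). The cluster name `dev` and the helper names are hypothetical. By default it only prints the commands it would run.

```python
import shutil
import subprocess
from typing import List


def rebuild_commands(name: str) -> List[List[str]]:
    """Commands to throw away and recreate a k3d cluster (assumes k3d v3+ syntax)."""
    return [
        ["k3d", "cluster", "delete", name],
        ["k3d", "cluster", "create", name],
    ]


def rebuild(name: str, dry_run: bool = True) -> List[List[str]]:
    """Print the rebuild commands, or run them when dry_run is False and k3d is installed."""
    commands = rebuild_commands(name)
    for index, command in enumerate(commands):
        if dry_run or shutil.which("k3d") is None:
            print("would run:", " ".join(command))
            continue
        # The delete step may fail if the cluster does not exist yet, so only
        # the create step (the last command) is treated as fatal.
        subprocess.run(command, check=(index == len(commands) - 1))
    return commands


if __name__ == "__main__":
    rebuild("dev")  # dry run: prints the two commands without executing them
```

Anything that must outlive a rebuild (manifests, Helm values, data) has to live outside the cluster, since the delete step discards the cluster state.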
Okay, so the solution is recreating everything from scratch each time. I noticed that Persistent Volumes (and Claims) survive a restart (guess that's the persistence everybody is talking about :slightly_smiling_face: ) and I can store data that I want to keep across reboots in a PV.
I have already looked at another scaffolding tool, skaffold. I will spend some time and investigate k3d and see how they both compare. In the meantime I guess I will use a combination of helm and kubectl commands and do everything by hand until I'm more familiar with the whole Kubernetes universe.
Thanks again for your kind answers and for taking the time to explain everything.
No problem. I'll close this, then :)
Yesterday I did a full wipe of my local K3S installation. Afterwards, I successfully tested the example. I could deploy my own Helm chart and use the mesh. However, I ran into an issue (traffic was routed to the wrong service) and had to call it a day.
After starting my cluster today I verified that all pods have started successfully. The whoami example is not working anymore. I'm not sure if that is an issue with Maesh or K3S, but I was expecting that a Kubernetes cluster would survive a restart of the underlying OS.