When I changed the configuration to 1 master and 3 workers, the cluster was able to operate. So the problem is narrowed down to the process of forming a multi-master cluster.
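Concretely, in terms of the config.yaml settings shown later in this thread (only the node counts matter here), this layout works:

```yaml
# working layout: a single master
controlPlane:
  nodes: 1
workers:
  nodes: 3
```

while any layout with controlPlane.nodes greater than 1 fails during the join.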
I'm quite sure that the master joining process inside wks-controller does not work properly. I tried many combinations of machines.yaml, committing and pushing each one to check how the joining process behaved. They all ended up the same way: the 1st master was gone after the 2nd joined.
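For reference, the machines.yaml variants I tried all follow the usual wksctl shape: a list of Cluster API Machine objects, with the set label selecting masters versus workers. A minimal sketch, assuming the cluster-api v1alpha1 / bare-metal provider format that footloose generates (names, addresses, and ports are placeholders; the actual generated file is in my repo linked later in the thread):

```yaml
apiVersion: v1
kind: List
items:
# First control-plane machine. The failure appears as soon as a
# second machine labelled `set: master` tries to join this one.
- apiVersion: cluster.k8s.io/v1alpha1
  kind: Machine
  metadata:
    name: master-0
    labels:
      set: master
  spec:
    providerSpec:
      value:
        apiVersion: baremetalproviderspec/v1alpha1
        kind: BareMetalMachineProviderSpec
        public:
          address: 172.17.0.2   # placeholder
          port: 22
        private:
          address: 172.17.0.2
          port: 22
# Second control-plane machine; joining it knocks out master-0.
- apiVersion: cluster.k8s.io/v1alpha1
  kind: Machine
  metadata:
    name: master-1
    labels:
      set: master
  spec:
    providerSpec:
      value:
        apiVersion: baremetalproviderspec/v1alpha1
        kind: BareMetalMachineProviderSpec
        public:
          address: 172.17.0.3   # placeholder
          port: 22
        private:
          address: 172.17.0.3
          port: 22
```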
This works for me (on docker), see below... Does docker work for you? Also, is there anything more you can tell me about your environment?
NAME    STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION     CONTAINER-RUNTIME
node0   Ready    master   37m     v1.14.1   172.17.0.2    <none>        CentOS Linux 7 (Core)   4.9.184-linuxkit   docker://18.9.7
node1   Ready    master   21m     v1.14.1   172.17.0.3    <none>        CentOS Linux 7 (Core)   4.9.184-linuxkit   docker://18.9.7
node2   Ready    master   10m     v1.14.1   172.17.0.4    <none>        CentOS Linux 7 (Core)   4.9.184-linuxkit   docker://18.9.7
node3   Ready    <none>   3m37s   v1.14.1   172.17.0.5    <none>        CentOS Linux 7 (Core)   4.9.184-linuxkit   docker://18.9.7
wk-quickstart on make-TRACK-switchable [$?] on ☁️ us-east-1 took 2s
❯ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE     IP           NODE    NOMINATED NODE   READINESS GATES
kube-system   coredns-fb8b8dccf-c4lth           1/1     Running   0          37m     10.32.0.5    node0   <none>           <none>
kube-system   coredns-fb8b8dccf-fvgnd           1/1     Running   0          37m     10.32.0.6    node0   <none>           <none>
kube-system   etcd-node0                        1/1     Running   0          36m     172.17.0.2   node0   <none>           <none>
kube-system   etcd-node1                        1/1     Running   0          21m     172.17.0.3   node1   <none>           <none>
kube-system   etcd-node2                        1/1     Running   0          9m48s   172.17.0.4   node2   <none>           <none>
kube-system   kube-apiserver-node0              1/1     Running   0          36m     172.17.0.2   node0   <none>           <none>
kube-system   kube-apiserver-node1              1/1     Running   1          21m     172.17.0.3   node1   <none>           <none>
kube-system   kube-apiserver-node2              1/1     Running   1          10m     172.17.0.4   node2   <none>           <none>
kube-system   kube-controller-manager-node0     1/1     Running   1          36m     172.17.0.2   node0   <none>           <none>
kube-system   kube-controller-manager-node1     1/1     Running   0          20m     172.17.0.3   node1   <none>           <none>
kube-system   kube-controller-manager-node2     1/1     Running   0          10m     172.17.0.4   node2   <none>           <none>
kube-system   kube-proxy-5l6vr                  1/1     Running   0          3m44s   172.17.0.5   node3   <none>           <none>
kube-system   kube-proxy-fzjmm                  1/1     Running   0          21m     172.17.0.3   node1   <none>           <none>
kube-system   kube-proxy-rhqr7                  1/1     Running   0          10m     172.17.0.4   node2   <none>           <none>
kube-system   kube-proxy-z8qx6                  1/1     Running   0          37m     172.17.0.2   node0   <none>           <none>
kube-system   kube-scheduler-node0              1/1     Running   1          36m     172.17.0.2   node0   <none>           <none>
kube-system   kube-scheduler-node1              1/1     Running   0          20m     172.17.0.3   node1   <none>           <none>
kube-system   kube-scheduler-node2              1/1     Running   0          10m     172.17.0.4   node2   <none>           <none>
weavek8sops   flux-5675c5d88-djjq9              1/1     Running   0          37m     10.32.0.2    node0   <none>           <none>
weavek8sops   memcached-6bc6886f9f-sksx6        1/1     Running   0          37m     10.32.0.3    node0   <none>           <none>
weavek8sops   weave-net-22v2w                   2/2     Running   0          21m     172.17.0.3   node1   <none>           <none>
weavek8sops   weave-net-295gb                   2/2     Running   1          3m44s   172.17.0.5   node3   <none>           <none>
weavek8sops   weave-net-j8m2c                   2/2     Running   0          37m     172.17.0.2   node0   <none>           <none>
weavek8sops   weave-net-kkq5n                   2/2     Running   1          10m     172.17.0.4   node2   <none>           <none>
weavek8sops   wks-controller-8668fcbdb9-hkjjs   1/1     Running   0          37m     10.32.0.4    node0   <none>           <none>
config.yaml:
# This file contains high level configuration parameters. The setup.sh script
# takes this file as input and creates lower level manifests.

# backend defines how the machines underpinning Kubernetes nodes are created.
#  - docker: use containers as "VMs" using footloose:
#      https://github.com/weaveworks/footloose
#  - ignite: use footloose with ignite and firecracker to create real VMs using:
#      https://github.com/weaveworks/ignite
#    the ignite backend only works on linux as it requires KVM.
backend: docker

# Number of nodes allocated for the Kubernetes control plane and workers.
controlPlane:
  nodes: 3
workers:
  nodes: 1
Also, could I see one of your machines.yaml files?
I haven't tried it with Docker yet; I tested only in Ignite mode. I'll try Docker mode and get back to you.
My environment is Packet bare metal, running Ubuntu 18.04. Here's the repo I used to start the cluster: https://github.com/chanwit/firekube-profile-demo
You can also find the generated machines.yaml file there: https://github.com/chanwit/firekube-profile-demo/blob/master/machines.yaml
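To clarify, Ignite mode here means the same config.yaml shown above with only the backend line changed (a sketch of the failing configuration):

```yaml
backend: ignite   # instead of docker; linux-only, requires KVM
controlPlane:
  nodes: 3        # multi-master: this is where the join fails
workers:
  nodes: 1
```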
@jrryjcksn confirmed that the joining process worked on the Docker backend, so my conclusion is now invalid.
Jerry, which quickstart repo are you using to bring up the cluster?
I tested with backend: docker and the cluster came up and is working fine. Something must be wrong elsewhere, and I really don't understand what. Thank you @jrryjcksn!
We have isolated this to just the ignite environment. It works with the docker backend and with bare-metal EC2 machines.
Moving out of the current milestone. Will replan for a future sprint.
Fixed by PR: https://github.com/weaveworks/wksctl/pull/118
Changed estimate to 1 when pulling into the release.
When using wksctl to start a multi-master Firekube cluster, the cluster ended up broken when the 2nd or the 3rd master started to join. This failure is deterministic and always reproducible.
After that, kubectl could not connect to the API server any more.
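One way to watch the failure happen (a sketch, assuming the stacked-etcd layout visible in the pod listing above, that a shell and etcdctl are available inside the etcd static pod, and the default kubeadm certificate paths):

```sh
# Watch the nodes while the 2nd master joins; the 1st master
# drops out of the list at that point.
kubectl get nodes -w

# Inspect etcd membership from the first master's etcd pod
# before kubectl loses the API server.
kubectl -n kube-system exec etcd-node0 -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/peer.crt \
     --key /etc/kubernetes/pki/etcd/peer.key \
     member list'
```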
Here's the config.yaml: