kahkhang / kube-linode

:whale: Provision a Kubernetes/CoreOS cluster on Linode
MIT License

HA enhancements #66

Open camflan opened 6 years ago

camflan commented 6 years ago

It looks like there are at least two more areas of improvement needed for kube-linode.

Is this something that should wait until Terraform provisioning is in progress? Are these already possible?

kahkhang commented 6 years ago

Thanks for the feedback! Multiple masters is possible by tainting a worker node with a master taint (see https://github.com/kubernetes-incubator/bootkube/issues/311); a rough sketch of that workaround is shown below. There are still some problems that need to be addressed in order to have an HA deployment; see the list after the sketch.
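A minimal sketch, assuming a hypothetical node named `worker-1` and the conventional `node-role.kubernetes.io/master` key; this is normally applied with `kubectl label` and `kubectl taint` rather than by editing the Node object directly:

```yaml
# Hypothetical: promote worker-1 so that master components schedule onto it.
# Roughly equivalent to:
#   kubectl label node worker-1 node-role.kubernetes.io/master=
#   kubectl taint node worker-1 node-role.kubernetes.io/master=:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    node-role.kubernetes.io/master: ""
spec:
  taints:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
```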

1) You'll need either a dedicated etcd cluster or multiple etcd instances deployed on different nodes. This was done and automated in previous commits of kube-linode with bootkube's self-hosted etcd, but that feature has been deprecated by the bootkube team.

2) HA deployment of Traefik needs to be supported. This means storing the SSL certificates in a distributed manner and putting a NodeBalancer in front of all the worker nodes, pointing to port 80. However, I don't think the NodeBalancer supports automatic SSL renewal, so a service would also need to be written to swap in the certificates whenever Traefik renews them.

NodePorts currently do not support low port numbers such as port 80 (see https://github.com/kubernetes/kubernetes/issues/9995, which has more discussion of this). One workaround is to list the workers' addresses under a Service's externalIPs (https://serverfault.com/questions/801189/expose-port-80-and-443-on-google-container-engine-without-load-balancer), which has the drawback that the externalIPs need to be updated dynamically; another is to use a proxy service that directly exposes host port 80 to port 80 of the pod network (see https://github.com/kubernetes/kubernetes/pull/10405).
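For the externalIPs route, the Service might look like the sketch below; the selector and addresses are placeholders, and the IP list is exactly the part that has to be kept in sync as nodes come and go:

```yaml
# Hypothetical Service exposing Traefik on port 80 via externalIPs,
# sidestepping the NodePort range restriction (30000-32767 by default).
apiVersion: v1
kind: Service
metadata:
  name: traefik-ingress
  namespace: kube-system
spec:
  selector:
    app: traefik
  ports:
  - name: http
    port: 80
    targetPort: 80
  externalIPs:
  - 203.0.113.10  # worker 1 public IP (placeholder)
  - 203.0.113.11  # worker 2 public IP (placeholder)
```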

In summary, there are multiple issues that need to be addressed to have an HA deployment. Other cloud providers solve them easily through a LoadBalancer service, but that is unfortunately not available on Linode (unless someone writes a cloud provider plugin for it). Another possible alternative is to deploy an external custom load balancer (see https://chmod666.org/2017/11/Hosting-a-self-made-Kubernetes-infrastructure-on-Scaleway), with the caveat that Traefik's SSL certs would have to be hosted on an external distributed key-value store (see https://docs.traefik.io/user-guide/cluster/).

These are interesting challenges which I might attempt in the future (or, if anyone reading this is inclined, feel free to take on this project). It's a good idea to use Terraform to automate this process, since I've come to realise bash scripts are somewhat messy to maintain. Thanks!

camflan commented 6 years ago

Wow, you're way ahead of me on this! I didn't know about tainting a worker for master, cool 👍

  1. Is the whole bootkube self-hosted feature deprecated, or just the self-hosting of etcd? If moving to Terraform, can we simply install/host etcd directly on the master nodes (assuming there are 3 or more)?
  2. Since Rook is running, couldn't we use either a Rook object store or a Rook FS storage pool to host the Traefik SSL certs? Would we even need the NodeBalancer to have certs on it, or would Traefik be able to handle that itself?

I didn't know that about NodePorts :/ I'll read more about the ExternalIP issues you posted.

I'm loving this project; it got our cluster up and running on Linode so that I could customize it. I had been messing around with my own k8s-linode project using Terraform, but I couldn't get flannel or weave to come online using VXLAN. My config was a bit more complicated, using Ubuntu as the base and then laying a VPN on top of the private networking interface. However, even without any firewall rules or VPN, I was still unable to get CNI working properly.

Anyway, your use of CoreOS lets you avoid this issue entirely 👍 👍

Here's my project: https://github.com/camflan/linode-k8s. Feel free to use as much or as little of it as you like. I'm happy to help with this project; I'd love for a solid Linode/Kubernetes provisioning system to exist (if only for selfish reasons :P), and it seems like Linode gets left out of these projects in favor of DigitalOcean.

kahkhang commented 6 years ago
  1. The self-hosting of etcd is no longer in active development (see https://github.com/kubernetes-incubator/bootkube/issues/738), but the other parts of the k8s stack (except the kubelet service) are self-hosted. I believe installing an odd number of etcd instances on the master nodes would work, though the count should probably be kept small because of the consensus protocol overhead (see the etcd sketch after this list).

  2. Yep, that's a better idea, and possible: first generate the certificates, then mount the same PV in ReadOnlyMany mode across multiple Traefik pods (also sketched below).
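For point 1, a sketch of what each master's etcd could look like on CoreOS, as a Container Linux Config fragment; hostnames and private IPs are placeholders, each master is rendered with its own name/IP, and three members tolerate one failure:

```yaml
# Hypothetical Container Linux Config fragment for master-1 of a
# 3-member etcd cluster; render the same template per master.
etcd:
  name: master-1
  listen_client_urls: http://0.0.0.0:2379
  advertise_client_urls: http://192.168.1.10:2379
  listen_peer_urls: http://192.168.1.10:2380
  initial_advertise_peer_urls: http://192.168.1.10:2380
  initial_cluster: master-1=http://192.168.1.10:2380,master-2=http://192.168.1.11:2380,master-3=http://192.168.1.12:2380
  initial_cluster_state: new
```

For point 2, a minimal sketch of the shared-certificate idea, assuming the Rook-backed storage class supports ReadOnlyMany; all resource names and the image tag are placeholders:

```yaml
# Hypothetical: several Traefik replicas mount one cert volume read-only.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: traefik-certs
  namespace: kube-system
spec:
  accessModes: ["ReadOnlyMany"]
  # storageClassName: rook-block  # assumed Rook class name
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
      - name: traefik
        image: traefik:1.7
        volumeMounts:
        - name: certs
          mountPath: /certs
          readOnly: true
      volumes:
      - name: certs
        persistentVolumeClaim:
          claimName: traefik-certs
          readOnly: true
```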

Thanks for linking your project! I don't have any experience with Terraform yet, but would love to try it out some time :)

kahkhang commented 6 years ago

I've thought more about this, and the HA setup is indeed possible, but it needs some modifications:

  1. The K8S manifests/certificates should be generated outside of the cluster, then scp'ed in.
  2. We need to create an odd number (3 or more) of masters.
  3. The server IP in the kubeconfig file needs to be replaced with an internal NodeBalancer IP pointing to the master nodes (see the kubeconfig sketch after this list).
  4. We need to bootstrap those masters, all with etcd installed and the kubeconfig file scp'ed in.
  5. We need another, external NodeBalancer to forward outside traffic to either only the masters or to all the nodes, running Traefik as a DaemonSet (see the DaemonSet sketch below).
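For step 3, the kubeconfig's cluster entry would point at the internal NodeBalancer rather than a single master; a sketch with placeholder addresses and paths:

```yaml
# Hypothetical kubeconfig fragment: the server address is the internal
# NodeBalancer, which forwards API traffic to the three masters.
apiVersion: v1
kind: Config
clusters:
- name: kube-linode
  cluster:
    server: https://192.168.255.1:443  # internal NodeBalancer IP (placeholder)
    certificate-authority: /path/to/ca.pem
contexts:
- name: default
  context:
    cluster: kube-linode
    user: admin
current-context: default
users:
- name: admin
  user:
    client-certificate: /path/to/admin.pem
    client-key: /path/to/admin-key.pem
```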

The internal NodeBalancer can be replaced with a similar setup to https://github.com/kubernetes-incubator/bootkube/pull/684. The total cost of the cluster will go up by at least $40/mo (2 extra 2GB instances + 1 NodeBalancer), but hopefully this will make it production-ready.
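For step 5 in the list above, the worker-side half could be Traefik as a DaemonSet binding host port 80, so the external NodeBalancer can target port 80 on every node; a sketch, with the image tag and names as placeholders:

```yaml
# Hypothetical DaemonSet exposing Traefik on host port 80 of each node;
# the external NodeBalancer then forwards public traffic to the nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traefik
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
      - name: traefik
        image: traefik:1.7
        ports:
        - containerPort: 80
          hostPort: 80
```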

I'm less motivated to embark on this myself since I built kube-linode for hobby purposes, but if someone reading this is so inclined, feel free to take it on / give any feedback :)