xetys / hetzner-kube

A CLI tool for provisioning kubernetes clusters on Hetzner Cloud
Apache License 2.0

Cluster upgrades - any consideration? #87

Open kabudu opened 6 years ago

kabudu commented 6 years ago

This is a fantastic tool that you've put together. :) I would like to know if any consideration has been given to cluster upgrades in hetzner-kube, i.e. when a new version of Kubernetes has been released and needs to be rolled out to the master node and workers.

Having used GKE for a while, my understanding of the process for worker nodes is: a) new nodes are spun up, b) workloads are drained from the existing nodes and migrated to the new nodes, c) old nodes are killed. This is performed on a rolling basis. I'm not sure about the specifics of the process for the master node, but hopefully I've conveyed my point well enough.
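Steps (b) and (c) above could be sketched for a single worker roughly as follows. This is only an illustration using standard `kubectl` commands, not something hetzner-kube provides; the `run` and `retire_node` helpers and the `DRY_RUN` variable are hypothetical names. Step (a), creating the replacement nodes, and deleting the old server at the cloud provider are outside this sketch.

```shell
#!/usr/bin/env bash
# Sketch: retire one old worker after its replacement has joined the cluster.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: drain a node (this also cordons it), then remove it
# from the cluster's node list.
retire_node() {
  local node="$1"
  run kubectl drain "$node" --ignore-daemonsets
  run kubectl delete node "$node"
}
```

Calling `retire_node old-worker-01` for each old node in turn, one at a time, gives the rolling behaviour described above.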

Thanks.

xetys commented 6 years ago

I had a look at the docs on this topic and found that kubeadm currently doesn't support upgrading HA clusters with external etcd...

In this case, an upgrade would either have to be performed manually (by hetzner-kube, with its own upgrade logic), or we wait until this gets resolved in kubeadm. The first could be done sooner, but it means a lot of work and won't stay as up to date as kubeadm does. WDYT?

kabudu commented 6 years ago

@xetys I agree with your analysis. It's probably best to wait for kubeadm, and hopefully the issue will be resolved soon.

xetys commented 6 years ago

I ran into the need to perform an upgrade of my cluster. As I only use HA clusters in my setups, this is a non-trivial step. Here are my steps to do a backup:

Backup

Depending on how paranoid you are, you should save:

Control Plane

  1. get the latest kubeadm
export VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt)
curl -sSL https://dl.k8s.io/release/${VERSION}/bin/linux/amd64/kubeadm > /tmp/kubeadm
chmod a+rx /tmp/kubeadm
mv /tmp/kubeadm /usr/bin/kubeadm
  2. upgrade the control plane on all master nodes, using the config generated by hetzner-kube
kubeadm alpha phase controlplane scheduler --config master-config.yaml 
kubeadm alpha phase controlplane controller-manager --config master-config.yaml 
kubeadm alpha phase controlplane apiserver --config master-config.yaml 

if you are using the kube-prometheus addon, add these lines to /root/master-config.yaml:

controllerManagerExtraArgs:
  address: 0.0.0.0
schedulerExtraArgs:
  address: 0.0.0.0

warning: think about firewall rules when opening these

kubelet

upgrade kubelet on all nodes


export VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt)
curl -sSL https://dl.k8s.io/release/${VERSION}/bin/linux/amd64/kubelet > /tmp/kubelet
chmod a+rx /tmp/kubelet
mv /tmp/kubelet /usr/bin/kubelet
systemctl restart kubelet

Community question:

@kabudu @JohnnyQQQQ @pierreozoux @eliasp

We could move this logic into the tool. But as mentioned above, cluster upgrades are a complicated topic. This solution currently relies on alpha tools from kubeadm. There is no version pinning: it just installs the latest release. I don't know which is the best option:

  1. add this to the docs and don't provide any automation in the tool
  2. add a simple upgrade routine under an alpha subcommand, like kubeadm does. This would look like `hetzner-kube cluster alpha upgrade [plan|apply]`
  3. the ultimate upgrade feature: starting from 2., extended with generic version pinning and specific upgrade steps per minor version. This would lead to the most stable upgrade experience, but the contributors would have to keep up with new upstream Kubernetes versions... the "lot of work" I mentioned earlier

WDYT?

kabudu commented 6 years ago

Option 2 seems like a good halfway house to me.

JohnnyQQQQ commented 6 years ago

Personally I think that we should start with option 1 right now and take option 2 into consideration for the next major release.

We would need to get a list of all common edge cases in which an update is not possible and catch those exceptions. IMHO, if we would like to ship a feature like this, it shouldn't harm the more inexperienced users in the first place.


"There is no fixing of versions"

This is a good point. We should check whether the new version found by export VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt) is already supported (by the upgrade routine) or not.
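Such a check could look roughly like the sketch below. The `SUPPORTED_MINORS` list and the `is_supported_version` function are hypothetical names, and the versions in the list are placeholders rather than what any upgrade routine actually supports.

```shell
#!/usr/bin/env bash
# Hypothetical list of minor release lines the upgrade routine was tested with.
SUPPORTED_MINORS="1.9 1.10 1.11"

# Return 0 if the given version (e.g. "v1.10.2") belongs to a supported
# minor release line, 1 otherwise.
is_supported_version() {
  local version="${1#v}"   # strip a leading "v" if present
  local minor
  minor="$(echo "$version" | cut -d. -f1,2)"
  local s
  for s in $SUPPORTED_MINORS; do
    if [ "$s" = "$minor" ]; then
      return 0
    fi
  done
  return 1
}

# Possible usage before downloading anything:
#   VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt)
#   is_supported_version "$VERSION" || { echo "unsupported: $VERSION" >&2; exit 1; }
```

Comparing the whole `major.minor` string instead of matching a prefix avoids confusing 1.1 with 1.10.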

xetys commented 6 years ago

About the "next major" release... we currently have no major release yet. The next would then be 1.0. I'm still thinking about which milestones I want to reach for this version.

So, who is gonna code this? :trollface:

lakano commented 5 years ago

Hello! Kubeadm has supported upgrading with external etcd since May. So this should help with hetzner-kube upgrades? We have been following your project for many months, but we would like to be sure that, if we install it in production, we can upgrade it afterwards. Or maybe it's still not recommended to use hetzner-kube in production?

md2k commented 5 years ago

@xetys, I can most probably help with this, since I have plans to use this tool in production, so I'm going to investigate how to achieve that. It might also be a good idea to add functionality to select the version of Kubernetes to deploy (optional, and maybe overthinking it).

P.S. We currently roll out Kubernetes (on AWS, from before EKS) with Ansible, and upgrades are also done with Ansible, but etcd in our setup is external and we have never used kubeadm to work with clusters. Also, Kubernetes upgrades should be done in stages: from your current version to the latest patch of your minor release, then to the next stable minor release, then to the one after that. Otherwise it can lead to multiple issues, because each minor release includes migration schemas only for the previous release. So if we want to upgrade from 1.9 to 1.13, we need to go 1.9->1.10->1.11->1.12->1.13. We did multiple tests with our clusters before, and when we tried to upgrade from 1.9 to 1.11 we had plenty of issues, while the step-by-step upgrade did the trick. You also need to update the controller, scheduler and proxy before initiating the upgrade to the next release.
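The staged path described here, one release at a time, can be computed with a small helper. This is a sketch only; `upgrade_path` is a hypothetical name, and it assumes both versions share the same first version component and look like `1.9` or `v1.9.3`.

```shell
#!/usr/bin/env bash
# Print the chain of one-release-at-a-time hops between two Kubernetes versions.
# Patch levels are ignored; only the second version component is stepped.
upgrade_path() {
  local from="${1#v}" to="${2#v}"   # strip a leading "v" if present
  local major="${from%%.*}"
  local from_minor to_minor
  from_minor="$(echo "$from" | cut -d. -f2)"
  to_minor="$(echo "$to" | cut -d. -f2)"

  if [ "$from_minor" -gt "$to_minor" ]; then
    echo "downgrade is not supported" >&2
    return 1
  fi

  local path="$major.$from_minor"
  local m="$from_minor"
  while [ "$m" -lt "$to_minor" ]; do
    m=$((m + 1))
    path="$path -> $major.$m"
  done
  echo "$path"
}
```

For the example above, `upgrade_path 1.9 1.13` prints `1.9 -> 1.10 -> 1.11 -> 1.12 -> 1.13`.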

MattiasGees commented 4 years ago

There is now a kubeadm upgrade apply <kubernetes-version> command. I think it can be integrated into the current workflow. I am thinking about some possible designs for how to do this with hetzner-kube and will keep this issue updated with some ideas.

More information about it.

xetys commented 4 years ago

There have also been straightforward ways to do a clean upgrade since 1.13. We could support this as well. It's just a big piece of work.