mbert closed this issue 6 years ago.
@jamiehannaford The problem when using Amazon's ELB is that it doesn't provide a single, stable IP address, so there is no such LB IP that I can make use of (see https://stackoverflow.com/a/35317682/7131191).
So for now the workers join via the ELB's FQDN, which forwards them to one of the apiservers. Since that apiserver advertises its own IP address, the worker configures its kubelet to use that IP (and not the ELB FQDN). Therefore, to make sure that the kubelet goes through the apiserver load balancer, `kubelet.conf` needs to be patched afterwards with the ELB FQDN and the kubelet restarted.
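A minimal sketch of that patch step, assuming a standard `kubelet.conf` layout; the ELB FQDN is a placeholder, and the rewrite is demonstrated on a copy in `/tmp` rather than the live file:

```shell
# Sketch only: rewrite the apiserver endpoint in kubelet.conf to the ELB FQDN.
# ELB_FQDN is a placeholder; the real file lives at /etc/kubernetes/kubelet.conf.
ELB_FQDN="my-cluster-elb.example.com"
cat > /tmp/kubelet.conf <<'EOF'
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://10.0.0.5:6443
  name: kubernetes
EOF
sed -i "s|server: https://.*:6443|server: https://${ELB_FQDN}:6443|" /tmp/kubelet.conf
# On the real node, follow up with: systemctl restart kubelet
```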
I've just open sourced our stab at HA kubeadm. It comes with a few caveats and ugly workarounds (especially the kube-proxy hack is ugly). But it works: https://github.com/itskoko/kubecfn
I have done some work on the HA setup guide on google docs:
Those changes have been implemented in my ansible-based automation of the described process, plus some more:
I've published the kubeadm-based HA kubernetes installer script I've been working on lately. It will hopefully put my prior comments into context and serve as one concrete example of how to automate the steps of @jamiehannaford's HA guide, which it follows fairly closely.
It's a Python script that executes in two phases: a `render` phase, which creates "cluster assets" in the form of SSH keys, certs, and bootscripts, and an `install` phase, which executes those bootscripts over SSH.
The scripts have been tried out on a local Vagrant cluster and against AWS. Two "infrastructure provider scripts" are included in the repo (vagrant and AWS via Terraform) to provision the necessary cluster load-balancer and VMs.
Feel free to try it out. https://github.com/elastisys/hakube-installer
I have not yet found a way to upgrade a HA cluster installed using kubeadm and the manual steps described in my HA setup guide on google docs.
What I have tried so far is the following:
This did not work, and the result was pretty much the same in all cases. What I get in the secondary masters' logs looks like this:
Unable to register node "master-2.mylan.local" with API server: nodes "master-2.mylan.local" is forbidden: node "master-1.mylan.local" cannot modify node "master-2.mylan.local"
Failed to update status for pod "kube-apiserver-master-2.mylan.local_kube-system(6d84ab47-0008-11e8-a558-0050568a9775)": pods "kube-apiserver-master-2.mylan.local" is forbidden: node "master-1.mylan.local" can only update pod status for pods with spec.nodeName set to itself
Failed to update status for pod "kube-controller-manager-master-2.mylan.local_kube-system(665da2db-0008-11e8-a558-0050568a9775)": pods "kube-controller-manager-master-2.mylan.local" is forbidden: node "master-1.mylan.local" can only update pod status for pods with spec.nodeName set to itself
Failed to update status for pod "kube-scheduler-master-2.mylan.local_kube-system(65c6a0b3-0008-11e8-a558-0050568a9775)": pods "kube-scheduler-master-2.mylan.local" is forbidden: node "master-1.mylan.local" can only update pod status for pods with spec.nodeName set to itself
Failed to update status for pod "kube-flannel-ds-ch8gq_kube-system(47cccaea-0008-11e8-b5b5-0050568a9e45)": pods "kube-flannel-ds-ch8gq" is forbidden: node "master-1.mylan.local" can only update pod status for pods with spec.nodeName set to itself
Failed to update status for pod "kube-proxy-htzg7_kube-system(47cc9d00-0008-11e8-b5b5-0050568a9e45)": pods "kube-proxy-htzg7" is forbidden: node "master-1.mylan.local" can only update pod status for pods with spec.nodeName set to itself
Deleting mirror pod "kube-controller-manager-master-2.mylan.local_kube-system(665da2db-0008-11e8-a558-0050568a9775)" because it is outdated
Failed deleting a mirror pod "kube-controller-manager-master-2.mylan.local_kube-system": pods "kube-controller-manager-master-2.mylan.local" is forbidden: node "master-1.mylan.local" can only delete pods with spec.nodeName set to itself
Failed creating a mirror pod for "kube-controller-manager-master-2.mylan.local_kube-system(78432ebfe5d8dfbb93f8173decf3447e)": pods "kube-controller-manager-master-2.mylan.local" is forbidden: node "master-1.mylan.local" can only create pods with spec.nodeName set to itself
[... and so forth, repeats itself ...]
Has anybody got a hint how to proceed in getting the secondary masters upgraded cleanly?
@mbert This seems like an RBAC issue. Did you ensure the node name matches the hostname-override?
Also, did you reset etcd for each step? That probably explains why you saw the same result.
@jamiehannaford I am not using any hostname override, neither in the kubelet nor in the kubeadm init configuration. And yes, I am resetting etcd, i.e. tearing down the cluster, installing a new one from scratch, then trying to upgrade it.
I'll give setting a hostname-override for kubelet a shot and see whether this leads to any other result.
It seems like setting hostname-override when setting up the cluster helps, i.e., makes the secondary masters upgradable. Once this has become a standardised procedure I will document it in the HA setup guide in google docs.
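For reference, one common way to set such an override is a systemd drop-in for the kubelet; this is a sketch only (the drop-in path and variable name are assumptions, and the node name is a placeholder), written to `/tmp` here instead of the real systemd directory:

```shell
# Sketch only: pass --hostname-override to the kubelet via a systemd drop-in.
# NODE_NAME is a placeholder for the node's FQDN.
# Real path would be /etc/systemd/system/kubelet.service.d (an assumption).
NODE_NAME="master-2.mylan.local"
mkdir -p /tmp/kubelet.service.d
cat > /tmp/kubelet.service.d/20-hostname-override.conf <<EOF
[Service]
Environment="KUBELET_EXTRA_ARGS=--hostname-override=${NODE_NAME}"
EOF
# On the real node: systemctl daemon-reload && systemctl restart kubelet
```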
Hi @mbert and others - for the past year or so I have run several k8s clusters (kubeadm and otherwise) driven from Cobbler / Puppet on CoreOS and CentOS. However, none of these has been HA.
My next task is to integrate K8s HA and I want to use kubeadm. I'm unsure whether to go with @mbert's HA setup guide or @jamiehannaford's HA guide.
Also - this morning I read @timothysc's proposal for a highly available control plane configuration for kubeadm deployments, and I like the "initial etcd seed" approach he outlines. However, I don't see that same approach in either @mbert's or @jamiehannaford's work. @mbert appears to use a single, k8s-hosted etcd, while @jamiehannaford's document describes the classic approach of external etcd (which is exactly what I have used for my other non-HA POC efforts).
What do you all recommend? External etcd, single self-hosted, or locating and using the "seed" etcd (with pivot to k8s-hosted)? If the last - what guide or documentation do you suggest?
TIA!
@andybrucenet External etcd is recommended for HA setups (at least at this moment in time). CoreOS has recently dropped support for any kind of self-hosted setup. Self-hosted should only really be used for dev, staging or casual clusters.
@andybrucenet Not quite - I am using an external etcd cluster just like @jamiehannaford proposes in his guide. Actually the approaches described in our respective documents should be fairly similar: set up the etcd cluster you feel you need, then have kubeadm use it when bootstrapping the Kubernetes cluster.
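For illustration, a minimal sketch of a kubeadm config pointing at an external etcd cluster, using the 1.8/1.9-era `v1alpha1` `MasterConfiguration` API; the endpoints and certificate paths are placeholders, and the file is written to `/tmp` here:

```shell
# Sketch only: a MasterConfiguration referencing an external etcd cluster.
# Hostnames and PKI paths below are placeholders for your environment.
cat > /tmp/kubeadm-ha.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
etcd:
  endpoints:
  - https://etcd-1.mylan.local:2379
  - https://etcd-2.mylan.local:2379
  - https://etcd-3.mylan.local:2379
  caFile: /etc/kubernetes/pki/etcd/ca.crt
  certFile: /etc/kubernetes/pki/etcd/client.crt
  keyFile: /etc/kubernetes/pki/etcd/client.key
EOF
# Then, on the master: kubeadm init --config /tmp/kubeadm-ha.yaml
```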
I am currently more or less about to finish my guide and the ansible-based implementation by documenting and implementing a working upgrade procedure - that (and some bugfixes) should be done sometime next week.
Not quite sure whether there will be any need to further transfer my guide into yours, @jamiehannaford what do you think?
Actually the hostname-override was unnecessary. When running `kubeadm upgrade apply`, some default settings overwrite my adaptations, e.g. `NodeRestriction` gets re-activated (also my scaling of Kube DNS instances gets reset, but this was of course not a show stopper here). Patching the `NodeRestriction` admission rule out of /etc/kubernetes/manifests/kube-apiserver.yaml did the trick.
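A sketch of that patch, assuming the 1.8/1.9-era `--admission-control` flag format; it is demonstrated here on a copy of just the flag line rather than the live static pod manifest:

```shell
# Sketch only: drop NodeRestriction from the apiserver admission plugin list.
# The real file is /etc/kubernetes/manifests/kube-apiserver.yaml; the plugin
# list below is an illustrative example, not the full default set.
cat > /tmp/kube-apiserver-flag.yaml <<'EOF'
    - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,ResourceQuota
EOF
sed -i 's/NodeRestriction,//' /tmp/kube-apiserver-flag.yaml
# The kubelet picks up changes to the real static pod manifest automatically.
```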
I have now added a chapter on upgrading HA clusters to my HA setup guide.
Also I have added code for automating this process to my ansible project on github. Take a look into the README.md file there for more information.
@mbert for the upgrade process you've outlined, what are the exact reasons for manually copying the configs and manifests from /etc/kubernetes on the primary master to the secondary masters, rather than simply running `kubeadm upgrade apply <version>` on the secondary masters as well?
@mattkelly It seemed rather dangerous to me. Since the HA cluster's masters use an active/passive setup, but kubeadm knows about only one master, I found running it again on a different master risky. I may be wrong though.
Replying to myself: Having looked at Jamie's guide on kubernetes.io, running kubeadm on the masters may work, even when setting up the cluster. I'll try this out next week and probably make some changes to my documents accordingly.
FWIW, running `kubeadm` on the secondary masters seems to have worked just fine for me (including upgrade) - but I need to better understand the exact risks at each stage. I've been following @jamiehannaford's guide, which is automated by @petergardfjall's hakube-installer (no upgrade support yet though, so I tested that manually).
Edit: Also important to note is that I'm only testing on v1.9+. Upgrade was from v1.9.0 to v1.9.2.
I have now followed the guide on kubernetes.io that @jamiehannaford created, i.e. ran `kubeadm init` on all master machines (after having copied /etc/kubernetes/pki/ca.* to the secondary masters). This works just fine for setting up the cluster. In order to be able to upgrade to v1.9.2 I am setting up v1.8.3 here.
Now I am running into trouble when trying to upgrade the cluster: running `kubeadm upgrade apply v1.9.2` on the first master fails:
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests872757515/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests872757515/kube-scheduler.yaml"
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests647361774/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
This step fails reproducibly (I always start from scratch, i.e. remove all configuration files plus etcd data from all nodes before starting a new setup).
I tried out several variations, but no success:
I have attached some logs. However I cannot really find any common pattern that would explain this problem to me. Maybe it is something I just don't know?
upgrade-failed-proxy-on-vip.log upgrade-failed-proxy-and-kubelet-on-vip.log upgrade-failed-proxy-and-kubelet-on-local-ip.log
Having tried out another few things, it boils down to the following:
- Upgrading the primary master (i.e. the one where `kubeadm init` was run last when setting up the cluster) works.
- To upgrade the secondary masters I have to edit `configmap/kubeadm-config` and change the value of `MasterConfiguration.nodeName` in there to the respective master's host name, or simply delete that line. Others like @mattkelly have been able to perform the upgrade without editing `configmap/kubeadm-config`, hence the way I set things up must be somehow different.
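A sketch of that configmap edit, shown on a local copy of the data; on a live cluster the equivalent would be something like `kubectl -n kube-system edit configmap kubeadm-config` (the piped one-liner in the comment is an untested assumption, verify before use):

```shell
# Sketch only: delete the nodeName line from a copy of the kubeadm-config data.
# The fragment below is an illustrative excerpt, not the full configmap.
cat > /tmp/kubeadm-config-fragment.yaml <<'EOF'
MasterConfiguration: |
  nodeName: master-1.mylan.local
  api:
    advertiseAddress: 10.0.0.5
EOF
sed -i '/nodeName:/d' /tmp/kubeadm-config-fragment.yaml
# Hypothetical live-cluster equivalent (verify before use):
#   kubectl -n kube-system get cm kubeadm-config -o yaml | sed '/nodeName:/d' | kubectl apply -f -
```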
Anybody got a clue what I should change, so that upgrading works without this (rather dirty) trick?
I have tried upgrading from both 1.8.3 and 1.9.0 to 1.9.2, with the same result.
@mbert I'm now reproducing your issue from a fresh v1.9.0 cluster created using hakube-installer. Trying to upgrade to v1.9.3. I can't think of anything that has changed with my workflow. I'll try to figure it out today.
I verified that deleting the `nodeName` line from `configmap/kubeadm-config` for each subsequent master fixes the issue.
Thank you, that's very helpful. I have now added patching `configmap/kubeadm-config` to my instructions.
@mbert oops, I figured out the difference :). For previous upgrades I had been providing the config generated during setup via `--config` (muscle memory I guess). This is why I never needed the workaround. I believe that your workaround is more correct in case the cluster has changed since init time. It would be great to figure out how to avoid that hack, but it's not too bad in the meantime - especially compared to all of the other workarounds.
Hello, will kubeadm 1.10 remove any of the pre-steps/workarounds currently required for HA in 1.9? E.g. the manual creation of a bootstrap etcd, generation of etcd keys, etc.?
Closing this item as the 1.10 doc is out and we will be moving to further the HA story in 1.11.
/cc @fabriziopandini
The planned HA features in kubeadm are not going to make it into v1.9 (see #261). So what can be done to make a cluster setup by kubeadm sufficiently HA?
This is what it looks like now:
Hence an active/active or active/passive master setup needs to be created (i.e. mimic what kubeadm would supposedly be doing in the future):
This seems achievable if converting the existing master instance to a cluster of masters (2) can be done (the Kubernetes guide for building HA clusters seems to indicate so). Active/active would be not more expensive than active/passive.
I am currently working on this. If I succeed I shall share what I find out here.