C seems like the simplest solution, but I'd love to hear more about A. I think we really have a couple of use cases: stacked control plane nodes scale out to some number n of nodes before etcd needs dedicated hosts, and at that point it would be great if we had a path to switch to external/dedicated hosts.
I'd rule out B and D for now unless there is a compelling reason to add that complexity.
@chuckha
I'd love to hear more about A (Stacked etcd should be a “transparent” evolution of current local etcd mode)
From what I understand, stacked etcd is an etcd instance like local etcd, with the difference that it listens on a public IP instead of 127.0.0.1 and it has a bunch of additional flags/certificate SANs. Why not change the local etcd static pod manifest to be equal to the stacked etcd manifest?
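For illustration, a rough sketch of what this could look like in the kubeadm config, assuming the etcd.local serverCertSANs/extraArgs fields (field names follow later config versions and are only indicative):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
etcd:
  local:
    serverCertSANs:
    - "10.10.10.11"
    peerCertSANs:
    - "10.10.10.11"
    extraArgs:
      # make the local member reachable from the other control plane nodes
      listen-client-urls: "https://127.0.0.1:2379,https://10.10.10.11:2379"
      advertise-client-urls: "https://10.10.10.11:2379"
      listen-peer-urls: "https://10.10.10.11:2380"
      initial-advertise-peer-urls: "https://10.10.10.11:2380"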
Does this sound reasonable to you?
it would be great if we had a path to switch to external/dedicated hosts
Great suggestions, let's keep this in mind as well
A) Stacked etcd should be a “transparent” evolution of current local etcd mode
If I'm understanding this option, it would basically just extend the existing local etcd mode to support the additional flags, SANs, etc that the stacked deployment currently uses and is mainly about providing an upgrade path for existing local etcd-based deployments rather than providing HA support itself. Is that correct?
That said, it would require config changes to make it work, since we would need to expand the per-node configuration to include etcd config/overrides for things such as which IP, hostname, or SANs to use (if the defaults are not sufficient).
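Purely as a strawman (none of these fields exist today), such a per-node override could look something like:

# hypothetical per-node override, for illustration only
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
etcdLocal:                       # hypothetical field
  advertiseAddress: 10.10.10.12  # hypothetical field
  serverCertSANs:                # hypothetical field
  - "master2.example.com"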
B) users will be requested to explicitly opt in to stacked etcd, e.g. by using a dedicated config type
I don't like this option as it requires users to make a decision for HA/non-HA support before starting.
C) The number of stacked etcd members should be “tied” to the number of control plane instances
+1 for this; if there is a need to have a different number of etcd hosts vs control plane instances, then external etcd should be used instead.
D) we would like etcd scaling to be separated from control plane scaling (e.g. kubeadm join --etcd)
While I could see some value in this, the ability to use it would be limited since we don't provide a way to init a single etcd instance. I would expect that workflow to look like the following:
Where the entire etcd cluster is bootstrapped prior to bootstrapping control plane instances. That workflow would require kubeadm to have access to the etcd client certificate in order to manipulate etcd, which is not currently the case. I'm not exactly sure how we are currently handling this for extending the control plane.
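Roughly, the sequence would be something like this (command names are purely illustrative; kubeadm has no etcd-only bootstrap today):

# hypothetical workflow, for illustration only
kubeadm init --etcd-only        # bootstrap the first etcd member (illustrative flag)
kubeadm join --etcd             # add the remaining etcd members (flag from option D)
kubeadm init                    # bootstrap the first control plane node against that etcd cluster
kubeadm join --control-plane    # add the remaining control plane nodes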
The nice thing about this approach is that it would simplify the external etcd story as well, but I think it should be in addition to C rather than in place of C if we support that workflow. I think we'd also probably want to break them out into separate high level commands, since we wouldn't necessarily be fully configuring the kubelets to join the overall cluster in that use case.
@detiber happy to see we are on that same page here!
it would basically just extend the existing local etcd mode ... rather than providing HA support itself
Yes, but with the addition that, before adding a new etcd member, we are going to call etcdctl member add on one of the existing members.
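For illustration, this corresponds to running something like the following against an existing member before the new etcd static pod starts (etcdctl v3 syntax; kubeadm would use the etcd client library, so this is just a sketch, and the choice of client certificate here is an assumption):

ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.10.10.11:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member add master2 --peer-urls=https://10.10.10.12:2380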
This will increase the HA of the cluster, with the caveat that each API server uses only the etcd endpoint of its own local etcd (instead of the list of etcd endpoints). So if an etcd member fails, all the control plane components on the same node will fail and everything will be switched to another control plane node.
NB. This can be improved to a certain extent by passing the API server the list of etcd endpoints known at the moment of join.
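In kube-apiserver flag terms, the difference is roughly:

# what the approach above implies: each API server talks only to its local member
- --etcd-servers=https://10.10.10.11:2379
# what the NB suggests: pass the list of endpoints known at join time
- --etcd-servers=https://10.10.10.11:2379,https://10.10.10.12:2379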
it would require config changes to make it work
Yes, but I consider these changes less invasive than creating a whole new etcd type. On top of that, I think we can use the advertise address and hostname as reasonable defaults, so the user will be required to set additional config options only in a few cases.
I think we'd also probably want to break them out into separate high level commands
I think it should be in addition to C rather than in place of C
+1. If we want to have a sound story around etcd alone, this should be addressed properly. For the time being I will be more than happy to improve the story about the control plane and the etcd tied to it, which has been part of kubeadm since its inception.
@fabriziopandini For the issue with the control plane being fully dependent on the local etcd, there is an issue to track the lack of etcd auto sync support within Kubernetes itself: https://github.com/kubernetes/kubernetes/issues/64742
/lifecycle active
@detiber @chuckha @timothysc I have a working prototype of the approach discussed above 😃
kubeadm init
> creates a local etcd instance similar to the one described here. The main difference vs now is that it uses another IP address instead of 127.0.0.1
- etcd
- --advertise-client-urls=https://10.10.10.11:2379
- --initial-advertise-peer-urls=https://10.10.10.11:2380
- --initial-cluster=master1=https://10.10.10.11:2380
- --listen-client-urls=https://127.0.0.1:2379,https://10.10.10.11:2379
- --listen-peer-urls=https://10.10.10.11:2380
....
kubeadm join
--control-plane > adds a second etcd instance similar to the one described here. When joining, the etcd manifest is slightly different: --initial-cluster contains all the existing etcd members plus the joining one, and the --initial-cluster-state flag is set to existing
- etcd
- --initial-cluster=master1=https://10.10.10.11:2380,master2=https://10.10.10.12:2380
- --initial-cluster-state=existing
....
So far so good.
Now the tricky question. kubeadm upgrade
....
When kubeadm executes upgrades it will recreate the etcd manifest. Are there any settings I should take care of because I'm upgrading an etcd cluster instead of a single etcd instance? More specifically, are there any recommended values for --initial-cluster and --initial-cluster-state, or can I simply not care because my etcd cluster already exists and I'm basically only changing the etcd binary?
@detiber @chuckha @timothysc from the CoreOS etcd docs:
--initial prefix flags are used in bootstrapping (static bootstrap, discovery-service bootstrap or runtime reconfiguration) a new member, and ignored when restarting an existing member.
So it doesn't matter which values I assign to --initial-cluster and --initial-cluster-state. Considering this, my idea is to keep the upgrade workflow "simple" and generate the new etcd manifest without populating --initial-cluster with the full list of etcd members.
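Concretely, under this approach the regenerated manifest on master1 would keep only the local member, e.g. (a sketch, not the actual output):

- etcd
- --initial-cluster=master1=https://10.10.10.11:2380
- --initial-cluster-state=new
....

Both --initial-* flags would simply be ignored on restart because the member's data dir already exists.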
Opinions?
Last bit: what IP address should we use for etcd? If we are going to use the API server advertise address for etcd as well, this will simplify things a lot...
/close
@fabriziopandini: Closing this issue.
Stacked etcd is a manual procedure described in https://kubernetes.io/docs/setup/independent/high-availability/.
However, kubeadm could automate the stacked etcd procedure as a new step of the kubeadm join --control-plane workflow. Some design decisions should be taken before implementing.
Considering the goal of keeping kubeadm simple and maintainable, IMO the preferred options are A) and C)… wdyt?
cc @detiber @chuckha @timothysc