
Concern about node count for minimal HA control plane with external etcd #42691

Closed: tjanson closed this issue 1 year ago

tjanson commented 1 year ago

The section "External etcd topology" on the page "Options for Highly Available Topology", in the kubeadm cluster setup docs, states:

> A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this [external etcd] topology.

I may be mistaken, but wouldn't the minimum number of control plane nodes in this case be two? (Perhaps not advisable, but technically sufficient.) That would give us a redundant pair of each control plane component (apiserver, controller-manager, and scheduler), as well as the HA three-node etcd cluster.
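
For concreteness, the topology in question would look something like this in a kubeadm configuration (a sketch only; the hostnames, load balancer address, and certificate paths are hypothetical):

```yaml
# kubeadm-config.yaml: sketch of the external etcd topology under discussion.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "lb.example.com:6443"  # load balancer in front of the apiservers
etcd:
  external:
    endpoints:  # three-member etcd cluster on dedicated hosts
      - https://etcd-1.example.com:2379
      - https://etcd-2.example.com:2379
      - https://etcd-3.example.com:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

With a configuration like this, `kubeadm init` would be run on the first control plane node and `kubeadm join --control-plane` on the second.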

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

SIG Docs takes a lead on issue triage for this website, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

tjanson commented 1 year ago

I see now this is a duplicate of (stale, closed) #33033.

/language en
/kind bug
/sig architecture

neolit123 commented 1 year ago

/sig cluster-lifecycle

There was a blog post at k8s.io about HA written by Steve Wong, but I cannot find it.

HA is an opinionated area in computing. 2 is considered the minimum, where the 2nd server is the fallback/redundancy server. However, the argument here is that 2 does not really give you redundancy: 2 provides the fallback, and 3 is what really provides the redundancy, i.e. "you have the backup of the backup, which may be redundant".

Upstream kubeadm is just one k8s distribution, with its recommendation of 3 CP nodes. Yet other distributions, like OpenShift, also treat 3 as the minimum for HA:

> At a minimum, an OpenShift cluster contains 2 worker nodes in addition to 3 control plane nodes.

https://access.redhat.com/solutions/5034771

Personally, I would consider < 3 in k8s as non-HA, but users can make the choice.

/close

k8s-ci-robot commented 1 year ago

@neolit123: Closing this issue.

In response to [this](https://github.com/kubernetes/website/issues/42691#issuecomment-1689969886).

tjanson commented 1 year ago

Excuse me for being blunt, but I don't think you've given this issue the consideration it deserves and requires. The key point is the distinction between etcd cluster nodes and Kubernetes control plane nodes (and their effect on HA), which your comment does not address and which you do not seem to have considered.

> HA is an opinionated area in computing.

We're specifically discussing the HA requirements of the Kubernetes control plane. That is not a matter of opinion, but fact.

> 2 is considered the minimum, where the 2nd server is the fallback/redundancy server. However, the argument here is that 2 does not really give you redundancy: 2 provides the fallback, and 3 is what really provides the redundancy

Again excuse my bluntness, but that's an oversimplified, imprecise portrayal of HA in the context of Kubernetes. It is not sufficient to consider just these broad terms in a discussion of etcd and control plane components.

> Upstream kubeadm is just one k8s distribution, with its recommendation of 3 CP nodes. Yet other distributions, like OpenShift, also treat 3 as the minimum for HA

Yes, they do so because of a stacked etcd topology. The docs section this issue refers to is about a different topology (external etcd). That exact distinction is the entire point of the issue.
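
To spell out the distinction (the two-CP layout below is the technical minimum being argued here, not a recommendation):

```text
Stacked etcd (the quorum lives on the control plane nodes):
  cp-1: apiserver + controller-manager + scheduler + etcd
  cp-2: apiserver + controller-manager + scheduler + etcd
  cp-3: apiserver + controller-manager + scheduler + etcd

External etcd (the quorum lives on dedicated hosts):
  etcd-1, etcd-2, etcd-3: etcd only
  cp-1: apiserver + controller-manager + scheduler
  cp-2: apiserver + controller-manager + scheduler
```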

> Personally, I would consider < 3 in k8s as non-HA, but users can make the choice.

Again, this isn't about your (or anyone else's) personal opinion or recommendation, it is about the technical minimum of K8s control plane components/nodes.

I request that you reopen the issue.

neolit123 commented 1 year ago

> Excuse me for being blunt, but I don't think you've given this issue the consideration it deserves and requires. The key point is the distinction between etcd cluster nodes and Kubernetes control plane nodes (and their effect on HA), which your comment does not address and which you do not seem to have considered.

My comment is specifically about the external etcd topology. In short, the recommendation of the maintainers is to have 3 CP machines even if etcd is not run on them. If users do not agree with our ideas of HA, they can run fewer or more CP machines.

tjanson commented 1 year ago

I request that you reopen this issue so that a second org member can give their opinion. E.g., @sftim, who was active in the other issue (I'm also fine with reopening the stale #33033 instead of this issue).

neolit123 commented 1 year ago

/reopen

k8s-ci-robot commented 1 year ago

@neolit123: Reopened this issue.

In response to [this](https://github.com/kubernetes/website/issues/42691#issuecomment-1690237185).

sftim commented 1 year ago

> A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this [external etcd] topology.

The minimum number of control plane nodes for Kubernetes to work is one. However, the minimum recommended number of etcd nodes is three, because etcd only makes progress while a quorum (a strict majority) of members is healthy, and three is the smallest cluster that survives losing a member.
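
The standard quorum arithmetic, for reference:

```text
quorum(n) = floor(n/2) + 1

n = 1  quorum 1  tolerates 0 failed members
n = 2  quorum 2  tolerates 0 (two machines, no more tolerant than one)
n = 3  quorum 2  tolerates 1
n = 5  quorum 3  tolerates 2
```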

So far, so uncontroversial. How about the API server, k-c-m, scheduler, etc?

For the external etcd topology, maybe you can get away with two further nodes, relying on the etcd cluster to support leader election etc. I'm a lead for Docs, not API Machinery, so I can't comment authoritatively. However, it sounds plausible.
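
For reference, kubeadm already generates the kube-controller-manager and kube-scheduler static pod manifests with leader election enabled, which is what would let a two-node control plane fail over safely. An illustrative excerpt (exact flags vary by version; the lease timing flags below show the defaults):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (illustrative excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect=true               # only one instance acts at a time
    - --leader-elect-lease-duration=15s # defaults, shown for illustration
    - --leader-elect-renew-deadline=10s
```

In recent versions the current holder is visible as a Lease object, e.g. `kubectl -n kube-system get lease kube-controller-manager`.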

sftim commented 1 year ago

/retitle Concern about node count for minimal HA control plane with external etcd

sftim commented 1 year ago

@tjanson would you be happy to see #33033 reopened and this closed as a duplicate?

sftim commented 1 year ago

Ah, I see you would.
/triage duplicate
/close not-planned

k8s-ci-robot commented 1 year ago

@sftim: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/website/issues/42691#issuecomment-1690308888).