kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io
Creative Commons Attribution 4.0 International

Replica count wrong(?) in Options for Highly Available Topology page #33033

Open ejensen-mural opened 2 years ago

ejensen-mural commented 2 years ago

Options for Highly Available Topology states in reference to external etcd topology that "A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology."

However, it is not entirely clear why the control plane nodes without stacked etcd require at least three hosts instead of two. While etcd relies on a quorum to elect a leader because it is stateful, kube-controller-manager and kube-scheduler are stateless and rely on an active-passive model using simple leader election through leases.

Is this an error in the documentation, or is there some reason why two control plane nodes are not sufficient even without etcd stacked? [see comment from 2023-08-23 for more details]
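To illustrate the active-passive model mentioned above, here is a minimal sketch of Lease-based leader election with client-go. The lock name `my-component`, the `kube-system` namespace, and the timing values are illustrative assumptions, not taken from any real component's configuration.

```go
// Minimal sketch of Lease-based leader election with client-go, assuming the
// process runs in-cluster. Lock name, namespace and timings are illustrative.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	id, _ := os.Hostname()

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "my-component", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	// Only one replica holds the Lease at a time; the others block here and
	// take over if the holder stops renewing. No quorum of replicas is needed,
	// which is why two copies of such a component already provide redundancy.
	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { log.Println("became leader; doing work") },
			OnStoppedLeading: func() { log.Println("lost leadership; standing by") },
		},
	})
}
```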

sftim commented 2 years ago

It's a good question. Maybe this advice predates the Lease API and the current mechanisms for leader election.

I'm going to mark this as a bug and let someone who knows the topic well triage it.

/language en
/kind bug

sftim commented 2 years ago

/retitle Replica count wrong(?) in Options for Highly Available Topology page

sftim commented 2 years ago

I think the most appropriate SIG is /sig architecture

(but I might be wrong)

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ejensen-mural commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/website/issues/33033#issuecomment-1355604710):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sftim commented 11 months ago

/reopen
/lifecycle frozen

k8s-ci-robot commented 11 months ago

@sftim: Reopened this issue.

In response to [this](https://github.com/kubernetes/website/issues/33033#issuecomment-1690309071):

> /reopen
> /lifecycle frozen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sftim commented 11 months ago

/sig api-machinery

sftim commented 11 months ago

/triage accepted

Only two control plane nodes are required

sftim commented 11 months ago

To fix this, change the paragraph:

> However, this topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.

to make it clear that you need an odd number of etcd nodes (three minimum) and two or more control plane nodes. For a high availability architecture, the minimum number of hosts in this kind of control plane is **five** (you don't need to put bold text in the documentation).
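As a back-of-the-envelope check on those numbers (an illustrative sketch, not proposed page text): etcd needs a strict majority of members to keep committing writes, so its fault tolerance is floor((n-1)/2), whereas API servers are stateless replicas behind a load balancer and need no quorum, so two of them already remove the single point of failure; three etcd hosts plus two control plane hosts gives the five-host minimum.

```go
package main

import "fmt"

// etcdQuorum returns how many members must be healthy for etcd to commit
// writes (a strict majority of the membership).
func etcdQuorum(members int) int { return members/2 + 1 }

// etcdFaultTolerance returns how many members can fail while quorum survives.
func etcdFaultTolerance(members int) int { return members - etcdQuorum(members) }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("etcd members=%d quorum=%d tolerates=%d failure(s)\n",
			n, etcdQuorum(n), etcdFaultTolerance(n))
	}
	// Prints: 1 tolerates 0, 2 tolerates 0, 3 tolerates 1, 4 tolerates 1,
	// 5 tolerates 2 - so 3 is the smallest etcd cluster that survives one
	// failed host, while API server replicas need no quorum and 2 of them
	// already give redundancy: 3 + 2 = 5 hosts in total.
}
```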

/help

k8s-ci-robot commented 11 months ago

@sftim: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/website/issues/33033):

> To fix this, change the paragraph:
>
> > However, this topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.
>
> to make it clear that you need an odd number of etcd nodes (three minimum) and two or more control plane nodes. For a high availability architecture, the minimum number of hosts in this kind of control plane is **five** (you don't need to put bold text in the documentation).
>
> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sftim commented 11 months ago

/priority backlog

neolit123 commented 11 months ago

> to make it clear that you need an odd number of etcd nodes (three minimum) and two or more control plane nodes. For a high availability architecture, the minimum number of hosts in this kind of control plane is **five** (you don't need to put bold text in the documentation).

@sftim FWIW, the 3 cp nodes in the external etcd topology are intentional, as i pointed out here: https://github.com/kubernetes/website/issues/42691#issuecomment-1690129408

we actually want users to run 3 apiservers for an HA control plane, and not 2.

logicalhan commented 11 months ago

> we actually want users to run 3 apiservers for an HA control plane, and not 2.

I don't think this is a sufficient response. HA systems are generally characterized by not having a single point of failure (SPOF), i.e. having redundancy, which would be true for a 2-node control plane with a separate 3-node etcd cluster.

sftim commented 11 months ago

We can clarify minimum vs recommended, especially if these are different. The word “minimum” has a very specific and widely understood meaning.

logicalhan commented 11 months ago

> We can clarify minimum vs recommended, especially if these are different. The word “minimum” has a very specific and widely understood meaning.

Unless there is some data stating why we would preferentially want 3 control-plane nodes instead of two (when the etcd cluster is not co-located), how could we make a recommendation?

neolit123 commented 11 months ago

if one googles HA and the minimum of 3 vs 2, the recommendations vary for different systems. it is true that the classic answer is 2. for k8s, this non-colocated recommendation of 3 cp machines came from discussions with cluster-lifecycle leads at the time (2017?)

also, as i mentioned on https://github.com/kubernetes/website/issues/42691#issuecomment-1689969886 there was a blog post by Steve Wong who spoke at k8s.io how 3 is preferred. cc @cantbewong

so, this proposal for a change here is debatable and i am -1 overall.

logicalhan commented 11 months ago

> there was a blog post by Steve Wong who spoke at k8s.io how 3 is preferred. cc @cantbewong

Is this really your argument?

sftim commented 11 months ago

> Unless there is some data stating why we would preferentially want 3 control-plane nodes instead of two (when the etcd cluster is not co-located), how could we make a recommendation?

Just as an example (please don't take this seriously):

-However, this topology requires twice the number of hosts as the stacked HA topology.
-A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.
+This topology needs more hosts than the stacked HA topology. The minimum number
+of hosts is five (three etcd hosts and two control plane hosts); however, the Kubernetes
+project recommends running at least four control plane hosts and three etcd hosts,
+because we like the number seven.

Anyway, given the :+1: reactions that I saw in Slack from SIG Architecture, I'm confident that the true minimum is two (control plane hosts, provided that they have an etcd cluster to talk to). I don't have much of an opinion on the number of control plane hosts or fault isolation zones we should recommend that people operate with.

neolit123 commented 11 months ago

> > there was a blog post by Steve Wong who spoke at k8s.io how 3 is preferred. cc @cantbewong
>
> Is this really your argument?

no comment.

i like Joe's point about redundancy on upgrade with 3 cp nodes: https://kubernetes.slack.com/archives/C5P3FE08M/p1692839159431539?thread_ts=1692809843.558739&cid=C5P3FE08M

and as we can see, different k8s maintainers from the same company may have different opinions.

i'd say, leave this recommended minimum at 3. perhaps a clarifying note is required on the page, though, a la "we are aware HA is situational and a subject of interpretation"

sftim commented 10 months ago

@neolit123 how can we resolve the differences of opinion here?

Most people are saying 5 is the actual minimum to achieve n+1 resilience; you're - I think - saying that you'd prefer not to disclose that detail and instead recommend: 3n etcd hosts and m control plane hosts, where m ≥ 3.

We could reword the page to not mention any minimum (although people will then file requests that we document it).

However, SIG Docs can't arbitrate here. We need the SIGs involved to reach agreement on the technical side.

neolit123 commented 10 months ago

i object to reducing the recommended api server count to 2 for the external etcd topology.

related to 5 vs 3 etcd nodes, that is true and yes, the problem can be seen under some conditions. in both cases the admin or controllers need to act, though. for 3 - potential downtime.

> how can we resolve the differences of opinion here?

i cannot resolve that, but 2 notes can be added on top of the doc:

sftim commented 10 months ago

> i object to reducing the recommended api server count to 2 for the external etcd topology

Just to clarify: that's not what's being proposed. What's proposed is to document the minimum for a cluster to provide resilience (we don't document against what).

There's room to do a lot more here if there are volunteers with capacity to do it.

sftim commented 10 months ago

I'll leave this open; right now we (SIG Docs) aren't prioritising the liaison around achieving a consensus position.

neolit123 commented 10 months ago

> Just to clarify: that's not what's being proposed. What's proposed is to document the minimum for a cluster to provide resilience (we don't document against what).

in some posts above, 2 was definitely being proposed as the minimum, and it is not resilient.

> There's room to do a lot more here if there are volunteers with capacity to do it.

i can PR the document as it's maintained by kubeadm maintainers (path is /docs/setup/production-environment/tools/kubeadm/ha-topology/). but i can do that with only the notes mentioned here: https://github.com/kubernetes/website/issues/33033#issuecomment-1705302298

i haven't seen better proposals for content change, yet.

sftim commented 8 months ago

I'd like to document the minimum number of control plane nodes somewhere; ideally not just in the kubeadm docs, because there are other valid ways to deploy a cluster. It's obvious that for non-HA the number is 1, but there's tension about the number for HA.

Can you confirm that there is a plausible and relevant failure mode for a 2-node control plane (backed by a separate 3 node etcd cluster) @neolit123? I'm not convinced that there is, based on my understanding on how much the API server is able to rely on etcd for resilient data persistence and resolution of conflicts.

This is a different question from “does the Kubernetes project recommend three nodes?”. We might well recommend more than the strict minimum for any number of reasons, and to do so is fine. We might even say that for clusters deployed with kubeadm the supported minimum control plane node count is three, stacked etcd or otherwise. Those opinions are valid for the project to hold and publish.

I'm revisiting this issue because people with unusual topologies (eg: scheduler not colocated with API server; API server colocated with etcd but kube-controller-manager and scheduler are separate) are a constituency we don't serve well with our reference docs, and this issue seems to be about serving that specific audience better.
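As a rough illustration of that reliance on etcd (a sketch with made-up names, not taken from any real component): every write through the API server is an optimistic-concurrency update keyed on the object's resourceVersion, and that check is ultimately enforced by etcd, so the same conflict handling applies no matter which API server replica a client happens to reach; adding or removing a replica changes capacity and redundancy, not correctness. The ConfigMap name and namespace below are assumptions for the example.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/retry"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Read-modify-write against whichever API server the load balancer picks;
	// an update carrying a stale resourceVersion is rejected, and we retry
	// with a fresh read.
	err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, err := client.CoreV1().ConfigMaps("default").Get(context.TODO(), "example", metav1.GetOptions{})
		if err != nil {
			return err
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["touched-by"] = "any-apiserver-replica"
		_, err = client.CoreV1().ConfigMaps("default").Update(context.TODO(), cm, metav1.UpdateOptions{})
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```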

neolit123 commented 8 months ago

for kubeadm, perhaps we should keep the ha pages the way they are. for k8s in general, there is a lot to talk about regarding failure modes, different topologies and component instances.

on the topic of 2 vs 3 api servers i think my comment is the same as https://github.com/kubernetes/website/issues/33033#issuecomment-1692068401

sftim commented 8 months ago

I'll follow this up with a wider request for primary evidence to support a minimum node count. (I have seen the “it is not resilient” comment but that is only secondary evidence).