kubernetes / kubeadm

Aggregator for issues filed against kubeadm

High Availability Considerations docs outdated #3052

Closed: nijave closed this issue 2 months ago

nijave commented 3 months ago

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.3", GitCommit:"6813625b7cd706db5bc7388921be03071e1a492d", GitTreeState:"clean", BuildDate:"2024-03-15T00:06:16Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened?

haproxy was still including nodes which returned non-200 health checks. I attempted to troubleshoot, but there have been significant changes since haproxy v2.1, so documentation isn't readily available. It seems most likely the health check was running over HTTP (plaintext) and ignoring the returned 400 error code. In addition, haproxy v2.1 is no longer officially supported.

I also observed that the very low timeouts in haproxy led to frequent termination of kubectl ... -w and kubectl logs -f.

Edit: I think it may also be possible that the ssl-hello-chk option in the guide is overriding httpchk, which would also explain the behavior I was seeing: https://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.2-check

https://endoflife.date/haproxy
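
For reference, the kind of backend stanza in question looks roughly like this (paraphrased, not the exact guide text); with both options present, haproxy appears to apply only one health-check method, which would match the behavior described above:

backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk              # likely takes precedence over the httpchk line above
    balance roundrobin
    server cp1 192.0.2.11:6443 check  # server names/addresses are placeholders
    server cp2 192.0.2.12:6443 check
    server cp3 192.0.2.13:6443 check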

What you expected to happen?

I expected the guide to provide a setup using software that is still supported (patched) by the vendor.

How to reproduce it (as minimally and precisely as possible)?

Attempt to use the guide's config with haproxy v2.8 (currently the default LTS); it fails due to syntax changes.

Anything else we need to know?

For the keepalived check, I ended up with:

#!/bin/sh

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl -sfk --max-time 2 https://localhost:5443/healthz -o /dev/null || errorExit "Error GET https://localhost:5443/healthz"
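
For reference, this is roughly how such a script would be wired into keepalived via a vrrp_script / track_script block; the script path, interface, and VIP below are placeholders rather than values taken from the guide:

vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"   # the check script above; path is an assumption
    interval 3
    fall 5
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0                 # placeholder interface
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        172.16.1.60                # placeholder VIP
    }
    track_script {
        check_apiserver
    }
}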

It's unclear in the guide why haproxy is only health checked when the node is holding the VIP. With the guide's configuration, keepalived could move the VIP to a node with a working API server but a broken haproxy, which would then fail and cause the VIP to be moved again. Additionally, the check doesn't hit the health endpoint, so the VIP could be moved to a node that is misconfigured but still responds to requests.

For haproxy 2.8, I ended up with the following (run as a static pod):

global
    log stdout format raw local0
    daemon

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          1800s
    timeout server          1800s
    timeout http-keep-alive 10s
    timeout check           10s

#---------------------------------------------------------------------
# apiserver frontend which proxies to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
    bind *:5443
    mode tcp
    option tcplog
    default_backend apiserverbackend

#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserverbackend
    option httpchk

    http-check connect ssl
    http-check send meth GET uri /healthz
    http-check expect status 200

    mode tcp
    balance roundrobin
    server vmubtkube-a01 172.16.1.65:6443 check verify none
    server vmubtkube-a02 172.16.1.66:6443 check verify none
    server vmubtkube-a03 172.16.1.67:6443 check verify none
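
For completeness, a minimal sketch of the static pod manifest used to run that config, dropped into /etc/kubernetes/manifests/ on each control plane node (the image tag and host path here are assumptions, not taken from the guide):

apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: haproxy
    image: haproxy:2.8            # assumed image tag
    volumeMounts:
    - name: haproxyconf
      mountPath: /usr/local/etc/haproxy/haproxy.cfg   # default config path in the official image
      readOnly: true
  volumes:
  - name: haproxyconf
    hostPath:
      path: /etc/haproxy/haproxy.cfg                  # assumed host path for the config above
      type: FileOrCreate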

I don't know much about haproxy, but I'm not sure how the health check worked before unless haproxy v2.1 didn't validate certificates by default.

I reached these conclusions with the following tests:

nijave commented 3 months ago

I can create a PR but a sanity check on the above conclusions first would be appreciated

neolit123 commented 3 months ago

@nijave FTR, we haven't gotten complaints from other users about the guide. and yes, it has users.

@mbert do the above suggestions seem good to you?

/kind documentation /area ha

sftim commented 3 months ago

Is this also relevant to SIG Docs? I can't tell.

sftim commented 3 months ago

If kubeadm is OK and a page inside https://k8s.io/docs/ is out of date: transfer this to k/website and label it for SIG Cluster Lifecycle.

nijave commented 3 months ago

If kubeadm is OK and a page inside https://k8s.io/docs/ is out of date: transfer this to k/website and label it for SIG Cluster Lifecycle.

This one, which I think the official docs link to (but I'm not seeing the link at the moment): https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md

neolit123 commented 3 months ago

yes, https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md ended up in this repo.

the link to it is here:

Read the Options for Software Load Balancing guide for more details.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#first-steps-for-both-methods

sftim commented 3 months ago

If we can move that doc into the main website, I think that benefits our users. Not required though.

neolit123 commented 3 months ago

moving it might break existing URL references.

also frankly, i would prefer if the kubeadm docs eventually start moving in the other direction - i.e. to this repo, if this repo starts hosting the kubeadm source code and has versioned branches.

the kubeadm docs can be hosted similarly to other projects with netlify / hugo:

but that's a discussion for another time. and it's a lot of work.

mbert commented 3 months ago

@nijave FTR, we haven't gotten complaints from other users about the guide. and yes, it has users.

@mbert do the above suggestions seem good to you?

/kind documentation /area ha

At first glance (I have so far been unable to thoroughly review things in my development environment): it is true that HAProxy has undergone some changes leading to different configuration files. If this turns out to be an issue here, then providing two sets of example configurations would make sense.

As far as the rest of the report is concerned, I will need to take a closer look. Improvements are always welcome. I hope to be able to do this on the weekend.

neolit123 commented 3 months ago

thanks @mbert if we can include only the latest config, that seems better to me, so that we can avoid being in the business of tracking N versions.

mbert commented 3 months ago

thanks @mbert if we can include only the latest config, that seems better to me, so that we can avoid being in the business of tracking N versions.

Let's see. For now, both versions are still used in the field, because the older one is still present in EL distros.

neolit123 commented 3 months ago

thanks @mbert if we can include only the latest config, that seems better to me, so that we can avoid being in the business of tracking N versions.

Let's see. For now, both versions are still used in the field, because the older one is still present in EL distros.

that's a good point. users sometimes just install from the distro packaging.

mbert commented 2 months ago

I have now had some time to read through everything. First of all: I totally agree with @nijave - the guide is outdated here, and I think a PR with the proposed changes would be very welcome.

Actually the examples in the guide were, IIRC, created using HAProxy 1.8 on an EL7 platform, and given the changes in HAProxy since then the configuration example should really be updated. What I still have is all based on that "ancient" HAProxy (I am not actively using the setup in my environment, so experimenting would require setting things up again first), which means I cannot quickly provide a configuration for version 2.1, because I have never had one in use. Long story short: I propose providing the configuration example for 2.8 as seen above (assuming that it has been tested and works), along with mentioning the version and the fact that it may not work for other versions.

Regarding the health check: again, I totally agree that HAProxy should be checked on all nodes, not only the one with the VIP. Thank you for noticing!

neolit123 commented 2 months ago

@nijave would you send a PR for this as you suggested earlier?

larsskj commented 2 months ago

I'm running HA Proxy 2.4 (default in Ubuntu 22.04) without any custom healthcheck setups - and it works just fine:

frontend kube-cph
        bind :::6443 v4v6
        mode tcp
        timeout client 600s
        option tcplog
        default_backend kube-cph-api

backend kube-cph-api
        mode tcp
        timeout server 600s
        option log-health-checks
        server kube01 kube01:6443 check
        server kube02 kube02:6443 check
        server kube03 kube03:6443 check

I've been running an HA Proxy loadbalancer in front of my K8s clusters with similar configurations for at least four years, on several clusters, never had any problems.

nijave commented 2 months ago

@nijave would you send a PR for this as you suggested earlier?

Yeah, give me a few days

nijave commented 2 months ago

I'm running HA Proxy 2.4 (default in Ubuntu 22.04) without any custom healthcheck setups - and it works just fine:

frontend kube-cph
        bind :::6443 v4v6
        mode tcp
        timeout client 600s
        option tcplog
        default_backend kube-cph-api

backend kube-cph-api
        mode tcp
        timeout server 600s
        option log-health-checks
        server kube01 kube01:6443 check
        server kube02 kube02:6443 check
        server kube03 kube03:6443 check

I've been running an HA Proxy loadbalancer in front of my K8s clusters with similar configurations for at least four years, on several clusters, never had any problems.

I was able to get a working setup using the current guide, however it didn't handle failure scenarios correctly (so it was load balanced, but not really highly available). With your setup, it doesn't look like you're checking api-server health, only whether it accepts TCP connections. If the api-server health check is failing, haproxy will still route traffic to those instances. An easy test is killing etcd on a node and observing that api-server is still running but returning an error code for /healthz (in which case it should be removed from the active backends in haproxy).
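
For example, a rough way to run that test on a kubeadm control plane node (the manifest path is the kubeadm default; the node address is a placeholder):

# stop etcd on one control plane node by moving its static pod manifest aside
sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml

# kube-apiserver keeps running but its health check should start failing
curl -k https://<node-ip>:6443/healthz   # expect a non-200 response once etcd is gone

# haproxy should now drop this backend; restore etcd afterwards
sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml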

larsskj commented 2 months ago

You're probably right. At home I have a bare metal cluster, and I routinely update the nodes, meaning that Kubernetes will be shut down during reboots.

So far HAProxy has handled this situation without any hiccups - but let me have a look when I return home.

larsskj commented 2 months ago

Did some further testing: This works for me.

backend kube-cph-api
        mode tcp
        timeout server 600s

        option log-health-checks
        option httpchk GET /healthz

        server kube01 kube01:6443 check check-ssl verify none
        server kube02 kube02:6443 check check-ssl verify none
        server kube03 kube03:6443 check check-ssl verify none

nijave commented 2 months ago

Poked around and it looks like Ubuntu and recent versions of RHEL (and clones) are on HAProxy v2.4 (LTS), which appears to also work with the config I mentioned above.

OS - version (EOL date for community support)

RHEL 7 - v1.5.18 (30 Jun 2024)
RHEL 8 - v1.8.27 (31 May 2029)
RHEL 9 - v2.4.22 (31 May 2032)

Ubuntu 20.04 - v2.0.33 (02 Apr 2025)
Ubuntu 22.04 - v2.4.24 (01 Apr 2027)
Ubuntu 24.04 - v2.8.5 (25 Apr 2036)
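
If it helps for the PR, a quick way to confirm what a given host actually ships (standard commands, nothing guide-specific):

haproxy -v                 # installed haproxy version
apt-cache policy haproxy   # Debian/Ubuntu: packaged/candidate version
dnf info haproxy           # RHEL 8+ and clones: packaged version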