k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Master used to create a cluster of masters with embedded etcd is always Ready despite being powered off #4504

Closed indywidualny closed 2 years ago

indywidualny commented 2 years ago

Environmental Info: K3s Version:

k3s version v1.21.5+k3s2 (724ef700)
go version go1.16.8

Node(s) CPU architecture, OS, and Version:

Linux master-2 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

NAME       STATUS   ROLES                       AGE   VERSION
master-1   Ready    control-plane,etcd,master   51m   v1.21.5+k3s2
master-2   Ready    control-plane,etcd,master   48m   v1.21.5+k3s2
master-3   Ready    control-plane,etcd,master   47m   v1.21.5+k3s2
node-1     Ready    <none>                      46m   v1.21.5+k3s2
node-2     Ready    <none>                      46m   v1.21.5+k3s2

Describe the bug: I'm experimenting with k3s and high availability, so I created a cluster based on https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/ and noticed that bringing down the master that was used to create the cluster breaks the cluster despite its high availability. Pods stop being managed, and the status of that master (only this one is problematic) also stays Ready even though it is powered off.

Steps To Reproduce:

# Install K3S on master
curl -sfL https://get.k3s.io | sh -s - server \
    --cluster-init \
    --disable-cloud-controller \
    --write-kubeconfig-mode=644 \
    --node-name="$(hostname -f)" \
    --cluster-cidr="10.244.0.0/16" \
    --kube-controller-manager-arg="address=0.0.0.0" \
    --kube-controller-manager-arg="bind-address=0.0.0.0" \
    --kube-proxy-arg="metrics-bind-address=0.0.0.0" \
    --kube-scheduler-arg="address=0.0.0.0" \
    --kube-scheduler-arg="bind-address=0.0.0.0" \
    --kubelet-arg="cloud-provider=external" \
    --token="SECRET" \
    --flannel-iface=ens10 \
    --tls-san="10.0.0.10" \
    --tls-san="$(hostname -I | awk '{print $2}')"

# `tls-san` entries: the internal IP of a Load Balancer for the API on port 6443 (10.0.0.10) and the internal IP of the machine

# Hetzner Cloud related
kubectl -n kube-system create secret generic hcloud --from-literal=token=SECRET --from-literal=network=network-kubernetes
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm-networks.yaml

Then I joined 2 additional masters:

curl -sfL https://get.k3s.io | K3S_URL=https://10.0.0.10:6443 K3S_TOKEN=SECRET sh -s - server \
    --node-name="$(hostname -f)" \
    --kubelet-arg="cloud-provider=external" \
    --flannel-iface=ens10 \
    --disable-cloud-controller

# K3S_URL uses the internal IP of the LB in front of the Kubernetes API

Later I joined 2 agent nodes.

The problem is that the cluster easily survives the failure of master-2 or master-3 and still spawns pods etc., but when master-1 (the node used to initialize the cluster) is brought down, things stop working. Pods are no longer managed, and the dead master-1 stays in status Ready forever.

kubectl get cs also occasionally shows errors about components being unhealthy, which isn't the case when master-2 or master-3 is down.

Expected behavior:

master-1 can be down and everything should still be up and running, since quorum is maintained by master-2 and master-3.

Actual behavior:

The Kubernetes cluster is no longer managed, and node master-1 stays in the Ready state forever.

Additional context / logs: I will add these when I know which logs can help.


brandond commented 2 years ago

I suspect there's something unique to your environment or configuration; all the core Kubernetes controllers are lease-locked and should migrate over to a new node within a minute or so of a server node being stopped. Can you attach the logs from all three servers, as well as the output of kubectl get lease -A when one of the servers is stopped?
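
For reference, one way to gather that information, assuming the default systemd-based install where the server unit is named k3s (adjust the unit name and time window to your setup):

# On each server node, dump recent k3s service logs to a file
journalctl -u k3s --no-pager --since "1 hour ago" > /tmp/k3s-$(hostname).log

# From any node with a working kubeconfig, capture the lease holders
# while one of the servers is powered off
kubectl get lease -A > /tmp/leases-$(date +%s).txt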

indywidualny commented 2 years ago

Well, the issue is solved. It was indeed configuration specific.

All 3 masters running:

➜  kubernetes-test kubectl get lease -A
NAMESPACE         NAME                                             HOLDER                                          AGE
kube-node-lease   master-1                                         master-1                                        69m
kube-node-lease   master-2                                         master-2                                        68m
kube-node-lease   master-3                                         master-3                                        67m
kube-node-lease   node-1                                           node-1                                          35m
kube-node-lease   node-2                                           node-2                                          34m
kube-system       kube-controller-manager                          master-1_af86ce8f-1741-431b-a149-520d665b3553   69m
kube-system       kube-scheduler                                   master-1_386799f4-ff44-46f2-8758-3af88284eff7   69m
longhorn-system   driver-longhorn-io                               csi-provisioner-669c8cc698-pbwjg                104s
longhorn-system   external-attacher-leader-driver-longhorn-io      csi-attacher-75588bff58-tj4ls                   104s
longhorn-system   external-resizer-driver-longhorn-io              csi-resizer-5c88bfd4cf-q4lpk                    104s
longhorn-system   external-snapshotter-leader-driver-longhorn-io   csi-snapshotter-69f8bc8dcf-qk9ss                103s
longhorn-system   longhorn-manager-upgrade-lock                                                                    2m35s

master-3 down, the rest is up:

➜  kubernetes-test kubectl get nodes
NAME       STATUS     ROLES                       AGE   VERSION
master-1   Ready      control-plane,etcd,master   86m   v1.21.6+k3s1
master-2   Ready      control-plane,etcd,master   84m   v1.21.6+k3s1
master-3   NotReady   control-plane,etcd,master   84m   v1.21.6+k3s1
node-1     Ready      <none>                      52m   v1.21.6+k3s1
node-2     Ready      <none>                      51m   v1.21.6+k3s1
➜  kubernetes-test kubectl get lease -A
NAMESPACE         NAME                                             HOLDER                                          AGE
kube-node-lease   master-1                                         master-1                                        86m
kube-node-lease   master-2                                         master-2                                        85m
kube-node-lease   master-3                                         master-3                                        84m
kube-node-lease   node-1                                           node-1                                          52m
kube-node-lease   node-2                                           node-2                                          51m
kube-system       kube-controller-manager                          master-1_af86ce8f-1741-431b-a149-520d665b3553   86m
kube-system       kube-scheduler                                   master-1_386799f4-ff44-46f2-8758-3af88284eff7   86m
longhorn-system   driver-longhorn-io                               csi-provisioner-669c8cc698-pbwjg                18m
longhorn-system   external-attacher-leader-driver-longhorn-io      csi-attacher-75588bff58-tj4ls                   18m
longhorn-system   external-resizer-driver-longhorn-io              csi-resizer-5c88bfd4cf-q4lpk                    18m
longhorn-system   external-snapshotter-leader-driver-longhorn-io   csi-snapshotter-69f8bc8dcf-4m6n6                18m
longhorn-system   longhorn-manager-upgrade-lock                                                                    19m

master-2 down, the rest is up:

➜  kubernetes-test kubectl get nodes
NAME       STATUS     ROLES                       AGE   VERSION
master-1   Ready      control-plane,etcd,master   95m   v1.21.6+k3s1
master-2   NotReady   control-plane,etcd,master   94m   v1.21.6+k3s1
master-3   Ready      control-plane,etcd,master   93m   v1.21.6+k3s1
node-1     Ready      <none>                      61m   v1.21.6+k3s1
node-2     Ready      <none>                      60m   v1.21.6+k3s1
➜  kubernetes-test kubectl get lease -A
NAMESPACE         NAME                                             HOLDER                                          AGE
kube-node-lease   master-1                                         master-1                                        95m
kube-node-lease   master-2                                         master-2                                        94m
kube-node-lease   master-3                                         master-3                                        93m
kube-node-lease   node-1                                           node-1                                          61m
kube-node-lease   node-2                                           node-2                                          60m
kube-system       kube-controller-manager                          master-1_af86ce8f-1741-431b-a149-520d665b3553   95m
kube-system       kube-scheduler                                   master-1_386799f4-ff44-46f2-8758-3af88284eff7   95m
longhorn-system   driver-longhorn-io                               csi-provisioner-669c8cc698-pbwjg                27m
longhorn-system   external-attacher-leader-driver-longhorn-io      csi-attacher-75588bff58-tj4ls                   27m
longhorn-system   external-resizer-driver-longhorn-io              csi-resizer-5c88bfd4cf-q4lpk                    27m
longhorn-system   external-snapshotter-leader-driver-longhorn-io   csi-snapshotter-69f8bc8dcf-4m6n6                27m
longhorn-system   longhorn-manager-upgrade-lock

master-1 down (it was problematic before), the rest is up:

➜  kubernetes-test kubectl get nodes
NAME       STATUS     ROLES                       AGE    VERSION
master-1   NotReady   control-plane,etcd,master   101m   v1.21.6+k3s1
master-2   Ready      control-plane,etcd,master   100m   v1.21.6+k3s1
master-3   Ready      control-plane,etcd,master   100m   v1.21.6+k3s1
node-1     Ready      <none>                      67m    v1.21.6+k3s1
node-2     Ready      <none>                      67m    v1.21.6+k3s1
➜  kubernetes-test kubectl get lease -A
NAMESPACE         NAME                                             HOLDER                                          AGE
kube-node-lease   master-1                                         master-1                                        102m
kube-node-lease   master-2                                         master-2                                        100m
kube-node-lease   master-3                                         master-3                                        100m
kube-node-lease   node-1                                           node-1                                          67m
kube-node-lease   node-2                                           node-2                                          67m
kube-system       kube-controller-manager                          master-3_e427a3c2-c5b1-486c-89c2-66e00623b104   101m
kube-system       kube-scheduler                                   master-3_972c5363-3afd-4ea6-8ec0-ca99694f499c   101m
longhorn-system   driver-longhorn-io                               csi-provisioner-669c8cc698-pbwjg                33m
longhorn-system   external-attacher-leader-driver-longhorn-io      csi-attacher-75588bff58-tj4ls                   33m
longhorn-system   external-resizer-driver-longhorn-io              csi-resizer-5c88bfd4cf-q4lpk                    33m
longhorn-system   external-snapshotter-leader-driver-longhorn-io   csi-snapshotter-69f8bc8dcf-pgzrb                33m
longhorn-system   longhorn-manager-upgrade-lock                                                                    34m

This means you were right and everything works as expected; the issue was configuration specific. For everyone with this problem, I'm posting how to correctly set up k3s on Hetzner Cloud. The previous setup was incorrect.

Setup

Private network

hcloud context create kubernetes-test
hcloud network create --name network-kubernetes --ip-range 10.0.0.0/16
hcloud network add-subnet network-kubernetes --network-zone eu-central --type server --ip-range 10.0.0.0/16
# Attach all your machines to this network now
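
As a sketch of that attach step (the server names below are placeholders for your own machines):

# Repeat for every machine that should be on the private network
hcloud server attach-to-network --network network-kubernetes master-1
hcloud server attach-to-network --network network-kubernetes master-2
hcloud server attach-to-network --network network-kubernetes master-3
hcloud server attach-to-network --network network-kubernetes node-1
hcloud server attach-to-network --network network-kubernetes node-2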

Firewall

cat <<-EOF > /tmp/firewall
    [
      {
        "direction": "in",
        "protocol": "tcp",
        "port": "22",
        "source_ips": [
          "0.0.0.0/0",
          "::/0"
        ],
        "destination_ips": []
      },
      {
        "direction": "in",
        "protocol": "icmp",
        "port": null,
        "source_ips": [
          "0.0.0.0/0",
          "::/0"
        ],
        "destination_ips": []
      },
      {
        "direction": "in",
        "protocol": "tcp",
        "port": "6443",
        "source_ips": [
          "0.0.0.0/0",
          "::/0"
        ],
        "destination_ips": []
      },
      {
        "direction": "in",
        "protocol": "tcp",
        "port": "any",
        "source_ips": [
          "10.0.0.0/16"
        ],
        "destination_ips": []
      },
      {
        "direction": "in",
        "protocol": "udp",
        "port": "any",
        "source_ips": [
          "10.0.0.0/16"
        ],
        "destination_ips": []
      }
    ]
EOF

hcloud firewall create --name firewall-kubernetes --rules-file /tmp/firewall
# Attach firewall to all your machines by some label selector now
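
One way to do that attachment, assuming you label your machines with something like kubernetes=true (the label name is only an example):

# Apply the firewall to every server matching the label selector
hcloud firewall apply-to-resource firewall-kubernetes --type label_selector --label-selector kubernetes=true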

Create a Load Balancer for external Kubernetes API access

hcloud load-balancer create --type lb11 --location nbg1 --name lb-kubernetes-api
hcloud load-balancer attach-to-network --network network-kubernetes --ip 10.0.0.10 lb-kubernetes-api
hcloud load-balancer add-target lb-kubernetes-api --label-selector role=master --use-private-ip
hcloud load-balancer add-service lb-kubernetes-api --protocol tcp --listen-port 6443 --destination-port 6443
# Add the label role=master to all VPSes hosting your masters
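
The label itself can be added with hcloud as well; a sketch with example master names:

# Tag the master VPSes so the Load Balancer's label selector picks them up
hcloud server add-label master-1 role=master
hcloud server add-label master-2 role=master
hcloud server add-label master-3 role=master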

First Master (cluster init)

apt update && apt upgrade -y && apt install apparmor apparmor-utils -y

# Configure variables first
export K3S_TOKEN="[secret]"
export K3S_VERSION="v1.21.6+k3s1"
export LB_EXTERNAL_IP="1.2.3.4"     # adjust
export LB_INTERNAL_IP="10.0.0.10"   # adjust

# Install K3S on master
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=$K3S_VERSION K3S_TOKEN=$K3S_TOKEN sh -s - server \
    --cluster-init \
    --disable-cloud-controller \
    --disable metrics-server \
    --write-kubeconfig-mode=644 \
    --node-ip=$(hostname -I | awk '{print $2}') \
    --node-external-ip=$(hostname -I | awk '{print $1}') \
    --node-name="$(hostname -f)" \
    --cluster-cidr="10.244.0.0/16" \
    --etcd-expose-metrics=true \
    --kube-controller-manager-arg="address=0.0.0.0" \
    --kube-controller-manager-arg="bind-address=0.0.0.0" \
    --kube-proxy-arg="metrics-bind-address=0.0.0.0" \
    --kube-scheduler-arg="address=0.0.0.0" \
    --kube-scheduler-arg="bind-address=0.0.0.0" \
    --kubelet-arg="cloud-provider=external" \
    --node-taint CriticalAddonsOnly=true:NoExecute \
    --flannel-iface=ens10 \
    --tls-san="$(hostname -I | awk '{print $1}')" \
    --tls-san="$(hostname -I | awk '{print $2}')" \
    --tls-san="$LB_EXTERNAL_IP" --tls-san="$LB_INTERNAL_IP"

kubectl -n kube-system create secret generic hcloud --from-literal=token=[secret] --from-literal=network=network-kubernetes
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm-networks.yaml
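
Before joining more nodes it's worth checking that the cloud controller manager came up; assuming the manifest's Deployment is named hcloud-cloud-controller-manager, something like:

# Wait for the Hetzner CCM rollout, then confirm the node got initialized
kubectl -n kube-system rollout status deployment/hcloud-cloud-controller-manager
kubectl describe node "$(hostname -f)" | grep -i taint   # the cloud-provider uninitialized taint should be gone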

I'm using a Hetzner Load Balancer so I can reach the API from outside later, no matter which masters are alive.
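
A minimal sketch of pointing a workstation kubeconfig at that Load Balancer (k3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml with server https://127.0.0.1:6443, and the LB's external IP is already covered by --tls-san above; the master IP below is a placeholder):

# Copy the kubeconfig from any master and swap in the Load Balancer's external IP
scp root@<master-public-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/k3s.yaml
sed -i "s/127.0.0.1/$LB_EXTERNAL_IP/" ~/.kube/k3s.yaml
export KUBECONFIG=~/.kube/k3s.yaml
kubectl get nodes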

Additional masters (odd number of masters in total)

apt update && apt upgrade -y && apt install apparmor apparmor-utils -y

export K3S_TOKEN="[secret]"
export K3S_VERSION="v1.21.6+k3s1"
export FIRST_MASTER_PRIVATE_IP="10.0.0.2"  # adjust

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=$K3S_VERSION K3S_TOKEN=$K3S_TOKEN sh -s - server \
      --disable-cloud-controller \
      --disable metrics-server \
      --server https://$FIRST_MASTER_PRIVATE_IP:6443 \
      --write-kubeconfig-mode=644 \
      --node-name="$(hostname -f)" \
      --cluster-cidr="10.244.0.0/16" \
      --etcd-expose-metrics=true \
      --kube-controller-manager-arg="address=0.0.0.0" \
      --kube-controller-manager-arg="bind-address=0.0.0.0" \
      --kube-proxy-arg="metrics-bind-address=0.0.0.0" \
      --kube-scheduler-arg="address=0.0.0.0" \
      --kube-scheduler-arg="bind-address=0.0.0.0" \
      --node-taint CriticalAddonsOnly=true:NoExecute \
      --kubelet-arg="cloud-provider=external" \
      --node-ip=$(hostname -I | awk '{print $2}') \
      --node-external-ip=$(hostname -I | awk '{print $1}') \
      --flannel-iface=ens10 \
      --tls-san="$(hostname -I | awk '{print $1}')" \
      --tls-san="$(hostname -I | awk '{print $2}')"

Agents

apt update && apt upgrade -y && apt install apparmor apparmor-utils -y

export K3S_TOKEN="[secret]"
export K3S_VERSION="v1.21.6+k3s1"
export MASTER_PRIVATE_IP="10.0.0.2"  # adjust

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=$K3S_VERSION K3S_TOKEN=$K3S_TOKEN sh -s - agent \
  --server https://$MASTER_PRIVATE_IP:6443 \
  --node-name="$(hostname -f)" \
  --kubelet-arg="cloud-provider=external" \
  --node-ip=$(hostname -I | awk '{print $2}') \
  --node-external-ip=$(hostname -I | awk '{print $1}') \
  --flannel-iface=ens10
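
Finally, to confirm the agents registered and take regular workloads (the CriticalAddonsOnly taint above keeps those off the masters); the deployment name here is just an example:

# Agents show up with the <none> role; a throwaway deployment should land on them
kubectl get nodes
kubectl create deployment nginx-test --image=nginx --replicas=2
kubectl get pods -o wide
kubectl delete deployment nginx-test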

You're welcome! :)