k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Initial "v1" HA Support #618

Closed davidnuzik closed 5 years ago

davidnuzik commented 5 years ago

For tracking what "v1" of HA will support in the upcoming v0.7.0 release.

schmitch commented 5 years ago

As a user, it would also be good to know "how it works," or at least the basics, so that you don't kill master-server X and find your cluster broken after a reboot.

erikwilson commented 5 years ago

For the initial version of HA, k3s must support the behavior described below.

To bootstrap the HA master nodes, the appropriate server certs, keys, and password entries must be copied to all nodes. A --bootstrap flag is provided to assist:

- --bootstrap full: the server attempts to read bootstrap data from etcd, creating it if none exists, and then writes certs to etcd if they don't already exist.
- --bootstrap read: only fetch bootstrap data from the etcd server, erroring if none exists.
- --bootstrap write: always write bootstrap data to the etcd server, never read.

The default is to perform no bootstrap operation.
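For example, a subsequent master joining an existing cluster could use read-only bootstrap (a sketch reusing the flags above; the endpoint and cert paths are placeholders):

k3s server \
   --storage-backend etcd3 \
   --storage-endpoint https://etcd.example.com:2379 \
   --storage-cafile /etc/etcd/ca.pem \
   --storage-certfile /etc/etcd/client.pem \
   --storage-keyfile /etc/etcd/client-key.pem \
   --bootstrap read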

The default advertise address also changes with v0.7.0: the node-ip is now used rather than proxying through 127.0.0.1. We also provide --advertise-address and --advertise-port server flags if you want it set to something other than node-ip:6443.
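For instance, to advertise a different address and port (hypothetical values, using only the flags named above):

k3s server --advertise-address 192.0.2.10 --advertise-port 443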

When the agent connects to the server it iterates through all of the kubernetes endpoints and opens a reverse tunnel to each master node so the masters can communicate with the worker node. The kubernetes endpoints are watched, and any change results in disconnecting from removed nodes or connecting to new ones.
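The endpoint list the agents react to can be observed with standard kubectl, e.g.:

kubectl get endpoints kubernetes --watch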

Future plans include setting up a load-balancing proxy on each node to eliminate the need for an external load balancer, and providing mysql & postgres behind an etcd3 interface.

To test, I provisioned 7 nodes, each with a unique hostname.

The first node was a dedicated instance for etcd.

The next three nodes were master nodes. An external load balancer was set up for these nodes on port 6443. The etcd certs were copied to the nodes, which were then launched with the same command to bootstrap k8s data:

curl -sfL https://get.k3s.io | \
   K3S_CLUSTER_SECRET=ha-test \
   INSTALL_K3S_VERSION=v0.7.0-rc9 \
   sh -s - \
     --storage-backend etcd3 \
     --storage-endpoint https://192.168.101.99:2379 \
     --storage-cafile `pwd`/cfssl/ca.pem \
     --storage-certfile `pwd`/cfssl/client.pem \
     --storage-keyfile `pwd`/cfssl/client-key.pem \
     --tls-san 192.168.101.135 \
     --bootstrap full

Note: if an IP address is used instead of a hostname for the load balancer, that IP should be included via the --tls-san flag when launching the server.
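For example, to cover both the load balancer IP and a DNS name (hypothetical hostname; this assumes --tls-san can be repeated):

--tls-san 192.168.101.135 --tls-san lb.example.com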

Another three agents were created, connecting to the hostname of the master nodes' load balancer:

curl -sfL https://get.k3s.io |  \
    K3S_CLUSTER_SECRET=ha-test \
    INSTALL_K3S_VERSION=v0.7.0-rc9 \
    sh -s - agent \
       --server https://nb-192-168-101-135.fremont.nodebalancer.linode.com:6443

The endpoints for the master node IPs are set up:

root@master-1:~# kubectl get endpoints
NAME         ENDPOINTS                                                 AGE
kubernetes   104.237.155.186:6443,45.33.34.13:6443,45.33.37.148:6443   15m

Then viewing the nodes that are available from all masters:

root@master-1:~# kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
agent-1    Ready    worker   21m   v1.14.4-k3s.1
agent-2    Ready    worker   21m   v1.14.4-k3s.1
agent-3    Ready    worker   21m   v1.14.4-k3s.1
master-1   Ready    master   23m   v1.14.4-k3s.1
master-2   Ready    master   23m   v1.14.4-k3s.1
master-3   Ready    master   23m   v1.14.4-k3s.1

And after installing metrics-server:

root@master-1:~# kubectl top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
agent-1    24m          1%     561Mi           14%
agent-2    29m          1%     514Mi           13%
agent-3    22m          1%     481Mi           12%
master-1   50m          2%     797Mi           20%
master-2   34m          1%     708Mi           17%
master-3   39m          1%     671Mi           17%
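(For reference, metrics-server is typically installed from the upstream manifest; the URL below is the current upstream location and may not match what was used at the time:)

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml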

Next, k3s was stopped on master-1 & master-2.
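On installs from get.k3s.io, one way to do this is through the systemd unit the install script creates (assuming a systemd host):

systemctl stop k3s

Afterwards, the endpoints and nodes show the failover: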

root@master-1:~# kubectl get endpoints
NAME         ENDPOINTS           AGE
kubernetes   45.33.37.148:6443   33m
root@master-1:~# kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
agent-1    Ready      worker   31m   v1.14.4-k3s.1
agent-2    Ready      worker   31m   v1.14.4-k3s.1
agent-3    Ready      worker   31m   v1.14.4-k3s.1
master-1   NotReady   master   33m   v1.14.4-k3s.1
master-2   NotReady   master   33m   v1.14.4-k3s.1
master-3   Ready      master   33m   v1.14.4-k3s.1
jwillmer commented 5 years ago

Can't the preinstalled Traefik act as the load balancer? Also, how do I apply the --storage-* configuration if I have already set up my k3s?