k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Core workloads on v1.23.8+k3s1 and above see certificate related issues. `x509: certificate is valid for 127.0.0.1, 192.168.23.71, not 192.168.23.70` #5898

Closed. LarsBingBong closed this issue 2 years ago.

LarsBingBong commented 2 years ago

Environmental Info: K3s Version:

k3s version v1.23.8+k3s2 (fe3cecc2)
go version go1.17.5

Node(s) CPU architecture, OS, and Version:

Linux test-test-master-0 5.13.0-37-generic #42~20.04.1-Ubuntu SMP Tue Mar 15 15:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

Describe the bug: When deploying K3s v1.23.8+k3s2 or higher from the v1.23 channel, we see certificate-related errors in the Cilium daemonset Pods as well as in the metrics-server that ships with K3s. The effect is that e.g. cilium status ... considers the cluster not to be fully working, which is a problem on the network backend side.

Cilium daemonset pod and cilium status output:

stream logs failed Get "https://192.168.23.70:10250/containerLogs/network/cilium-nf2gv/mount-bpf-fs?follow=true&tailLines=100&timestamps=true": x509: certificate is valid for 127.0.0.1, 192.168.23.71, not 192.168.23.70 for network/cilium-nf2gv (mount-bpf-fs)

cilium status

    /¯¯\
 /¯¯\__/¯¯\    Cilium:         1 errors
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet         cilium             Desired: 6, Ready: 6/6, Available: 6/6
Containers:       cilium             Running: 6
                  cilium-operator    Running: 2
Cluster Pods:     7/7 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.12.0@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade: 6
                  cilium-operator    quay.io/cilium/operator-generic:v1.12.0@sha256:bb2a42eda766e5d4a87ee8a5433f089db81b72dd04acf6b59fcbb445a95f9410: 2
Errors:           cilium             cilium-nf2gv    unable to retrieve cilium status: error dialing backend: x509: certificate is valid for 127.0.0.1, 192.168.23.71, not 192.168.23.70

192.168.23.70 is the API IP.

metrics-server error output:

E0722 07:38:57.894677       1 scraper.go:139] "Failed to scrape node" err="Get \"https://192.168.23.70:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate is valid for 127.0.0.1, 192.168.23.71, not 192.168.23.70" node="test-test-master-0"

Steps To Reproduce:

flags used

extra_master_args:
  - "--node-taint CriticalAddonsOnly=true:NoExecute"
  - "--data-dir=/k3s-data"
  - "--disable=coredns"
  - "--disable-cloud-controller"
  - "--disable-kube-proxy"
  - "--disable=local-storage"
  - "--disable-network-policy"
  - "--disable=servicelb"
  - "--disable=traefik"
  - "--kube-apiserver-arg=audit-log-path=/var/lib/rancher/audit/audit.log"
  - "--kube-apiserver-arg=audit-policy-file=/var/lib/rancher/audit/audit-policy.yaml"
  - "--kube-apiserver-arg=audit-webhook-config-file=/var/lib/rancher/audit/webhook-config.yaml"
  - "--kube-apiserver-arg=audit-log-maxage=30"
  - "--kube-apiserver-arg=audit-log-maxsize=20"
  - "--kube-apiserver-arg=audit-log-maxbackup=6"
  - "--kube-apiserver-arg=allow-privileged=true"
extra_worker_args:
  - "--node-label node.longhorn.io/create-default-disk=config"
  - "--kubelet-arg=feature-gates=GRPCContainerProbe=true"

CoreDNS is v1.9.3 and comes up successfully. Yes, we generate the NodeHosts section of the kube-dns/K3s CoreDNS right after deploying the cluster, since the built-in CoreDNS workload is disabled via the master args.
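
For reference, and only as a rough sketch of one way to generate such a NodeHosts-style block (one "InternalIP hostname" line per node); the exact ConfigMap/key we patch is specific to our setup:

# Emit "InternalIP hostname" pairs for every node, suitable for a NodeHosts block
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{" "}{.metadata.name}{"\n"}{end}'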

Expected behavior: The same behavior as on v1.23.6+k3s1 and below; in other words, that workloads are not served the wrong certificate.

Actual behavior: It seems that workloads are indeed served the wrong certificate: the one on port 10250 rather than the one on port 6443.

The certificate on 6443 includes the API IP in its SANs, while the one on 10250 does not. This can be seen by browsing to e.g. 192.168.23.70:10250 and 192.168.23.70:6443 and then viewing the certificate details in the browser.
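
A quick way to compare the two SAN lists from a shell instead of a browser (assuming OpenSSL 1.1.1+ is available) would be something like:

# Print the SANs served on the kubelet port (10250) and the apiserver port (6443)
openssl s_client -connect 192.168.23.70:10250 </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName
openssl s_client -connect 192.168.23.70:6443  </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName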

Additional context / logs:

And lo and behold, now there's no error. Am I missing configuration needed to make it possible to disable the CCM? Isn't it enough to pass --disable-cloud-controller to the masters? Or am I wrong in trying to disable the CCM at all if I'm not replacing it with an external CCM? I was under the impression that it isn't needed when we're not running in the cloud and therefore don't need specific integration with a cloud provider's service plane. Am I wrong?

Thank you very much.

galal-hussein commented 2 years ago

@LarsBingBong Thanks for opening the issue. From the logs there seems to be a misconfigured kubelet certificate on the API node; according to the code, the kubelet on node 192.168.23.70 (the API node) should have its certificate configured with the right IP SANs: https://github.com/k3s-io/k3s/blob/ce5b9347c928336cff13873d2ddeaaeb68d42322/pkg/server/router.go#L214-L232

Can you get the following information so we can investigate further what's causing the issue:

I am not sure if the CCM is relevant here, since it only affects the --node-external-ip flag; the private IP should be set automatically.

brandond commented 2 years ago

It sounds like the 192.168.23.70 and 192.168.23.71 addresses are both present on the server, is that correct?

In addition to what @galal-hussein asked for, can you also provide the output of kubectl get node -o wide, a list of the IPs present on all of the nodes' interfaces, and any load-balancer/VIP addresses configured in your environment and what they're load-balancing to?
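
For the interface IPs, something like the following on each node would cover it:

# List all IPv4 addresses per interface on the node
ip -4 -o addr show
# Show which interface/source address the default route would use
ip route get 1.1.1.1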

LarsBingBong commented 2 years ago

I tested again with your great input. Much appreciated, @brandond and @galal-hussein - and here's what I found when deploying a cluster with --disable-cloud-controller set again.

Indeed, *.23.70 and *.23.71 are on the same node. *.23.70 is the API IP, and it is handled via keepalived. That VIP can float between the master/control-plane nodes, i.e. *.23.71, *.23.72 and *.23.73. And yes, the two IPv4s are assigned to the same NIC on the control-plane nodes.

Below, the output you requested:

kubectl get node -o wide

NAME                 STATUS   ROLES                       AGE     VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
test-test-master-0   Ready    control-plane,etcd,master   7m18s   v1.23.8+k3s2   192.168.23.70   <none>        Ubuntu 20.04.4 LTS   5.13.0-37-generic   containerd://1.5.13-k3s1
test-test-master-1   Ready    control-plane,etcd,master   5m      v1.23.8+k3s2   192.168.23.72   <none>        Ubuntu 20.04.4 LTS   5.13.0-37-generic   containerd://1.5.13-k3s1
test-test-master-2   Ready    control-plane,etcd,master   5m16s   v1.23.8+k3s2   192.168.23.73   <none>        Ubuntu 20.04.4 LTS   5.13.0-37-generic   containerd://1.5.13-k3s1
test-test-worker-0   Ready    worker                      3m34s   v1.23.8+k3s2   192.168.23.77   <none>        Ubuntu 20.04.4 LTS   5.13.0-37-generic   containerd://1.5.13-k3s1
test-test-worker-1   Ready    worker                      3m35s   v1.23.8+k3s2   192.168.23.78   <none>        Ubuntu 20.04.4 LTS   5.13.0-37-generic   containerd://1.5.13-k3s1
test-test-worker-2   Ready    worker                      3m34s   v1.23.8+k3s2   192.168.23.81   <none>        Ubuntu 20.04.4 LTS   5.13.0-37-generic   containerd://1.5.13-k3s1

Output of openssl x509 -in /k3s-data/server/tls/serving-kube-apiserver.crt -noout -text (I hope this is what you asked for/had in mind, @galal-hussein):

root@test-test-master-0:~# openssl x509 -in /k3s-data/server/tls/serving-kube-apiserver.crt -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 7993918466325878850 (0x6ef01a7dd4ca1842)
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: CN = k3s-server-ca@1658740937
        Validity
            Not Before: Jul 25 09:22:17 2022 GMT
            Not After : Jul 25 09:22:17 2023 GMT
        Subject: CN = kube-apiserver
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:be:0e:5a:8a:47:ae:be:28:64:1a:47:4d:c1:cd:
                    71:bd:dc:a1:c5:d6:03:19:42:36:2f:23:c4:37:25:
                    79:34:f6:6f:78:12:c3:c6:4e:c9:5f:f3:fc:16:7e:
                    c1:5a:da:20:fd:b6:e1:bf:68:0c:b3:dc:3c:bd:34:
                    51:7b:9f:ce:ba
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Authority Key Identifier: 
                keyid:99:48:6A:2C:6B:9F:D8:45:CA:13:76:A4:A4:6B:00:CE:13:45:D5:DA

            X509v3 Subject Alternative Name: 
                DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:localhost, DNS:test-test-master-0, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.23.71, IP Address:10.43.0.1
    Signature Algorithm: ecdsa-with-SHA256
         30:46:02:21:00:f1:9d:85:4b:90:01:16:2a:05:92:85:24:1b:
         8a:84:68:ba:42:38:f9:5a:43:21:c5:45:7d:5b:f5:2b:10:5a:
         6f:02:21:00:a3:c6:71:e5:c5:7e:a5:ca:e2:28:88:12:b4:45:
         99:60:71:84:17:95:95:87:4c:08:b9:31:60:fe:6b:1f:82:cd
root@test-test-master-0:~# 
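
Since the x509 error is reported for port 10250 (the kubelet) rather than 6443, the kubelet's own serving certificate is probably the more relevant one to inspect; with --data-dir=/k3s-data it should live under the agent directory, e.g.:

# Kubelet serving cert SANs (path assumes --data-dir=/k3s-data, as used on these nodes)
openssl x509 -in /k3s-data/agent/serving-kubelet.crt -noout -ext subjectAltName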

I now see that test-test-master-0 has its INTERNAL-IP set to 192.168.23.70. That of course does not work well with the cert.

I just checked kubectl get node -o wide on a K3s cluster on v1.22.5+k3s1, and there the *-master-0 node is not announcing the API IP as its INTERNAL-IP. Even on the control-plane node that currently owns the keepalived floating VIP, the INTERNAL-IP is the node's own IP and not the API IP.

Is it because we now need to use --node-ip, maybe combined with --advertise-address, to get this to work on K3s v1.23.8+k3s1 and above, now that we're trying to disable the CCM (which we don't really need in our current situation)?

@galal-hussein how can I control, on a cluster that uses the CCM, how it affects the --node-external-ip flag? We're not configuring/setting this flag on any cluster.

Thank you 👍🏿

LarsBingBong commented 2 years ago

Had another run at deploying a K3s v1.23.8+k3s2 cluster with --disable-cloud-controller active. This time, however, I used --node-ip on each control-plane node to specify the IPv4 that the agent should announce.

That didn't change the outcome. For good measure, the flags we use when deploying K3s were now:

        - "--node-ip=192.168.23.71"
        - "--node-taint CriticalAddonsOnly=true:NoExecute"
        - "--data-dir=/k3s-data"
        - "--disable=coredns"
        - "--disable-cloud-controller"
        - "--disable-kube-proxy"
        - "--disable=local-storage"
        - "--disable-network-policy"
        - "--disable=servicelb"
        - "--disable=traefik"
        - "--kube-apiserver-arg=audit-log-path=/var/lib/rancher/audit/audit.log"
        - "--kube-apiserver-arg=audit-policy-file=/var/lib/rancher/audit/audit-policy.yaml"
        - "--kube-apiserver-arg=audit-webhook-config-file=/var/lib/rancher/audit/webhook-config.yaml"
        - "--kube-apiserver-arg=audit-log-maxage=30"
        - "--kube-apiserver-arg=audit-log-maxsize=20"
        - "--kube-apiserver-arg=audit-log-maxbackup=6"
        # Set to [true] so that the Cilium CNI & Falco can do their magic, and so that KubeVirt works on the clusters that need it
        - "--kube-apiserver-arg=allow-privileged=true"

Thanks

galal-hussein commented 2 years ago

@LarsBingBong Thanks for providing the information, so as for your question:

@galal-hussein how can I control, on a cluster using the ccm how it's effecting the --node-external-ip flag? As we're not configuring/setting this flag on any cluster.

The kubelet on each node configures this automatically according to the IPs and hostname on the node. The external address only gets assigned by an external cloud provider, so in this case I don't think it's relevant to what you are trying to do.

Seeing the problem now, I think it's simply that the worker is trying to communicate with an IP that is not listed in the SANs. I would like to test something; can you configure the node with the following:

  - "--node-ip=192.168.23.70"
  - "--node-ip=192.168.23.71"

This way the kubelet's serving cert should be configured with both.

LarsBingBong commented 2 years ago

Hi @galal-hussein,

Okay sure. Interesting. Thank you for further elaborating.

In regards to using --node-ip with both IPv4s specified: it seems logical to me that --node-ip=192.168.23.70, AKA the API IP, needs to be configured on all control-plane nodes, as that IP is a floating VIP handled by keepalived; in other words, it drifts between them. Right?

Thank you

N.B. yes I'll surely try it out.

galal-hussein commented 2 years ago

@LarsBingBong yes, for sure. I meant that for this configuration, --node-ip=x.x.x.70 is to be configured on all cp nodes.

LarsBingBong commented 2 years ago

I tried that ....

The first master comes up (the one with --cluster-init). The next two do not.

And here's the result from one of the failing masters:

Jul 26 20:18:47 test-test-master-1 systemd[1]: k3s.service: Failed with result 'exit-code'.
Jul 26 20:18:47 test-test-master-1 systemd[1]: Failed to start Lightweight Kubernetes.
Jul 26 20:18:52 test-test-master-1 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 394.
Jul 26 20:18:52 test-test-master-1 systemd[1]: Stopped Lightweight Kubernetes.
Jul 26 20:18:52 test-test-master-1 systemd[1]: Starting Lightweight Kubernetes...
Jul 26 20:18:52 test-test-master-1 sh[103324]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jul 26 20:18:52 test-test-master-1 sh[103325]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Starting k3s v1.24.3+k3s1 (990ba0e8)"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Managed etcd cluster not yet initialized"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Reconciling bootstrap data between datastore and disk"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Running kube-apiserver --advertise-address=192.168.23.70 --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --audit-log-maxage=30 --audit-log-maxbackup=6 --audit-log-maxsize=20 --audit-log-path=/var/lib/rancher/audit/audit.log --audit-policy-file=/var/lib/rancher/audit/audit-policy.yaml --audit-webhook-config-file=/var/lib/rancher/audit/webhook-config.yaml --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/k3s-data/server/tls/temporary-certs --client-ca-file=/k3s-data/server/tls/client-ca.crt --egress-selector-config-file=/k3s-data/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=true --etcd-cafile=/k3s-data/server/tls/etcd/server-ca.crt --etcd-certfile=/k3s-data/server/tls/etcd/client.crt --etcd-keyfile=/k3s-data/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --feature-gates=JobTrackingWithFinalizers=true --kubelet-certificate-authority=/k3s-data/server/tls/server-ca.crt --kubelet-client-certificate=/k3s-data/server/tls/client-kube-apiserver.crt --kubelet-client-key=/k3s-data/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/k3s-data/server/tls/client-auth-proxy.crt --proxy-client-key-file=/k3s-data/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/k3s-data/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/k3s-data/server/tls/service.key --service-account-signing-key-file=/k3s-data/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/k3s-data/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/k3s-data/server/tls/serving-kube-apiserver.key"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Tunnel server egress proxy mode: agent"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Running kube-scheduler --authentication-kubeconfig=/k3s-data/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/k3s-data/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/k3s-data/server/cred/scheduler.kubeconfig --profiling=false --secure-port=10259"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/k3s-data/server/cred/controller.kubeconfig --authorization-kubeconfig=/k3s-data/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/k3s-data/server/tls/client-ca.crt --cluster-signing-kube-apiserver-client-key-file=/k3s-data/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/k3s-data/server/tls/client-ca.crt --cluster-signing-kubelet-client-key-file=/k3s-data/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/k3s-data/server/tls/server-ca.crt --cluster-signing-kubelet-serving-key-file=/k3s-data/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/k3s-data/server/tls/server-ca.crt --cluster-signing-legacy-unknown-key-file=/k3s-data/server/tls/server-ca.key --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/k3s-data/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/k3s-data/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/k3s-data/server/tls/service.key --use-service-account-credentials=true"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Node token is available at /k3s-data/server/token"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="To join node to cluster: k3s agent -s https://192.168.23.72:6443 -t ${NODE_TOKEN}"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Run: k3s kubectl"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="certificate CN=test-test-master-1 signed by CN=k3s-server-ca@1658849056: notBefore=2022-07-26 15:24:16 +0000 UTC notAfter=2023-07-26 18:18:52 +0000 UTC"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="certificate CN=system:node:test-test-master-1,O=system:nodes signed by CN=k3s-client-ca@1658849056: notBefore=2022-07-26 15:24:16 +0000 UTC notAfter=2023-07-26 18:18:52 +0000 UTC"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Module overlay was already loaded"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Module nf_conntrack was already loaded"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Module br_netfilter was already loaded"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Module iptable_nat was already loaded"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Logging containerd to /k3s-data/agent/containerd/containerd.log"
Jul 26 20:18:52 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:52+02:00" level=info msg="Running containerd -c /k3s-data/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /k3s-data/agent/containerd"
Jul 26 20:18:53 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:53+02:00" level=info msg="Containerd is now running"
Jul 26 20:18:53 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:53+02:00" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/k3s-data/agent/client-ca.crt --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=test-test-master-1 --kubeconfig=/k3s-data/agent/kubelet.kubeconfig --node-ip=192.168.23.70 --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/k3s-data/agent/pod-manifests --read-only-port=0 --register-with-taints=CriticalAddonsOnly=true:NoExecute --resolv-conf=/run/systemd/resolve/resolv.conf --serialize-image-pulls=false --tls-cert-file=/k3s-data/agent/serving-kubelet.crt --tls-private-key-file=/k3s-data/agent/serving-kubelet.key"
Jul 26 20:18:53 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:53+02:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
Jul 26 20:18:53 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:53+02:00" level=info msg="Handling backend connection request [test-test-master-1]"
Jul 26 20:18:53 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:53+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Jul 26 20:18:57 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:57+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jul 26 20:18:58 test-test-master-1 k3s[103328]: time="2022-07-26T20:18:58+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Jul 26 20:19:02 test-test-master-1 k3s[103328]: {"level":"warn","ts":"2022-07-26T20:19:02.335+0200","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000cb8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
Jul 26 20:19:02 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:02+02:00" level=info msg="Failed to test data store connection: context deadline exceeded"
Jul 26 20:19:02 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:02+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jul 26 20:19:03 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:03+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Jul 26 20:19:07 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:07+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jul 26 20:19:08 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:08+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Jul 26 20:19:12 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:12+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jul 26 20:19:13 test-test-master-1 k3s[103328]: {"level":"warn","ts":"2022-07-26T20:19:13.390+0200","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00066ea80/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
Jul 26 20:19:13 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:13+02:00" level=error msg="Failed to get member list from etcd cluster. Will assume this member is already added"
Jul 26 20:19:13 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:13+02:00" level=info msg="Starting etcd to join cluster with members [test-test-master-0-b1b10f3e=https://192.168.23.70:2380 test-test-master-1-26b71735=https://192.168.23.70:2380]"
Jul 26 20:19:13 test-test-master-1 k3s[103328]: {"level":"info","ts":"2022-07-26T20:19:13.392+0200","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["https://127.0.0.1:2380","https://192.168.23.70:2380"]}
Jul 26 20:19:13 test-test-master-1 k3s[103328]: {"level":"info","ts":"2022-07-26T20:19:13.392+0200","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /k3s-data/server/tls/etcd/peer-server-client.crt, key = /k3s-data/server/tls/etcd/peer-server-client.key, client-cert=, client-key=, trusted-ca = /k3s-data/server/tls/etcd/peer-ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
Jul 26 20:19:13 test-test-master-1 k3s[103328]: {"level":"info","ts":"2022-07-26T20:19:13.392+0200","caller":"embed/etcd.go:368","msg":"closing etcd server","name":"test-test-master-1-26b71735","data-dir":"/k3s-data/server/db/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.23.70:2379"]}
Jul 26 20:19:13 test-test-master-1 k3s[103328]: {"level":"info","ts":"2022-07-26T20:19:13.392+0200","caller":"embed/etcd.go:370","msg":"closed etcd server","name":"test-test-master-1-26b71735","data-dir":"/k3s-data/server/db/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.23.70:2379"]}
Jul 26 20:19:13 test-test-master-1 k3s[103328]: time="2022-07-26T20:19:13+02:00" level=fatal msg="ETCD join failed: listen tcp 192.168.23.70:2380: bind: cannot assign requested address"
Jul 26 20:19:13 test-test-master-1 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Jul 26 20:19:13 test-test-master-1 systemd[1]: k3s.service: Failed with result 'exit-code'.
Jul 26 20:19:13 test-test-master-1 systemd[1]: Failed to start Lightweight Kubernetes.
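
The last fatal line would fit the VIP being held elsewhere: etcd on this master is told to listen on 192.168.23.70, but that address is presumably assigned on the node currently holding the keepalived VIP, so the bind fails. A quick check on the failing node, as a sketch:

# Confirm whether the keepalived VIP is actually assigned on this node right now
ip -4 -o addr show | grep 192.168.23.70 || echo "VIP not present on this node"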

Also note the full list of flags passed to the K3s server binary:

[image: screenshot of the full flags passed to the K3s server binary]

N.B. yes, I had a go with K3s v1.24.3+k3s1 this time.

I hope the above can give us something.

Thanks

brandond commented 2 years ago

In regards to using --node-ip with both IPv4s specified: it seems logical to me that --node-ip=192.168.23.70, AKA the API IP, needs to be configured on all control-plane nodes, as that IP is a floating VIP handled by keepalived; in other words, it drifts between them. Right?

for this configuration, --node-ip=x.x.x.70 is to be configured on all cp nodes

I don't think this will work. Kubernetes does not support multiple nodes having the same internal or external IP address. If you're going to use a floating VIP with keepalived, Kubernetes needs to be essentially unaware of it. Don't use it for the internal or external IP; make sure that it's not picked up by any address auto-selection. The only reference to it should be in the --tls-san.

A better way to do this might be to simply configure a DNS alias that points to active control-plane nodes, and use that as the fixed registration endpoint. Using keepalived just to support the fixed registration endpoint is probably overkill, as the nodes load-balance between servers using a client load-balancer once they're joined to the cluster. The registration endpoint (--server address) is essentially unused after the initial join workflow is done.
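
Only as a rough sketch of that suggestion (the DNS name below is made up), the server invocations would then look something like:

# First server: the keepalived VIP appears only as a TLS SAN, never as a node address
k3s server --cluster-init --tls-san=192.168.23.70 --node-ip=192.168.23.71
# Additional servers: join via a DNS alias (or other fixed registration endpoint),
# still announcing their own unique --node-ip
k3s server --server=https://k3s-api.example.internal:6443 --tls-san=192.168.23.70 --node-ip=192.168.23.72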

LarsBingBong commented 2 years ago

Hi @brandond,

Thank you for further elaborating.

How much of this is totally specific to K3s?

Before bumping into the issue described here, we did not specify the API IP to Kubernetes in any way. It's "just" registered as an extra IPv4 on the same NIC as the one assigned the node's external IPv4.

What mechanisms are in play in regards to the address auto-selection you mention? I tried --tls-san=192.168.23.70 when configuring the control-plane nodes, but the error still occurred. I mention that in the initial post here.

We use a tool called kcli to deploy K3s and the underlying VMs, and its author talks about using kube-vip instead of keepalived. Do you see that as the way to go?

What's really interesting, I think, is that when we don't disable the CCM, things work. What's the lowdown on why that is?

Thank you very much.

brandond commented 2 years ago

How much of this is totally specific to K3s?

Not much of it. The kubelet has logic to detect the node's primary internal IP based on which interface the default route is associated with. External IPs pretty much always need to be set by an external integration.

when we don't disable the CCM things works.

The CCM is what's responsible for setting the node addresses based on the configured node-ip and node-external-ip values. If you disable it, the internal IP will be set, but the external IP will not. New nodes added to the cluster will also remain tainted as uninitialized due to lack of a cloud provider.

Under no circumstances should you ever have multiple nodes in the cluster with the same internal or external address. That is not an expected configuration.
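
If you want to check for that uninitialized state on an existing cluster, a quick sketch of a command:

# Nodes still waiting for a cloud provider carry the node.cloudprovider.kubernetes.io/uninitialized taint
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'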

LarsBingBong commented 2 years ago

Hi @brandond,

Again, thank you. Hmm, reading https://kubernetes.io/docs/concepts/architecture/cloud-controller/ leads me to think that I can't disable the CCM that comes out of the box with K3s. Basically, what I'm trying to accomplish is having the CCM disabled, as I was of the belief that it isn't needed now that we're not using a cloud provider. If that belief is mistaken, so be it :-) ....

When things work we:

N.B.: Yes, it makes sense that having several nodes with the same IPv4 fails. I only tried it because it was suggested.

brandond commented 2 years ago

I was of the belief that it isn't needed now that we're not using a cloud provider.

Kubernetes really, really expects to have a cloud provider active to handle configuring the nodes properly. There are many things the kubelet can't do for itself. In bare-metal environments, or other situations where you don't want to integrate with a "real" cloud provider such as AWS, GKE, or Azure, you need something like K3s' embedded stub cloud provider. Really, the only time you would ever want to disable it is when you're deploying a real cloud-provider chart instead.

LarsBingBong commented 2 years ago

What a n00b I've been here. But it's super great to learn this, even though it's been the hard way ;-). I had a dive into the cloudControllerManager func (https://github.com/k3s-io/k3s/blob/master/pkg/daemons/control/server.go#L299) and yeah ... I can see there's quite a bit going on at the low level, network-wise. Good to know, and thank you for the patience! Much appreciated.

Clearly this issue can be closed as resolved.

Disabling the CCM was basically the root cause of this issue, as doing so had several negative side effects.

Thank you @brandond and @galal-hussein