k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

HA setup over Tailscale #10133

Closed · dmorn closed this issue 4 months ago

dmorn commented 4 months ago

Environmental Info: K3s Version:

k3s version v1.29.4+k3s1 (94e29e2e) go version go1.21.9

Node(s) CPU architecture, OS, and Version:

Linux control-cax21-fsn1 5.10.0-29-arm64 #1 SMP Debian 5.10.216-1 (2024-05-03) aarch64 GNU/Linux
Linux control-cax21-nbg1 5.10.0-28-arm64 #1 SMP Debian 5.10.209-2 (2024-01-31) aarch64 GNU/Linux

Cluster Configuration:

Two server (control-plane) nodes.

Describe the bug:

I'm using the embedded VPN (Tailscale) feature. I can join agent nodes, but I cannot join additional server nodes for an HA setup.
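For context, the second server is started along these lines (a sketch of the k3s config file, whose keys mirror the CLI flags; the token and join key are placeholders, not values from this report):

```yaml
# /etc/rancher/k3s/config.yaml on the joining server (illustrative values)
server: https://23.88.104.207:6443                 # first server's address
token: <SERVER_NODE_TOKEN>                         # from /var/lib/rancher/k3s/server/token
vpn-auth: "name=tailscale,joinKey=<TS_AUTH_KEY>"   # enables the integrated Tailscale VPN
node-external-ip: <this node's public IP>
```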

Steps To Reproduce:

Expected behavior:

I expect the second server node to simply join the cluster.

Actual behavior:

The second server node crashes repeatedly while trying to join.

Additional context / logs:

Logs from the second control-plane node:

May 22 10:33:44 control-cax21-nbg1 sh[210135]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
May 22 10:33:44 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:44Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
May 22 10:33:44 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:44Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/381112d65aad62fd1acd373e1fc0c430cb9c3fc77232ffd864b8532a77aef54d"
May 22 10:33:45 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:45Z" level=info msg="Starting VPN: tailscale"
May 22 10:33:45 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:45Z" level=info msg="Changed advertise-address to 100.83.170.3 due to VPN"
May 22 10:33:45 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:45Z" level=warning msg="Etcd IP (PrivateIP) remains the local IP. Running etcd traffic over VPN is not recommended due to performance issues"
May 22 10:33:45 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:45Z" level=info msg="Starting k3s v1.29.4+k3s1 (94e29e2e)"
May 22 10:33:45 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:45Z" level=info msg="Managed etcd cluster not yet initialized"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Reconciling bootstrap data between datastore and disk"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:admin,O=system:masters signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:k3s-supervisor,O=system:masters signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:kube-controller-manager signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:kube-scheduler signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:apiserver,O=system:masters signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:kube-proxy signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:k3s-controller signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=k3s-cloud-controller-manager signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=kube-apiserver signed by CN=k3s-server-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=system:auth-proxy signed by CN=k3s-request-header-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=etcd-client signed by CN=etcd-server-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=etcd-peer signed by CN=etcd-peer-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=etcd-server signed by CN=etcd-server-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=warning msg="dynamiclistener [::]:6443: no cached certificate available for preload - deferring certificate load until storage initialization or first client request"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Active TLS secret / (ver=) (count 12): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-100.83.170.3:100.83.170.3 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-23.88.104.207:23.88.104.207 listener.cattle.io/cn-2a01_4f8_1c1e_87ec__1-9ec988:2a01:4f8:1c1e:87ec::1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-control-cax21-nbg1:control-cax21-nbg1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=F299B025E01E50A339FB16D85A899F7E8142574C]"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg=start
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="schedule, now=2024-05-22T10:33:46Z, entry=1, next=2024-05-22T12:00:00Z"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Running kube-apiserver --advertise-address=100.83.170.3 --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --egress-selector-config-file=/var/lib/rancher/k3s/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=true --enable-bootstrap-token-auth=true --etcd-cafile=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --etcd-certfile=/var/lib/rancher/k3s/server/tls/etcd/client.crt --etcd-keyfile=/var/lib/rancher/k3s/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 
--storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --profiling=false --secure-port=10259"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kubelet-client-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.nochain.crt --cluster-signing-kubelet-serving-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.nochain.crt --cluster-signing-legacy-unknown-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --configure-cloud-routes=false --controllers=*,tokencleaner,-service,-route,-cloud-node-lifecycle --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --use-service-account-credentials=true"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-config=/var/lib/rancher/k3s/server/etc/cloud-config.yaml --cloud-provider=k3s --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --controllers=*,-route --feature-gates=CloudDualStackNodeIPs=true --kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --leader-elect-resource-name=k3s-cloud-controller-manager --node-status-update-frequency=1m0s --profiling=false"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Server node token is available at /var/lib/rancher/k3s/server/token"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="To join server node to cluster: k3s server -s https://23.88.104.207:6443 -t ${SERVER_NODE_TOKEN}"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Agent node token is available at /var/lib/rancher/k3s/server/agent-token"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="To join agent node to cluster: k3s agent -s https://23.88.104.207:6443 -t ${AGENT_NODE_TOKEN}"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Run: k3s kubectl"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Node-ip changed to [100.83.170.3 fd7a:115c:a1e0::2401:aa03] due to VPN"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="Password verified locally for node control-cax21-nbg1"
May 22 10:33:46 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:46Z" level=info msg="certificate CN=control-cax21-nbg1 signed by CN=k3s-server-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:46 +0000 UTC"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="certificate CN=system:node:control-cax21-nbg1,O=system:nodes signed by CN=k3s-client-ca@1716372118: notBefore=2024-05-22 10:01:58 +0000 UTC notAfter=2025-05-22 10:33:47 +0000 UTC"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Module overlay was already loaded"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Module nf_conntrack was already loaded"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Module br_netfilter was already loaded"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Module iptable_nat was already loaded"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Module iptable_filter was already loaded"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
May 22 10:33:47 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:47Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="containerd is now running"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Creating k3s-cert-monitor event broadcaster"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Running kubelet --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=control-cax21-nbg1 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-ip=100.83.170.3,fd7a:115c:a1e0::2401:aa03 --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Handling backend connection request [control-cax21-nbg1]"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Remotedialer connected to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 22 10:33:48 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:48Z" level=info msg="Adding member control-cax21-nbg1-df0d918d=https://23.88.104.207:2380 to etcd cluster [control-cax21-fsn1-3b956cb0=https://142.132.176.81:2380]"
May 22 10:33:53 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:53Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 22 10:33:58 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:33:58Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 22 10:34:03 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:34:03Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
May 22 10:34:08 control-cax21-nbg1 k3s[210139]: {"level":"warn","ts":"2024-05-22T10:34:08.562614Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x40007cb880/142.132.176.81:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
May 22 10:34:08 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:34:08Z" level=fatal msg="etcd cluster join failed: context deadline exceeded"
May 22 10:34:08 control-cax21-nbg1 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
brandond commented 4 months ago

See https://docs.k3s.io/networking/distributed-multicloud:

Embedded etcd is not supported in this type of deployment. If using embedded etcd, all server nodes must be reachable to each other via their private IPs. Agents may be distributed over multiple networks, but all servers should be in the same location.

All etcd nodes must be on the same private network.
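One way to confirm reachability (a diagnostic sketch, not part of k3s itself) is to probe the etcd client and peer ports from the joining node. etcd listens on 2379 (client) and 2380 (peer); the IP below is the first server's address from the logs above, so substitute your own private IPs:

```shell
# Probe a TCP port using bash's /dev/tcp; prints "reachable" or "unreachable".
probe_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
  fi
}

# Run from the joining node against the existing server's private address.
probe_port 142.132.176.81 2379
probe_port 142.132.176.81 2380
```

If either port reports unreachable, etcd cannot form a cluster regardless of the k3s configuration.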

dmorn commented 4 months ago

Hi @brandond! They are.

brandond commented 4 months ago

OK, but can they reach each other at their private IPs? It appears they cannot based on your logs:

May 22 10:34:08 control-cax21-nbg1 k3s[210139]: {"level":"warn","ts":"2024-05-22T10:34:08.562614Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x40007cb880/142.132.176.81:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
May 22 10:34:08 control-cax21-nbg1 k3s[210139]: time="2024-05-22T10:34:08Z" level=fatal msg="etcd cluster join failed: context deadline exceeded"
May 22 10:34:08 control-cax21-nbg1 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE

Are you using public IPs as the nodes' private addresses?

dmorn commented 4 months ago

Nope, that's the thing: I'm setting node-external-ip but not node-ip, since the logs say that value is overridden by the VPN configuration. Do I need to set node-ip as well?
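If it helps to experiment, the idea would be something like the following (a sketch only, with placeholder addresses; the keys mirror the CLI flags, and this is not a verified fix for this issue):

```yaml
# /etc/rancher/k3s/config.yaml on the joining server (illustrative values)
server: https://<first-server-private-ip>:6443
token: <SERVER_NODE_TOKEN>
vpn-auth: "name=tailscale,joinKey=<TS_AUTH_KEY>"
node-ip: <this node's private LAN IP>       # keep etcd peer traffic on the private network
node-external-ip: <this node's public IP>
```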

brandond commented 4 months ago

Do you have any idea why the nodes wouldn't be able to reach each other at the selected addresses? Do you have firewall rules or something else in place that is blocking the etcd traffic?

dmorn commented 4 months ago

Yes, I do have an idea. The nodes are trying to use the external address to communicate, and indeed that traffic is not allowed by the firewall rules! Setting node-ip in the previous sessions I tried didn't seem to help, but I would have to check it again. The idea as I understand it would be to