k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Cannot start HA/embedded etcd through NAT #2850

Closed: optix2000 closed this issue 3 years ago

optix2000 commented 3 years ago

Environmental Info: K3s Version:

k3s version v1.20.2+k3s1 (1d4adb03)
go version go1.15.5

Node(s) CPU architecture, OS, and Version: Stock Debian 10 Linux k3s2 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux

Cluster Configuration: 3 masters. 1 is behind NAT with all traffic forwarded; the other 2 have public IPs.

Host A:

Host B/C:

Describe the bug: etcd fails to connect because it tries to reach Host A via its internal IP, which soft-locks k3s in an infinite loop.

Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:19.777Z","caller":"etcdserver/cluster_util.go:76","msg":"failed to get cluster response","address":"https://10.64.244.91:2380/members","error":"Get \"https://10.64.244.91:2380/members\": dial tcp 10.64.244.91:2380: i/o timeout"}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:19.778Z","caller":"embed/etcd.go:363","msg":"closing etcd server","name":"k3s2-21335ec2","data-dir":"/var/lib/rancher/k3s/server/db/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://144.202.111.132:2379"]}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:19.779Z","caller":"embed/etcd.go:367","msg":"closed etcd server","name":"k3s2-21335ec2","data-dir":"/var/lib/rancher/k3s/server/db/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://144.202.111.132:2379"]}

As an aside, etcd never retries because the server shuts down, which could cause problems if the failure is only a transient network issue.

Steps To Reproduce:

  1. Installed K3s on Host A: curl -sfL https://get.k3s.io | K3S_TOKEN=test sh -s - --cluster-init
  2. Installed K3s on Host B: curl -sfL https://get.k3s.io | K3S_TOKEN=test sh -s - --server https://<host-A-external-IP>:6443

Trying to pass --node-external-ip <host-A-IP> on Host A doesn't affect etcd.
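For reference, this is roughly what was attempted; a hedged sketch, since --node-external-ip is a documented k3s flag but, as noted, it does not change the addresses etcd advertises (<host-A-external-IP> is a placeholder, as above):

# Sketch: re-run the Host A install with an explicit external IP;
# etcd still advertises the internal IP afterwards.
curl -sfL https://get.k3s.io | K3S_TOKEN=test sh -s - --cluster-init \
  --node-external-ip <host-A-external-IP>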

Reversing the roles (i.e. Host B runs cluster-init and Host A connects) doesn't work either, and results in a slightly different error.

Expected behavior: etcd bootstraps using the external IP of Host A instead of the internal IP.

Actual behavior: etcd tries to bootstrap using the internal IP of Host A and fails to connect (since that internal IP is unreachable from outside the NAT).
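One way to confirm which peer URLs the cluster has recorded is to query the member list directly; a hedged sketch, assuming a standalone etcdctl binary is available, with the cert paths taken from the k3s logs below:

# Sketch: list etcd members and their advertised peer URLs via the
# embedded etcd's local client endpoint.
ETCDCTL_API=3 etcdctl member list \
  --endpoints https://127.0.0.1:2379 \
  --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/k3s/server/tls/etcd/client.crt \
  --key /var/lib/rancher/k3s/server/tls/etcd/client.key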

Additional context / logs: Host B logs:

Jan 23 10:13:49 k3s2 systemd[1]: Starting Lightweight Kubernetes...
Jan 23 10:13:49 k3s2 k3s[9216]: time="2021-01-23T10:13:49Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Jan 23 10:13:49 k3s2 k3s[9216]: time="2021-01-23T10:13:49Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/8f4b194129852507eab4a55117fc942e0688ec9a70ffdaa5911ccc6652220f76"
Jan 23 10:13:51 k3s2 k3s[9216]: time="2021-01-23T10:13:51.646991720Z" level=info msg="Starting k3s v1.20.2+k3s1 (1d4adb03)"
Jan 23 10:13:52 k3s2 k3s[9216]: time="2021-01-23T10:13:52.852626481Z" level=info msg="Managed etcd cluster not yet initialized"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.302173717Z" level=info msg="certificate CN=system:admin,O=system:masters signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.303310644Z" level=info msg="certificate CN=system:kube-controller-manager signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.304088771Z" level=info msg="certificate CN=system:kube-scheduler signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.304858188Z" level=info msg="certificate CN=kube-apiserver signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.305607942Z" level=info msg="certificate CN=system:kube-proxy signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.306327781Z" level=info msg="certificate CN=system:k3s-controller signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.307009780Z" level=info msg="certificate CN=cloud-controller-manager signed by CN=k3s-client-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.307866060Z" level=info msg="certificate CN=kube-apiserver signed by CN=k3s-server-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.308625557Z" level=info msg="certificate CN=system:auth-proxy signed by CN=k3s-request-header-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.309501071Z" level=info msg="certificate CN=etcd-server signed by CN=etcd-server-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.310114119Z" level=info msg="certificate CN=etcd-client signed by CN=etcd-server-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.310798291Z" level=info msg="certificate CN=etcd-peer signed by CN=etcd-peer-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.314596648Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1611396783: notBefore=2021-01-23 10:13:03 +0000 UTC notAfter=2022-01-23 10:13:53 +0000 UTC"
Jan 23 10:13:53 k3s2 k3s[9216]: time="2021-01-23T10:13:53.315058588Z" level=info msg="Active TLS secret  (ver=) (count 7): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-144.202.111.132:144.202.111.132 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=6054879B80E51EF66AA88E51BFF27DB275633C62]"
Jan 23 10:14:13 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:13.757Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-65ce3af4-f120-4b3c-81ff-34b3771b3df5/10.64.244.91:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.64.244.91:2379: i/o timeout\""}
Jan 23 10:14:13 k3s2 k3s[9216]: time="2021-01-23T10:14:13.757637368Z" level=error msg="Failed to get member list from etcd cluster. Will assume this member is already added"
Jan 23 10:14:13 k3s2 k3s[9216]: time="2021-01-23T10:14:13.757680652Z" level=info msg="Starting etcd for cluster [k3s1-13136390=https://10.64.244.91:2380 k3s2-21335ec2=https://144.202.111.132:2380]"
Jan 23 10:14:13 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:13.769Z","caller":"embed/etcd.go:117","msg":"configuring peer listeners","listen-peer-urls":["https://144.202.111.132:2380"]}
Jan 23 10:14:13 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:13.770Z","caller":"embed/etcd.go:468","msg":"starting with peer TLS","tls-info":"cert = /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt, key = /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.key, trusted-ca = /var/lib/rancher/k3s/server/tls/etcd/peer-ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
Jan 23 10:14:13 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:13.770Z","caller":"embed/etcd.go:127","msg":"configuring client listeners","listen-client-urls":["https://127.0.0.1:2379","https://144.202.111.132:2379"]}
Jan 23 10:14:13 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:13.771Z","caller":"embed/etcd.go:302","msg":"starting an etcd server","etcd-version":"3.4.13","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.15.5","go-os":"linux","go-arch":"amd64","max-cpu-set":1,"max-cpu-available":1,"member-initialized":false,"name":"k3s2-21335ec2","data-dir":"/var/lib/rancher/k3s/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/k3s/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["https://144.202.111.132:2380"],"advertise-client-urls":["https://144.202.111.132:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://144.202.111.132:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"k3s1-13136390=https://10.64.244.91:2380,k3s2-21335ec2=https://144.202.111.132:2380","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-size-bytes":2147483648,"pre-vote":false,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":""}
Jan 23 10:14:13 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:13.776Z","caller":"etcdserver/backend.go:80","msg":"opened backend db","path":"/var/lib/rancher/k3s/server/db/etcd/member/snap/db","took":"4.638771ms"}
Jan 23 10:14:18 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:18.315Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
Jan 23 10:14:18 k3s2 k3s[9216]: time="2021-01-23T10:14:18.316532671Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Jan 23 10:14:18 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:18.316Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: context canceled\". Reconnecting..."}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:19.777Z","caller":"etcdserver/cluster_util.go:76","msg":"failed to get cluster response","address":"https://10.64.244.91:2380/members","error":"Get \"https://10.64.244.91:2380/members\": dial tcp 10.64.244.91:2380: i/o timeout"}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:19.778Z","caller":"embed/etcd.go:363","msg":"closing etcd server","name":"k3s2-21335ec2","data-dir":"/var/lib/rancher/k3s/server/db/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://144.202.111.132:2379"]}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"info","ts":"2021-01-23T10:14:19.779Z","caller":"embed/etcd.go:367","msg":"closed etcd server","name":"k3s2-21335ec2","data-dir":"/var/lib/rancher/k3s/server/db/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://144.202.111.132:2379"]}
Jan 23 10:14:19 k3s2 k3s[9216]: time="2021-01-23T10:14:19.779865470Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-cafile=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --etcd-certfile=/var/lib/rancher/k3s/server/tls/etcd/client.crt --etcd-keyfile=/var/lib/rancher/k3s/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --feature-gates=ServiceAccountIssuerDiscovery=false --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --profiling=false --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=k3s --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:19.780Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {https://10.64.244.91:2379  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.64.244.91:2379: operation was canceled\". Reconnecting..."}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:19.780Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: read tcp 127.0.0.1:53766->127.0.0.1:2379: read: connection reset by peer\". Reconnecting..."}
Jan 23 10:14:19 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:19.781Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\". Reconnecting..."}
Jan 23 10:14:20 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:20.781Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\". Reconnecting..."}
Jan 23 10:14:22 k3s2 k3s[9216]: {"level":"warn","ts":"2021-01-23T10:14:22.099Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\". Reconnecting..."}
...
brandond commented 3 years ago

That is correct. At the moment, all etcd nodes must be able to reach each other via their internal IPs. Due to etcd's requirement for low latency between nodes, we don't generally expect to see it deployed in a distributed configuration.

optix2000 commented 3 years ago

I'm a bit curious about this, since it prevents using the embedded etcd for regional HA. It also prevents odd network configurations from working, even low-latency ones (<5 ms).

optix2000 commented 3 years ago

> Due to etcd's requirement for low latency between nodes, we don't generally expect to see it deployed in a distributed configuration.

Where does etcd require this? From their docs, the default heartbeat interval is 100ms, which in turn allows for 50-150ms of latency between etcd nodes.
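For reference, these timings are ordinary etcd tunables; a minimal sketch of the flags involved (values in milliseconds; per the embedded config in the logs above, k3s already relaxes them to 500ms/5s):

# Sketch: etcd's timing knobs. Upstream defaults are 100/1000; the k3s
# logs above show heartbeat-interval=500ms, election-timeout=5s.
etcd --heartbeat-interval 500 --election-timeout 5000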

brandond commented 3 years ago

At some point we might extend it so that the server can also advertise the external IPs in the client and peer endpoint lists, but for the initial implementation we wanted to keep things pretty basic and easy to support. This means traditional LAN deployments only.
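For illustration, a hedged sketch of what advertising external IPs would mean in raw etcd terms (these are standard etcd flags, not something k3s exposes today; the IP is a placeholder):

# Sketch: listen on all interfaces but advertise the external address to
# peers and clients. k3s would need to plumb something like
# --node-external-ip through to these flags for NAT setups to work.
etcd --name k3s1-13136390 \
  --listen-peer-urls https://0.0.0.0:2380 \
  --listen-client-urls https://0.0.0.0:2379 \
  --initial-advertise-peer-urls https://<host-A-external-IP>:2380 \
  --advertise-client-urls https://<host-A-external-IP>:2379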

Tchoupinax commented 3 years ago

Hello @brandond,

Does this mean k3s will never support multi-region clusters, even with an external DB (because of the latency)?

brandond commented 3 years ago

This is just for embedded etcd, not external databases. You're welcome to try whatever you like, but if the latency is high, performance will probably suffer.

stale[bot] commented 3 years ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.