allnightlong opened 1 month ago
This indicates that the wireguard mesh between nodes isn't functioning properly, and that DNS traffic between the affected node and the node running the coredns pod is being dropped. Ensure that you've opened all the correct ports for wireguard, and that you have node external IPs set correctly so that wireguard can establish the mesh between nodes.
Hi, @brandond , thank you for the answer.
Here is my cluster state:
k get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
core Ready control-plane,core,master 15d v1.30.4+k3s1 10.0.1.4 146.185.xxx.xxx Ubuntu 24.04.1 LTS 6.8.0-41-generic containerd://1.7.20-k3s1
node-iota Ready node 8d v1.30.4+k3s1 10.0.1.2 <none> Ubuntu 24.04.1 LTS 6.8.0-41-generic containerd://1.7.20-k3s1
node-kappa Ready node 22h v1.30.4+k3s1 10.0.1.99 109.120.xxx.xx Ubuntu 24.04.1 LTS 6.8.0-44-generic containerd://1.7.20-k3s1
node-lambda Ready node 22h v1.30.4+k3s1 10.0.1.98 109.120.xxx.xx Ubuntu 24.04.1 LTS 6.8.0-44-generic containerd://1.7.20-k3s1
node-theta Ready node 8d v1.30.4+k3s1 10.0.1.8 <none> Ubuntu 24.04.1 LTS 6.8.0-41-generic containerd://1.7.20-k3s1
The main node core and the nodes node-iota and node-theta are in dc1. Nodes node-kappa and node-lambda are in dc2.
I'm checking connectivity according to the page https://docs.k3s.io/installation/requirements#networking.
From the core node I'm able to connect to node-lambda by TCP to port 10250 and by UDP to port 51820. From node-lambda I can connect to core by TCP port 6443, TCP port 10250 and UDP port 51820.
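(A sketch of the kind of checks this involves, assuming the OpenBSD netcat that ships with Ubuntu; substitute whichever IP the remote node is actually reachable on for the placeholders:)
nc -vz <node-lambda-ip> 10250    # kubelet, TCP
nc -vzu <node-lambda-ip> 51820   # flannel wireguard-native port, UDP
nc -vz <core-ip> 6443            # k3s supervisor/apiserver, TCP
nc -vzu <core-ip> 51820          # wireguard back towards the server
# note: with UDP, a "succeeded" result only means no ICMP port-unreachable came back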
Here is my config for the core server node:
cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
server \
'--tls-san' \
'core.xxx.cloud' \
'--node-external-ip=146.185.xxx.xxx' \
'--flannel-backend=wireguard-native' \
'--flannel-external-ip' \
'--bind-address=0.0.0.0' \
'--kubelet-arg=allowed-unsafe-sysctls=net.ipv6.*' \
'--kubelet-arg=allowed-unsafe-sysctls=net.ipv4.*' \
Here is my config for the node-lambda agent node:
cat /etc/systemd/system/k3s-agent.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s-agent.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
agent \
'--kubelet-arg=allowed-unsafe-sysctls=net.ipv6.*' \
'--kubelet-arg=allowed-unsafe-sysctls=net.ipv4.*' \
'--node-ip=10.0.1.98' \
'--node-external-ip=109.120.xxx.xx' \
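(With --flannel-external-ip on both sides, the wireguard state can also be inspected directly on each node; a sketch, assuming wireguard-tools is installed:)
sudo wg show flannel-wg endpoints          # peer endpoints should be the other nodes' external IPs
sudo wg show flannel-wg latest-handshakes  # a recent handshake per peer means the tunnel is up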
TBH, I'm not sure which direction I should go at this point, so any suggestions are welcome.
@manuelbuil do you have any tips on how to check wireguard connectivity between nodes?
@allnightlong could you run the following commands:
1 - Install wireguard-tools and then execute sudo wg on the node where dnsutils is running.
2 - Find the IP of the coredns pod ($COREDNSIP) and then execute kubectl exec -i -t dnsutils -- nslookup goo.gl $COREDNSIP and see if that works.
3 - Can you ping $COREDNSIP from the node where dnsutils is running?
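(Roughly, as a sketch - this assumes kubectl is usable on that node and that coredns carries the default k8s-app=kube-dns label:)
# find the coredns pod IP (the label is an assumption; adjust if yours differs)
COREDNSIP=$(kubectl -n kube-system get pod -l k8s-app=kube-dns -o jsonpath='{.items[0].status.podIP}')
sudo wg                                                     # step 1: peers, endpoints, handshakes
kubectl exec -i -t dnsutils -- nslookup goo.gl $COREDNSIP   # step 2: query coredns directly
ping -c 3 $COREDNSIP                                        # step 3: plain ICMP to the pod IP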
Hi, @manuelbuil, thank you for the answers. Here is my cluster's state:
1. On node-lambda (in datacenter 2) I execute sudo wg:
sudo wg
interface: flannel-wg
public key: UxoKiZzDtXIwVgpYKXSucgqm52oB+k4GT2LjDK6t0mI=
private key: (hidden)
listening port: 51820
peer: DIMwbxQYU3uKGxnLrY0N4/hp9u9oAvQg/dQOJAYLiVk=
  endpoint: 146.185.xxx.xxx:51820
  allowed ips: 10.42.0.0/24
  latest handshake: 24 seconds ago
  transfer: 18.61 MiB received, 17.27 MiB sent
  persistent keepalive: every 25 seconds
peer: +wGbtSsm5PDnDPB9N6n/SlKi3aeiKi2gsgEyeQBs7Wc=
  endpoint: 109.120.xxx.xxx:51820
  allowed ips: 10.42.8.0/24
  latest handshake: 1 minute, 40 seconds ago
  transfer: 221.55 KiB received, 303.81 KiB sent
  persistent keepalive: every 25 seconds
146.185.xxx.xxx is the `core` server node (datacenter 1).
109.120.xxx.xxx is the `node-kappa` agent node (datacenter 2).
2. I've got the `dnsutils` pod running on `node-lambda` (datacenter 2):
kubectl exec -i -t dnsutils -- nslookup goo.gl 10.42.6.128
;; communications error to 10.42.6.128#53: timed out
;; communications error to 10.42.6.128#53: timed out
;; communications error to 10.42.6.128#53: timed out
;; no servers could be reached
command terminated with exit code 1
If I run `dnsutils` on `node-iota` (datacenter 1), the connection is OK:
kubectl exec -i -t dnsutils -- nslookup goo.gl 10.42.6.128
Server: 10.42.6.128
Address: 10.42.6.128#53
Non-authoritative answer:
Name: goo.gl
Address: 64.233.165.138
Name: goo.gl
Address: 64.233.165.113
Name: goo.gl
Address: 64.233.165.100
Name: goo.gl
Address: 64.233.165.101
Name: goo.gl
Address: 64.233.165.139
Name: goo.gl
Address: 64.233.165.102
Name: goo.gl
Address: 2a00:1450:4010:c08::66
Name: goo.gl
Address: 2a00:1450:4010:c08::64
Name: goo.gl
Address: 2a00:1450:4010:c08::71
Name: goo.gl
Address: 2a00:1450:4010:c08::65
3. PING
ping 10.42.6.128
PING 10.42.6.128 (10.42.6.128) 56(84) bytes of data.
From 10.42.9.0 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=2 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=3 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=4 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=5 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=6 Destination Host Unreachable
ping: sendmsg: Required key not available
^C
--- 10.42.6.128 ping statistics ---
6 packets transmitted, 0 received, +6 errors, 100% packet loss, time 5146ms
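(The "ping: sendmsg: Required key not available" error usually means wireguard has no peer whose allowed-ips cover the destination; a quick way to check, as a sketch:)
sudo wg show flannel-wg allowed-ips   # is there a peer covering 10.42.6.0/24?
ip route get 10.42.6.128              # which route/interface the kernel would use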
Run those tests on all the nodes. You need full connectivity between all cluster members, since the coredns pod may run on any node.
You are right, @brandond, the coredns pod is on node-iota, but I can connect to it from node-theta (dc1):
kubectl exec -i -t dnsutils -- nslookup goo.gl 10.42.6.128
Server: 10.42.6.128
Address: 10.42.6.128#53
Non-authoritative answer:
Name: goo.gl
Address: 64.233.165.138
Name: goo.gl
Address: 64.233.165.102
Name: goo.gl
Address: 64.233.165.100
Name: goo.gl
Address: 64.233.165.113
Name: goo.gl
Address: 64.233.165.101
Name: goo.gl
Address: 64.233.165.139
Name: goo.gl
Address: 2a00:1450:4010:c08::66
Name: goo.gl
Address: 2a00:1450:4010:c08::64
Name: goo.gl
Address: 2a00:1450:4010:c08::8a
Name: goo.gl
Address: 2a00:1450:4010:c08::71
I think I've figured out the problem. It was a combination of 2 factors:
1. The server node in datacenter 1 had an EXTERNAL-IP configured, while the other two agent nodes (iota and theta) had only an INTERNAL-IP.
2. The dns pod was running on an agent node (iota).
My expectation was that connectivity would only need to be established between each agent node and the server node, and that k3s would set up the VPN between all nodes through the server node. Apparently it requires each node to have a public IP for this stack to work.
Another expectation was that all system pods would run on the server node. Apparently this is not the case either.
Thank you @brandond, @manuelbuil for helping me sort things out.
My only request would be to make the documentation clearer about this, as I've spent quite some time trying to figure out the problem.
Also, I didn't find any config option to move all kube-system pods to the server node - is it possible?
Great that you found the problem! Thanks for making the effort.
My expectation was that connectivity would only need to be established between each agent node and the server node, and that k3s would set up the VPN between all nodes through the server node. Apparently it requires each node to have a public IP for this stack to work.
We can add more information in the docs, but right now it is stated that K3s uses wireguard to establish a VPN mesh for cluster traffic. What you are describing would be a VPN star topology or hub-and-spoke, not a mesh.
Thank you for clearing things up for me!
My expectation was that connectivity would only need to be established between each agent node and the server node, and that k3s would set up the VPN between all nodes through the server node.
As Manuel (and the docs) said, wireguard is a full mesh. What you're asking for is closer to what tailscale does; if you want something more like a star/hub-and-spoke topology, you should look into using tailscale instead. This is covered in the docs.
Another expectation was that all system pods would run on the server node.
I'm curious where this expectation came from. There is nothing special about pods in the kube-system namespace; they will run on any available node in the cluster, same as any other pod.
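For instance, to see which node each of them actually landed on:
kubectl -n kube-system get pods -o wide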
In my setup, core is a command-only node where a task distributor is located. All the other nodes are high-intensity CPU usage nodes.
I've already run into a problem when the core node was also a worker: due to the high load, k3s was very slow to respond to kubectl commands.
That's why I don't want any of the system-important pods to run anywhere but core.
I've even managed to move the system-upgrade-controller pod to core with a nodeSelector like this:
spec:
concurrency: 1
cordon: true
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: In
values:
- "true"
But I don't know how to force the dns (coredns) pod to run on the main node.
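(One thing that might work - a sketch, not an official k3s knob - is adding a nodeSelector to the coredns deployment. Be aware that k3s re-applies its bundled manifests from /var/lib/rancher/k3s/server/manifests on restart, so the patch may get reverted unless you also maintain that manifest yourself or run your own coredns with --disable=coredns.)
# node-role.kubernetes.io/control-plane=true is the label k3s sets on server nodes (as used in the Plan above)
kubectl -n kube-system patch deployment coredns --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/control-plane":"true"}}}}}'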
- Ping the k3s server flannel IP from each agent.
- wg show <INTERFACE> to see if the interfaces have communicated.
- ip route show to see if you have a route for flannel; sometimes when the VPN (wg) is restarted but not k3s, the routes are gone. To recreate them, restart the k3s agents or add them manually.
- tcpdump -qni any udp dst port 53, then from the agent nodes run dig one.com @<YOUR_COREDNS_SVC_IP> and check the tcpdump output.
Discussed in https://github.com/k3s-io/k3s/discussions/10897