vojbarzz opened this issue 2 weeks ago
Are you connecting via DNS names or IP addresses? Can you maybe post the output of `ssh -vvv ...`
from before and after, to see what might be the problem? I can only speculate that kube-proxy, kube-router & friends on the worker are configuring something™ that disrupts the connectivity. Does pod-to-pod traffic across nodes work? /cc @juanluisvaladas, as he's the networking guru :upside_down_face:
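For the ssh logs, something like this would do — a rough sketch; user and hostname are placeholders:

```sh
# before adding the host as a worker
ssh -vvv root@gra1.my-devbox.cloud exit 2> ssh-before.log
# after adding it
ssh -vvv root@gra1.my-devbox.cloud exit 2> ssh-after.log
diff ssh-before.log ssh-after.log
```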
DNS resolution works fine. Everything else also works fine until I add the host to the cluster as a node:
Hi @vojbarzz, I'm guessing this is probably something that happens specifically in OVS because I've never seen this before.
If it's possible, I'd like the following information:
1- I understand you can connect to other hosts in the same subnet because DNS is fine. Correct me if this isn't true.
2- If you can connect to other hosts in the same subnet, can you please gather the output of a traceroute to the public host? `traceroute 54.36.127.120`
(If you don't have traceroute, an equivalent like tracepath or mtr is fine.)
3- I'd like to see the output of an iptables trace. To get one, run `iptables -t raw -A PREROUTING -d 54.36.127.120 -j TRACE`, try to ping the host, then acquire the trace output with `dmesg` (just the iptables traces from a couple of packets should be enough, maybe the last 20 or 30 lines), plus the output of `iptables-save -c`.
4- The output of `ip route`. (Items 2–4 are combined into one sketch below.)
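For convenience, a minimal sketch of those diagnostics as one session (run as root on the affected worker; the IP is the gra1 host from this thread):

```sh
# 2) path to the public host
traceroute 54.36.127.120

# 3) netfilter trace for packets to that destination
iptables -t raw -A PREROUTING -d 54.36.127.120 -j TRACE
ping -c 3 54.36.127.120
dmesg | tail -n 30              # with legacy iptables the TRACE output lands here
iptables-save -c > iptables.txt

# 4) routing table
ip route
```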
1/ Unfortunately I only have two worker hosts, in different zones/datacenters. 2/
```
root@fra2 ~# traceroute gra1.my-devbox.cloud
traceroute to gra1.my-devbox.cloud (54.36.127.120), 30 hops max, 60 byte packets
 1  135.125.188.252 (135.125.188.252)  0.559 ms  0.555 ms  0.643 ms
 2  10.17.245.82 (10.17.245.82)  0.547 ms  0.521 ms  10.17.245.80 (10.17.245.80)  0.479 ms
 3  10.73.40.110 (10.73.40.110)  0.153 ms  0.209 ms  10.73.40.68 (10.73.40.68)  0.325 ms
 4  10.73.40.195 (10.73.40.195)  0.170 ms  10.73.40.29 (10.73.40.29)  0.181 ms  10.73.40.97 (10.73.40.97)  0.149 ms
 5  * * *
 6  10.73.1.192 (10.73.1.192)  13.337 ms  10.95.34.34 (10.95.34.34)  13.464 ms  10.95.34.16 (10.95.34.16)  13.486 ms
 7  10.73.1.21 (10.73.1.21)  13.579 ms  10.73.2.175 (10.73.2.175)  13.680 ms  10.73.0.41 (10.73.0.41)  11.654 ms
 8  10.17.155.49 (10.17.155.49)  14.409 ms  10.17.145.9 (10.17.145.9)  14.458 ms  10.17.155.53 (10.17.155.53)  14.231 ms
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
root@fra2 ~# dmesg | tail
[27547.598600] kube-bridge: port 8(veth9617b46b) entered disabled state
[27547.599123] veth9617b46b (unregistering): left allmulticast mode
[27547.599127] veth9617b46b (unregistering): left promiscuous mode
[27547.599130] kube-bridge: port 8(veth9617b46b) entered disabled state
[28550.244750] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.244757] kube-bridge: port 8(veth51a7fa2b) entered disabled state
[28550.244767] veth51a7fa2b: entered allmulticast mode
[28550.244818] veth51a7fa2b: entered promiscuous mode
[28550.247125] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.247130] kube-bridge: port 8(veth51a7fa2b) entered forwarding state
root@fra2 ~# ping -c 3 54.36.127.120
PING 54.36.127.120 (54.36.127.120) 56(84) bytes of data.
64 bytes from 54.36.127.120: icmp_seq=1 ttl=56 time=13.4 ms
64 bytes from 54.36.127.120: icmp_seq=2 ttl=56 time=13.4 ms
64 bytes from 54.36.127.120: icmp_seq=3 ttl=56 time=13.4 ms

--- 54.36.127.120 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 13.352/13.379/13.403/0.021 ms
root@fra2 ~# dmesg | tail
[27547.598600] kube-bridge: port 8(veth9617b46b) entered disabled state
[27547.599123] veth9617b46b (unregistering): left allmulticast mode
[27547.599127] veth9617b46b (unregistering): left promiscuous mode
[27547.599130] kube-bridge: port 8(veth9617b46b) entered disabled state
[28550.244750] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.244757] kube-bridge: port 8(veth51a7fa2b) entered disabled state
[28550.244767] veth51a7fa2b: entered allmulticast mode
[28550.244818] veth51a7fa2b: entered promiscuous mode
[28550.247125] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.247130] kube-bridge: port 8(veth51a7fa2b) entered forwarding state
root@fra2 ~# iptables -t raw -L -v -n
Chain PREROUTING (policy ACCEPT 647K packets, 190M bytes)
 pkts bytes target  prot opt in  out  source      destination
    0     0 TRACE   0    --  *   *   0.0.0.0/0   54.36.127.120

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in  out  source      destination
```
3/ Here is the output of `iptables-save -c`: iptables.txt
4/
```
default via 135.125.188.254 dev enp1s0f0 proto dhcp src 135.125.188.239 metric 100
10.244.0.0/24 dev kube-bridge proto kernel scope link src 10.244.0.1
10.244.2.0/24 dev tun-8a72a84b43c proto 17 src 135.125.188.239
135.125.188.0/24 dev enp1s0f0 proto kernel scope link src 135.125.188.239 metric 100
135.125.188.254 dev enp1s0f0 proto dhcp scope link src 135.125.188.239 metric 100
213.186.33.99 via 135.125.188.254 dev enp1s0f0 proto dhcp src 135.125.188.239 metric 100
```
Just to be sure that the issue is clear:
If I add node 54.36.127.120, I can ping it from another node via ssh or `kubectl debug node/fra2 -it --image nicolaka/netshoot -- bash`, but I'm not able to ping it directly from a regular pod (`kubectl exec -it network-debugger -- bash`). Before the node 54.36.127.120 was added, I was able to ping it from the network-debugger pod. If I remove the node with `kubectl delete node gra1`, I'm able to ping it again from the network-debugger pod.
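As a condensed sketch of that sequence (network-debugger is just a plain debug pod that was already running):

```sh
kubectl exec network-debugger -- ping -c 3 54.36.127.120    # works

# ...add gra1 (54.36.127.120) to the cluster as a worker...

kubectl exec network-debugger -- ping -c 3 54.36.127.120    # fails
kubectl debug node/fra2 -it --image nicolaka/netshoot -- ping -c 3 54.36.127.120    # works

kubectl delete node gra1
kubectl exec network-debugger -- ping -c 3 54.36.127.120    # works again
```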
Hi @vojbarzz, we discussed this in today's call and I don't think we understand exactly what the issue is.
Let's say you have two nodes in your network. Can node A reach node B? Can node A reach a pod running on node B? Can a pod running on node A reach a pod on node B? Can node A reach an external IP address like github.com? (This was confirmed in the last answer.) Can a pod running on node A reach this external IP?
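As a sketch, something like this would cover the whole matrix (the pod IP placeholder can be taken from `kubectl get pods -o wide`; `network-debugger` is the pod from your earlier comment):

```sh
# node A -> node B
ping -c 3 <node-B-ip>
# node A -> pod on node B
ping -c 3 <pod-on-node-B-ip>
# pod on node A -> pod on node B
kubectl exec network-debugger -- ping -c 3 <pod-on-node-B-ip>
# node A -> external IP (already confirmed), pod on node A -> external IP
ping -c 3 github.com
kubectl exec network-debugger -- ping -c 3 github.com
```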
The iptables trace is not producing any output in dmesg (tried before adding the host to the cluster).
This happens because you're using nf_tables. If, once we understand the problem, we determine that we need this information, it's available by running `xtables-monitor -t`.
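For reference, a trace session on an nf_tables host would look roughly like this (`xtables-monitor` ships with iptables 1.8+):

```sh
iptables -t raw -A PREROUTING -d 54.36.127.120 -j TRACE
xtables-monitor -t      # trace events are printed here instead of dmesg
# in a second shell:
ping -c 1 54.36.127.120
```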
### Platform
### Version
v1.30.1+k0s.0
### Sysinfo
`k0s sysinfo`
### What happened?
I'm not able to reach any public service (ping, ssh, ...) on a host once it has been added as a worker.
### Steps to reproduce
k0sctl.yaml
### Expected behavior
I can reach another node's public services from any pod on any node.
### Actual behavior
I'm not able to reach public services on a server once it has been added to the cluster as a worker.
### Screenshots and logs
_No response_
### Additional context
_No response_