hyson007 closed this issue 4 years ago.
Just realized this is not an issue with veth itself; if I use a nodeSelector to force both pods onto one node, the veth pair is created successfully.
The issue seems to be that kworker2 somehow thinks r1 and r2 are on the same node, but they are not (they should be using vxlan rather than veth).
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Looking up a default route to get the intf and IP for vxlan
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Default route is via 10.0.2.15@eth0
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Attempting to connect to local meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Retrieving local pod information from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Setting pod alive status on meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Starting to traverse all links
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Creating Veth struct with NetNS:/proc/24134/ns/net and intfName: eth1, IP:12.12.12.2/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Retrieving peer pod r1 information from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Is peer pod r1 alive?: true
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Peer pod r1 is alive
**Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 r2 and r1 are on the same host**
jack@ubuntu:~$ k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
r1 1/1 Running 0 3m12s 10.244.1.131 kworker1.example.com <none> <none>
r2 0/1 ContainerCreating 0 3m12s <none> kworker2.example.com <none> <none>
This looks strange. The logic that determines whether to use veth or vxlan is quite simple; the only place I can see it going wrong is the getVxlanSource() function. Looking at your outputs, it appears both kworker1 and kworker2 have the same IP assigned to eth0:
- Default route is via 10.0.2.15@eth0
Do you have any idea why this happens? What are you using to create your cluster? Can you collect the output of ip addr and ip route from each of the kworkers?
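For context, the co-location check effectively boils down to comparing this node's default-route source IP with the source IP the peer's meshnet daemon recorded. A minimal shell sketch of that comparison (the function name and IP literals are illustrative, not the plugin's actual code):

```shell
# Sketch of the veth-vs-vxlan decision: if the peer pod's recorded source IP
# matches this node's own default-route IP, the plugin assumes co-location
# and builds a veth pair; otherwise it builds a vxlan tunnel.
decide_link_type() {
  # $1 = this node's default-route source IP, $2 = peer's recorded srcIP
  if [ "$1" = "$2" ]; then
    echo veth
  else
    echo vxlan
  fi
}

# Both kworkers report the shared Vagrant NAT address, so the check
# wrongly concludes the pods share a host:
decide_link_type 10.0.2.15 10.0.2.15
# With distinct per-node addresses (e.g. the eth1 ones) it picks vxlan:
decide_link_type 172.42.42.101 172.42.42.102
```

This is why the log above reports "r2 and r1 are on the same host" even though the pods are scheduled on different nodes.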
Thanks, that makes sense.
I'm using a Vagrant provisioning file to create the cluster: https://github.com/justmeandopensource/kubernetes/tree/master/vagrant-provisioning
Somehow eth0 on all three nodes (kmaster, kworker1, kworker2) has the exact same IP; this seems to be related to the default IP that Vagrant/VirtualBox assigns to eth0.
I have manually edited the eth0 IPs (the eth0 interfaces seemingly can't ping each other even after being changed to different IPs) and added two static host routes on kworker1/kworker2 to route via eth1, which at least gets the interfaces created.
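For reference, the manual fix can be sketched roughly as follows (addresses are taken from the BEFORE/AFTER outputs; the exact commands are my assumption of how the change was applied, and they need root):

```shell
# On kworker1 (illustrative; mirror with swapped addresses on kworker2):
ip addr del 10.0.2.15/24 dev eth0     # drop the shared Vagrant NAT address
ip addr add 10.0.2.16/24 dev eth0     # give each node a unique eth0 address
# Static host route so traffic to kworker2's eth0 address leaves via eth1:
ip route add 10.0.2.17/32 via 172.42.42.102 dev eth1
```

Note these changes are not persistent across reboots unless also written into the interface configuration.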
BEFORE:
[root@kworker1 ~]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
valid_lft 86026sec preferred_lft 86026sec
inet6 fe80::5054:ff:fe8a:fee6/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:7c:d1:a5 brd ff:ff:ff:ff:ff:ff
inet 172.42.42.101/24 brd 172.42.42.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe7c:d1a5/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:2a:7f:0d:4f brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether f6:1c:72:b4:a0:4b brd ff:ff:ff:ff:ff:ff
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::f41c:72ff:feb4:a04b/64 scope link
valid_lft forever preferred_lft forever
[root@kworker2 ~]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
valid_lft 86170sec preferred_lft 86170sec
inet6 fe80::5054:ff:fe8a:fee6/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:1b:d9:22 brd ff:ff:ff:ff:ff:ff
inet 172.42.42.102/24 brd 172.42.42.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe1b:d922/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:d2:68:ae:f9 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 42:08:b1:93:05:78 brd ff:ff:ff:ff:ff:ff
inet 10.244.2.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::4008:b1ff:fe93:578/64 scope link
valid_lft forever preferred_lft forever
AFTER:
[root@kworker1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.16/24 brd 10.0.2.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe8a:fee6/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:7c:d1:a5 brd ff:ff:ff:ff:ff:ff
inet 172.42.42.101/24 brd 172.42.42.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe7c:d1a5/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:2a:7f:0d:4f brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether f6:1c:72:b4:a0:4b brd ff:ff:ff:ff:ff:ff
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::f41c:72ff:feb4:a04b/64 scope link
valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
link/ether 66:0b:43:48:b0:a0 brd ff:ff:ff:ff:ff:ff
inet 10.244.1.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::640b:43ff:fe48:b0a0/64 scope link
valid_lft forever preferred_lft forever
7: vethf11f630e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 8a:0a:4e:6e:1a:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::880a:4eff:fe6e:1ab2/64 scope link
valid_lft forever preferred_lft forever
[root@kworker2 ~]# ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.17/24 brd 10.0.2.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe8a:fee6/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:1b:d9:22 brd ff:ff:ff:ff:ff:ff
inet 172.42.42.102/24 brd 172.42.42.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe1b:d922/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:d2:68:ae:f9 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 42:08:b1:93:05:78 brd ff:ff:ff:ff:ff:ff
inet 10.244.2.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::4008:b1ff:fe93:578/64 scope link
valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
link/ether 12:df:b8:bd:d9:81 brd ff:ff:ff:ff:ff:ff
inet 10.244.2.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::10df:b8ff:febd:d981/64 scope link
valid_lft forever preferred_lft forever
475: vethf47a0bc8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 16:67:ee:e2:18:15 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::1467:eeff:fee2:1815/64 scope link
valid_lft forever preferred_lft forever
[root@kworker1 ~]# ip route
default via 10.0.2.2 dev eth0 proto static metric 100
10.0.0.0/8 via 172.42.42.102 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.16 metric 100
10.0.2.17 via 172.42.42.102 dev eth1
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
12.0.0.0/8 via 172.42.42.102 dev eth1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.42.42.0/24 dev eth1 proto kernel scope link src 172.42.42.101 metric 101
[root@kworker2 ~]# ip route
default via 10.0.2.2 dev eth0 proto static metric 100
10.0.0.0/8 via 172.42.42.101 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.17 metric 100
10.0.2.16 via 172.42.42.101 dev eth1
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1
12.0.0.0/8 via 172.42.42.101 dev eth1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.42.42.0/24 dev eth1 proto kernel scope link src 172.42.42.102 metric 101
However, I'm still unable to get ping working between r1 and r2 over eth1.
jack@ubuntu:~/meshnet-cni/tests$ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
r1 1/1 Running 0 30m 10.244.1.2 kworker1.example.com <none> <none>
r2 1/1 Running 0 30m 10.244.2.236 kworker2.example.com <none> <none>
jack@ubuntu:~/meshnet-cni/tests$ k exec -it r1 sh
/ # ifconfig
eth0 Link encap:Ethernet HWaddr 0E:5E:0B:82:CB:E0
inet addr:10.244.1.2 Bcast:0.0.0.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:16 errors:0 dropped:0 overruns:0 frame:0
TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1264 (1.2 KiB) TX bytes:42 (42.0 B)
eth1 Link encap:Ethernet HWaddr FE:06:D3:1B:E5:6B
inet addr:12.12.12.1 Bcast:12.12.12.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/ # ping 12.12.12.2
PING 12.12.12.2 (12.12.12.2): 56 data bytes
^C
--- 12.12.12.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
I did a tcpdump on kworker2 for the vxlan traffic; there still seems to be a reachability issue somewhere, even though the nodes can ping each other on their eth0 IPs.
[root@kworker1 ~]# ping 10.0.2.17 -S 10.0.2.16
PING 10.0.2.17 (10.0.2.17) 56(84) bytes of data.
64 bytes from 10.0.2.17: icmp_seq=1 ttl=64 time=2.17 ms
64 bytes from 10.0.2.17: icmp_seq=2 ttl=64 time=1.15 ms
[root@kworker2 ~]# ping 10.0.2.16 -S 10.0.2.17
PING 10.0.2.16 (10.0.2.16) 56(84) bytes of data.
64 bytes from 10.0.2.16: icmp_seq=1 ttl=64 time=2.41 ms
[root@kworker2 ~]# sudo tcpdump -nnni any icmp -v
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10:27:11.622446 IP (tos 0xc0, ttl 64, id 56868, offset 0, flags [none], proto ICMP (1), length 106)
10.0.2.17 > 10.0.2.17: ICMP host 10.0.2.16 unreachable, length 86
IP (tos 0x0, ttl 64, id 669, offset 0, flags [none], proto UDP (17), length 78)
10.0.2.17.37698 > 10.0.2.16.4789: VXLAN, flags [I] (0x08), vni 5001
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 12.12.12.1 tell 12.12.12.2, length 28
10:27:11.622450 IP (tos 0xc0, ttl 64, id 56869, offset 0, flags [none], proto ICMP (1), length 106)
10.0.2.17 > 10.0.2.17: ICMP host 10.0.2.16 unreachable, length 86
IP (tos 0x0, ttl 64, id 913, offset 0, flags [none], proto UDP (17), length 78)
10.0.2.17.37698 > 10.0.2.16.4789: VXLAN, flags [I] (0x08), vni 5001
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 12.12.12.1 tell 12.12.12.2, length 28
10:27:11.622453 IP (tos 0xc0, ttl 64, id 56870, offset 0, flags [none], proto ICMP (1), length 106)
10.0.2.17 > 10.0.2.17: ICMP host 10.0.2.16 unreachable, length 86
IP (tos 0x0, ttl 64, id 1113, offset 0, flags [none], proto UDP (17), length 78)
10.0.2.17.37698 > 10.0.2.16.4789: VXLAN, flags [I] (0x08), vni 5001
Btw, I'm quite new to this and just wondering: do you suggest testing meshnet-cni and k8s-topo on kind? (I followed your post, which mentioned dind, but that seems to be EOL; I couldn't get it working, hence switched to this Vagrant setup.)
It looks like Vagrant is using QEMU user networking (slirp), which doesn't have proper support for ICMP. My suggestion would be to try it with kind, which now seems to be the default option for all k8s testing and local development.
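For anyone taking the kind route, a minimal multi-node cluster config along these lines should work as a starting point (this is a sketch following the kind v1alpha4 config schema, not a config shipped with this repo; the pod subnet assumes flannel's default):

```yaml
# kind-cluster.yaml -- one control plane, two workers, default CNI disabled
# so flannel (plus meshnet) can be installed on top.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"   # flannel's default pod CIDR
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Created with `kind create cluster --config kind-cluster.yaml`; because the nodes are containers on one host, the slirp/ICMP problem above does not apply.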
thanks much
Hi,
Thanks for creating this. I'm new to this and trying to follow your post to lab cEOS.
The issue I'm facing now is that only one side of a veth link comes up. (I have one master and two worker nodes, all VMs; the cluster is using flannel as the existing CNI.)
Describe output of the non-working pod:
Logs from kworker1:
Logs from kworker2 (non-working side):
Besides, I noticed a few things in the README that seem inconsistent. It says to use "kubectl apply -f manifests/meshnet.yml", but that path seems incorrect; I loaded "kubectl apply -f manifests/base/meshnet.yml" instead.
Second, tests/2node.yml seems to contain more than two nodes; I removed the extra r3. (I tried other topologies as well and got the same error, though.)