networkop / meshnet-cni

a (K8s) CNI plugin to create arbitrary virtual network topologies

Pods connected over VXLAN cannot ping each other #42

Open kkgty opened 2 years ago

kkgty commented 2 years ago

I have an OSPF topology with 10 nodes, all running the same FRRouting image (913a580f3a8bd6aac3e389076ee68a4). I want to test this topology with meshnet-cni.

My k8s cluster has a total of 4 nodes, connected through Calico in BGP mode:

NAME     STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
master   Ready    control-plane   7h49m   v1.24.0   192.168.22.1   <none>        Ubuntu 20.04.4 LTS   5.4.0-113-generic   containerd://1.6.4
node-d   Ready    <none>          7h47m   v1.24.0   192.168.22.3   <none>        Ubuntu 20.04.4 LTS   5.4.0-113-generic   containerd://1.6.4
node-i   Ready    <none>          7h45m   v1.24.0   192.168.22.6   <none>        Ubuntu 20.04.4 LTS   5.4.0-113-generic   containerd://1.6.4
node-k   Ready    <none>          7h46m   v1.24.0   192.168.22.7   <none>        Ubuntu 20.04.4 LTS   5.4.0-110-generic   containerd://1.6.4

I created my topology and distributed the pods across two of the nodes:

NAME        READY   STATUS    RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
router-1    1/1     Running   0          11m   10.224.71.138   node-d   <none>           <none>
router-10   1/1     Running   0          11m   10.224.100.91   node-k   <none>           <none>
router-2    1/1     Running   0          11m   10.224.100.86   node-k   <none>           <none>
router-3    1/1     Running   0          11m   10.224.100.87   node-k   <none>           <none>
router-4    1/1     Running   0          11m   10.224.71.137   node-d   <none>           <none>
router-5    1/1     Running   0          11m   10.224.100.88   node-k   <none>           <none>
router-6    1/1     Running   0          11m   10.224.100.89   node-k   <none>           <none>
router-7    1/1     Running   0          11m   10.224.100.90   node-k   <none>           <none>
router-8    1/1     Running   0          11m   10.224.71.140   node-d   <none>           <none>
router-9    1/1     Running   0          11m   10.224.71.139   node-d   <none>           <none>

For example, router-4 should establish neighbor relationships with router-3, router-5 and router-7, but the adjacencies never get established and the routers cannot ping each other:

router-4# show ip ospf neighbor

Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
10.224.100.87     1 Init/DROther    22m21s            38.205s 10.0.4.1        eth1:10.0.4.2                        0     0     0
10.224.100.88     1 Init/DROther    22m21s            38.093s 10.0.7.2        eth2:10.0.7.1                        0     0     0

router-4# ping 10.0.4.1
PING 10.0.4.1 (10.0.4.1): 56 data bytes
^C
--- 10.0.4.1 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

However, if all the pods run on the same node and are connected via veth pairs, everything comes up successfully:

Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
10.224.15.148     1 Full/DR         3h40m52s          38.032s 10.0.4.1        eth1:10.0.4.2                        0     0     0
10.224.15.149     1 Full/DR         3h40m46s          33.848s 10.0.8.2        eth3:10.0.8.1                        0     0     0
10.224.219.86     1 Full/Backup     3h34m08s          31.814s 10.0.7.2        eth2:10.0.7.1                        0     0     0

Can someone help me?

networkop commented 2 years ago

Can you elaborate on how you create your topology? Do you have a list of instructions to reproduce this?

kkgty commented 2 years ago

Thanks for your reply. Here is my topology; all of the configurations are in https://github.com/kkgty/topo-ospf:

apiVersion: v1
kind: List
items:
  ################### pod #######################
  ###### router-1 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-1
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-1
  ###### router-2 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-2
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-2
  ###### router-3 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-3
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-3
  ###### router-4 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-4
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-4
  ###### router-5 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-5
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-5
  ###### router-6 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-6
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-6
  ###### router-7 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-7
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-7
  ###### router-8 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-8
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-8
  ###### router-9 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-9
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-9
  ###### router-10 ######
  - apiVersion: v1
    kind: Pod
    metadata:
      name: router-10
      labels:
        name: tunnel
    spec:
      containers:
        - name: tunnel
          image: frrouting/frr:v8.2.2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/frr/daemons
              subPath: daemons
            - name: config
              mountPath: /etc/frr/frr.conf
              subPath: frr.conf
      volumes:
        - name: config
          configMap:
            name: router-10

  ################### topo #######################
  ###### router-1 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-1
    spec:
      links:
        - uid: 2218012
          peer_pod: router-2
          local_intf: eth1
          peer_intf: eth1
          local_ip: 10.0.1.1/24
          peer_ip: 10.0.1.2/24
        - uid: 2218013
          peer_pod: router-3
          local_intf: eth2
          peer_intf: eth1
          local_ip: 10.0.2.1/24
          peer_ip: 10.0.2.2/24
  ###### router-2 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-2
    spec:
      links:
        - uid: 2218012
          peer_pod: router-1
          local_intf: eth1
          peer_intf: eth1
          local_ip: 10.0.1.2/24
          peer_ip: 10.0.1.1/24
        - uid: 2218023
          peer_pod: router-3
          local_intf: eth2
          peer_intf: eth2
          local_ip: 10.0.3.1/24
          peer_ip: 10.0.3.2/24
  ###### router-3 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-3
    spec:
      links:
        - uid: 2218013
          peer_pod: router-1
          local_intf: eth1
          peer_intf: eth2
          local_ip: 10.0.2.2/24
          peer_ip: 10.0.2.1/24
        - uid: 2218023
          peer_pod: router-2
          local_intf: eth2
          peer_intf: eth2
          local_ip: 10.0.3.2/24
          peer_ip: 10.0.3.1/24
        - uid: 2218034
          peer_pod: router-4
          local_intf: eth3
          peer_intf: eth1
          local_ip: 10.0.4.1/24
          peer_ip: 10.0.4.2/24
        - uid: 2218035
          peer_pod: router-5
          local_intf: eth4
          peer_intf: eth1
          local_ip: 10.0.5.1/24
          peer_ip: 10.0.5.2/24
        - uid: 2218036
          peer_pod: router-6
          local_intf: eth5
          peer_intf: eth1
          local_ip: 10.0.6.1/24
          peer_ip: 10.0.6.2/24
  ###### router-4 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-4
    spec:
      links:
        - uid: 2218034
          peer_pod: router-3
          local_intf: eth1
          peer_intf: eth3
          local_ip: 10.0.4.2/24
          peer_ip: 10.0.4.1/24
        - uid: 2218045
          peer_pod: router-5
          local_intf: eth2
          peer_intf: eth2
          local_ip: 10.0.7.1/24
          peer_ip: 10.0.7.2/24
        - uid: 2218047
          peer_pod: router-7
          local_intf: eth3
          peer_intf: eth1
          local_ip: 10.0.8.1/24
          peer_ip: 10.0.8.2/24
  ###### router-5 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-5
    spec:
      links:
        - uid: 2218035
          peer_pod: router-3
          local_intf: eth1
          peer_intf: eth4
          local_ip: 10.0.5.2/24
          peer_ip: 10.0.5.1/24
        - uid: 2218045
          peer_pod: router-4
          local_intf: eth2
          peer_intf: eth2
          local_ip: 10.0.7.2/24
          peer_ip: 10.0.7.1/24
        - uid: 2218058
          peer_pod: router-8
          local_intf: eth3
          peer_intf: eth2
          local_ip: 10.0.10.1/24
          peer_ip: 10.0.10.2/24
  ###### router-6 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-6
    spec:
      links:
        - uid: 2218036
          peer_pod: router-3
          local_intf: eth1
          peer_intf: eth5
          local_ip: 10.0.6.2/24
          peer_ip: 10.0.6.1/24
        - uid: 2218068
          peer_pod: router-8
          local_intf: eth2
          peer_intf: eth1
          local_ip: 10.0.9.1/24
          peer_ip: 10.0.9.2/24
  ###### router-7 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-7
    spec:
      links:
        - uid: 2218047
          peer_pod: router-4
          local_intf: eth1
          peer_intf: eth3
          local_ip: 10.0.8.2/24
          peer_ip: 10.0.8.1/24
        - uid: 2218078
          peer_pod: router-8
          local_intf: eth2
          peer_intf: eth3
          local_ip: 10.0.11.1/24
          peer_ip: 10.0.11.2/24
  ###### router-8 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-8
    spec:
      links:
        - uid: 2218058
          peer_pod: router-5
          local_intf: eth2
          peer_intf: eth3
          local_ip: 10.0.10.2/24
          peer_ip: 10.0.10.1/24
        - uid: 2218068
          peer_pod: router-6
          local_intf: eth1
          peer_intf: eth2
          local_ip: 10.0.9.2/24
          peer_ip: 10.0.9.1/24
        - uid: 2218078
          peer_pod: router-7
          local_intf: eth3
          peer_intf: eth2
          local_ip: 10.0.11.2/24
          peer_ip: 10.0.11.1/24
        - uid: 2218089
          peer_pod: router-9
          local_intf: eth5
          peer_intf: eth1
          local_ip: 10.0.13.1/24
          peer_ip: 10.0.13.2/24
        - uid: 2218080
          peer_pod: router-10
          local_intf: eth4
          peer_intf: eth1
          local_ip: 10.0.12.1/24
          peer_ip: 10.0.12.2/24
  ###### router-9 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-9
    spec:
      links:
        - uid: 2218089
          peer_pod: router-8
          local_intf: eth1
          peer_intf: eth5
          local_ip: 10.0.13.2/24
          peer_ip: 10.0.13.1/24
        - uid: 2218090
          peer_pod: router-10
          local_intf: eth2
          peer_intf: eth2
          local_ip: 10.0.14.1/24
          peer_ip: 10.0.14.2/24
  ###### router-10 ######
  - apiVersion: networkop.co.uk/v1beta1
    kind: Topology
    metadata:
      name: router-10
    spec:
      links:
        - uid: 2218080
          peer_pod: router-8
          local_intf: eth1
          peer_intf: eth4
          local_ip: 10.0.12.2/24
          peer_ip: 10.0.12.1/24
        - uid: 2218090
          peer_pod: router-9
          local_intf: eth2
          peer_intf: eth2
          local_ip: 10.0.14.2/24
          peer_ip: 10.0.14.1/24
networkop commented 2 years ago

Can you also provide the output of ip -d link show from inside one of the routers whose neighbors are stuck in Init/DROther? e.g. kubectl exec -it router-4 -- ip -d link show

kkgty commented 2 years ago

Sure, here's the output from router-4:

node-d ➜  ~ kubectl exec -n tunnel router-4 -- ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
3: eth0@if178: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether e6:a1:b0:66:3f:99 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
188: eth3@if188: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 66:d8:31:8b:88:15 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    vxlan id 2223047 remote 192.168.22.7 dev if2 srcport 0 0 dstport 4789 l2miss l3miss ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
189: eth1@if189: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 7e:ca:8c:a9:70:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    vxlan id 2223034 remote 192.168.22.7 dev if2 srcport 0 0 dstport 4789 l2miss l3miss ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
191: eth2@if191: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 86:1d:2c:1d:1d:dc brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    vxlan id 2223045 remote 192.168.22.7 dev if2 srcport 0 0 dstport 4789 l2miss l3miss ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

And here is router-5:

node-d ➜  ~ kubectl exec -n tunnel router-5 -- ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
3: eth0@if194: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether e2:4b:59:44:b2:ac brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
205: eth1@if204: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether b2:be:cd:69:92:cd brd ff:ff:ff:ff:ff:ff link-netnsid 1 promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
211: eth2@if211: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 56:89:62:6e:79:ab brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    vxlan id 2223045 remote 192.168.22.3 dev eth0 srcport 0 0 dstport 4789 l2miss l3miss ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
215: eth3@if215: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 86:fc:ef:8b:1f:3e brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    vxlan id 2223058 remote 192.168.22.3 dev eth0 srcport 0 0 dstport 4789 l2miss l3miss ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

node-d ➜  ~ kubectl exec -n tunnel router-5 -- vtysh -c "show ip ospf neighbor"
Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
10.224.100.87     1 Full/Backup     2h18m15s          39.314s 10.0.5.1        eth1:10.0.5.2                        0     0     0
10.224.71.140     1 Full/Backup     2h18m20s          37.147s 10.0.10.2       eth3:10.0.10.1                       0     0     0
networkop commented 2 years ago

Everything seems fine. I even deployed your topology locally and was able to see all adjacencies established. The only issue you've got is that router-4 is missing an ip ospf area statement under its eth3 interface. But otherwise, everything looks correct.
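For reference, the fix would be something along these lines in router-4's frr.conf (the file mounted from the router-4 ConfigMap); the area number here is an assumption, so use whatever area the other interfaces belong to:

! addition to router-4's frr.conf (sketch; area 0 assumed)
interface eth3
 ip ospf area 0
!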

Init/DROther is a really weird state: it's as if the packets are being dropped in one direction. I'd suggest checking whether you have any ACLs on the hosts (or somewhere in the network) that might drop VXLAN (UDP/4789) packets between them, and checking the host logs for any errors during pod deployment. As an experiment, you can try creating a pair of VXLAN interfaces manually, giving them IPs and pinging between them; see the sketch below. Other than that, I don't have any other ideas.
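A rough sketch of those checks, using the node IPs from your cluster (node-d = 192.168.22.3, node-k = 192.168.22.7); the interface name vxlan-test, the VNI 5000 and the 192.0.2.0/24 test addresses are made up for the experiment:

# on both hosts: look for firewall rules that might drop VXLAN (UDP/4789) traffic,
# and check the kubelet logs for errors around the time the pods were deployed
sudo iptables-save | grep -iE '4789|drop'
sudo journalctl -u kubelet --since "1 hour ago" | grep -iE 'error|vxlan'

# on node-d (192.168.22.3): create a test VXLAN interface pointing at node-k
sudo ip link add vxlan-test type vxlan id 5000 local 192.168.22.3 remote 192.168.22.7 dstport 4789
sudo ip addr add 192.0.2.1/24 dev vxlan-test
sudo ip link set vxlan-test up

# on node-k (192.168.22.7): the mirror-image interface pointing back at node-d
sudo ip link add vxlan-test type vxlan id 5000 local 192.168.22.7 remote 192.168.22.3 dstport 4789
sudo ip addr add 192.0.2.2/24 dev vxlan-test
sudo ip link set vxlan-test up

# from node-d: if this ping fails, the problem is between the hosts rather than in meshnet-cni
ping -c 3 192.0.2.2

# optionally, watch the encapsulated traffic on either host while pinging
sudo tcpdump -ni any udp port 4789

# clean up on both hosts when done
sudo ip link del vxlan-test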

kkgty commented 2 years ago

Thank you very much for your suggestions. I'll try to run some other tests.