networkop / meshnet-cni

a (K8s) CNI plugin to create arbitrary virtual network topologies
BSD 3-Clause "New" or "Revised" License

Only one side of the veth pair can come up #19

Closed hyson007 closed 4 years ago

hyson007 commented 4 years ago

hi,

Thanks for creating this. I'm new to this and trying to follow your post to lab cEOS.

The issue I'm facing now is that only one side of a veth pair comes up. (I have a master and two worker nodes, all VMs; the cluster uses flannel as its existing CNI.)

jack@ubuntu:~/meshnet-cni/tests$ k version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:03:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

jack@ubuntu:~/meshnet-cni/tests$ k get nodes
NAME                   STATUS   ROLES    AGE   VERSION
kmaster.example.com    Ready    master   62m   v1.17.4
kworker1.example.com   Ready    <none>   56m   v1.17.4
kworker2.example.com   Ready    <none>   50m   v1.17.4

jack@ubuntu:~$ k get pod -n kube-system
NAME                                          READY   STATUS    RESTARTS   AGE
coredns-6955765f44-jcmzk                      1/1     Running   0          66m
coredns-6955765f44-sgdcj                      1/1     Running   0          66m
etcd-kmaster.example.com                      1/1     Running   0          66m
kube-apiserver-kmaster.example.com            1/1     Running   0          66m
kube-controller-manager-kmaster.example.com   1/1     Running   0          66m
kube-flannel-ds-amd64-4nxbj                   1/1     Running   0          66m
kube-flannel-ds-amd64-h4688                   1/1     Running   1          61m
kube-flannel-ds-amd64-z4jbg                   1/1     Running   1          55m
kube-proxy-gh6kv                              1/1     Running   0          55m
kube-proxy-pxctp                              1/1     Running   0          66m
kube-proxy-xf2pz                              1/1     Running   0          61m
kube-scheduler-kmaster.example.com            1/1     Running   0          66m

jack@ubuntu:~$ k get pod -n meshnet
NAME            READY   STATUS    RESTARTS   AGE
meshnet-9zhpz   1/1     Running   0          44m
meshnet-b7l9r   1/1     Running   0          44m
meshnet-xlng2   1/1     Running   0          44m

jack@ubuntu:~$ k get pod -o wide
NAME   READY   STATUS              RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES
r1     1/1     Running             0          14m   10.244.1.128   kworker1.example.com   <none>           <none>
r2     0/1     ContainerCreating   0          14m   <none>         kworker2.example.com   <none>           <none>
[root@kworker1 ~]# cat /etc/cni/net.d/*
{
  "cniVersion": "0.2.0",
  "name": "meshnet_network",
  "type": "meshnet",
  "delegate": {
    "type": "flannel",
    "delegate": {
      "hairpinMode": true,
      "isDefaultGateway": true
    }
  }
}
{
  "cniVersion": "0.2.0",
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
{
  "cniVersion": "0.2.0",
  "name": "meshnet_network",
  "type": "meshnet",
  "delegate": {
    "name": "dind0",
    "bridge": "dind0",
    "type": "bridge",
    "isDefaultGateway": true,
    "ipMasq": true,
    "ipam": {
      "type": "host-local",
      "subnet": "10.244.1.0/24",
      "gateway": "10.244.1.1"
    }
  }
}

kubectl describe of the non-working pod:

Events:
  Type     Reason                  Age                     From                           Message
  ----     ------                  ----                    ----                           -------
  Normal   Scheduled               5m56s                   default-scheduler              Successfully assigned default/r2 to kworker2.example.com
  Warning  FailedCreatePodSandBox  5m54s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a4bf9de6d9998a7e03e0bab09fc3c31f2e822a76cc836e5ec23356d2a72c2812" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m50s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "896b47ff23220843a2add57c5b67cf8a76b2f63a403ef38b8b8329a4b0ba445d" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m47s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f836de51e920629873ceaf8b07cfcffc6dd96872020ab166233d33f86a510d68" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m44s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c7f84caaa299043f9b5c8f776614f0e1b8d027772b34ab79130bf759310e447d" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m40s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "973a1923112987391137d79583400963cf7159b32e0bb52a9e005ff0dfada274" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m37s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9ff656f600bd75ad0b53e168c84bd3166beff9063201ff0c0f4cc6f08f363afe" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m33s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "0642829533fbd8811538ba051ad5d59db7ad8d9481ec6d8c9322b0a09da4a18c" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m30s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "36c95135ebd358611088027ce21237bdd3519efbbd233873adba21084b8d8a7a" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Warning  FailedCreatePodSandBox  5m27s                   kubelet, kworker2.example.com  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "91bb09251676b381e2814c21fb6eb64c88316be948dd4cab7cbbb879b366543c" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
  Normal   SandboxChanged          5m16s (x12 over 5m53s)  kubelet, kworker2.example.com  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  51s (x81 over 5m23s)    kubelet, kworker2.example.com  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b663f582c25b2b7fd30809243599d286f0a1ef6e14ac332cae7ec780e9b4d4bb" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
jack@ubuntu:~/meshnet-cni/tests$ k get pod -n kube-system
NAME                                          READY   STATUS    RESTARTS   AGE
coredns-6955765f44-jcmzk                      1/1     Running   0          53m
coredns-6955765f44-sgdcj                      1/1     Running   0          53m
etcd-kmaster.example.com                      1/1     Running   0          53m
kube-apiserver-kmaster.example.com            1/1     Running   0          53m
kube-controller-manager-kmaster.example.com   1/1     Running   0          53m
kube-flannel-ds-amd64-4nxbj                   1/1     Running   0          53m
kube-flannel-ds-amd64-h4688                   1/1     Running   1          48m
kube-flannel-ds-amd64-z4jbg                   1/1     Running   1          41m
kube-proxy-gh6kv                              1/1     Running   0          41m
kube-proxy-pxctp                              1/1     Running   0          53m
kube-proxy-xf2pz                              1/1     Running   0          48m
kube-scheduler-kmaster.example.com            1/1     Running   0          53m
jack@ubuntu:~/meshnet-cni/tests$ k get pod -n meshnet
NAME            READY   STATUS    RESTARTS   AGE
meshnet-9zhpz   1/1     Running   0          31m
meshnet-b7l9r   1/1     Running   0          31m
meshnet-xlng2   1/1     Running   0          31m

Logs from kworker1:

Mar 20 04:42:50 kworker1.example.com kubelet[6089]: 2020/03/20 04:42:50 Parsing cni .conf file
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: 2020/03/20 04:42:50 Parsing CNI_ARGS environment variable
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Processing ADD POD in namespace default
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Calling delegateAdd for flannel
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 About to delegate Add to flannel
Mar 20 04:42:50 kworker1.example.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 20 04:42:50 kworker1.example.com kernel: cni0: port 1(veth5a9d0475) entered blocking state
Mar 20 04:42:50 kworker1.example.com kernel: cni0: port 1(veth5a9d0475) entered disabled state
Mar 20 04:42:50 kworker1.example.com kernel: device veth5a9d0475 entered promiscuous mode
Mar 20 04:42:50 kworker1.example.com kernel: cni0: port 1(veth5a9d0475) entered blocking state
Mar 20 04:42:50 kworker1.example.com kernel: cni0: port 1(veth5a9d0475) entered forwarding state
Mar 20 04:42:50 kworker1.example.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 20 04:42:50 kworker1.example.com NetworkManager[4533]: <info>  [1584679370.9024] device (veth5a9d0475): carrier: link connected
Mar 20 04:42:50 kworker1.example.com NetworkManager[4533]: <info>  [1584679370.9026] manager: (veth5a9d0475): new Veth device (/org/freedesktop/NetworkManager/Devices/257)
Mar 20 04:42:50 kworker1.example.com NetworkManager[4533]: <info>  [1584679370.9032] device (cni0): carrier: link connected
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Master plugin has finished
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Master plugin result is IP4:{IP:{IP:10.244.1.128 Mask:ffffff00} Gateway:10.244.1.1 Routes:[{Dst:{IP:10.244.0.0 Mask:ffff0000} GW:10.244.1.1} {Dst:{IP:0.0.0.0 Mask:00000000} GW:10.244.1.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Looking up a default route to get the intf and IP for vxlan
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Default route is via 10.0.2.15@eth0
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Attempting to connect to local meshnet daemon
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Retrieving local pod information from meshnet daemon
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Setting pod alive status on meshnet daemon
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Starting to traverse all links
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Creating Veth struct with NetNS:/proc/6702/ns/net and intfName: eth1, IP:12.12.12.1/24
Mar 20 04:42:50 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:50 Retrieving peer pod r2 information from meshnet daemon
Mar 20 04:42:51 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:51 Is peer pod r2 alive?: false
Mar 20 04:42:51 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:51 Peer pod r2 isn't alive yet, continuing
Mar 20 04:42:51 kworker1.example.com kubelet[6089]: ADD| r1 |==> 2020/03/20 04:42:51 Connected all links, exiting with result IP4:{IP:{IP:10.244.1.128 Mask:ffffff00} Gateway:10.244.1.1 Routes:[{Dst:{IP:10.244.0.0 Mask:ffff0000} GW:10.244.1.1} {Dst:{IP:0.0.0.0 Mask:00000000} GW:10.244.1.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}
Mar 20 04:42:55 kworker1.example.com containerd[5573]: time="2020-03-20T04:42:55.382733489Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/56686fcac4ee4f9428e19071fd7ebc90fadcf57f5573d31f7d0e7c69c8fa02c4/shim.sock" debug=false pid=6868

Logs from kworker2 (non-working side):

Mar 20 04:43:23 kworker2.example.com kubelet[6084]: 2020/03/20 04:43:23 Parsing cni .conf file
Mar 20 04:43:23 kworker2.example.com kubelet[6084]: 2020/03/20 04:43:23 Parsing CNI_ARGS environment variable
Mar 20 04:43:23 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:23 Processing ADD POD in namespace default
Mar 20 04:43:23 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:23 Calling delegateAdd for flannel
Mar 20 04:43:23 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:23 About to delegate Add to flannel
Mar 20 04:43:23 kworker2.example.com kubelet[6084]: W0320 04:43:23.994553    6084 pod_container_deletor.go:75] Container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1" not found in pod's containers
Mar 20 04:43:24 kworker2.example.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 20 04:43:24 kworker2.example.com kernel: cni0: port 1(vethefc8bc7a) entered blocking state
Mar 20 04:43:24 kworker2.example.com kernel: cni0: port 1(vethefc8bc7a) entered disabled state
Mar 20 04:43:24 kworker2.example.com kernel: device vethefc8bc7a entered promiscuous mode
Mar 20 04:43:24 kworker2.example.com kernel: cni0: port 1(vethefc8bc7a) entered blocking state
Mar 20 04:43:24 kworker2.example.com kernel: cni0: port 1(vethefc8bc7a) entered forwarding state
Mar 20 04:43:24 kworker2.example.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 20 04:43:24 kworker2.example.com NetworkManager[4518]: <info>  [1584679404.2519] device (vethefc8bc7a): carrier: link connected
Mar 20 04:43:24 kworker2.example.com NetworkManager[4518]: <info>  [1584679404.2521] manager: (vethefc8bc7a): new Veth device (/org/freedesktop/NetworkManager/Devices/145)
Mar 20 04:43:24 kworker2.example.com NetworkManager[4518]: <info>  [1584679404.2528] device (cni0): carrier: link connected
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Master plugin has finished
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Master plugin result is IP4:{IP:{IP:10.244.2.73 Mask:ffffff00} Gateway:10.244.2.1 Routes:[{Dst:{IP:10.244.0.0 Mask:ffff0000} GW:10.244.2.1} {Dst:{IP:0.0.0.0 Mask:00000000} GW:10.244.2.1}]}, DNS:{Nameservers:[] Domain: Search:[] Options:[]}
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Looking up a default route to get the intf and IP for vxlan
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Default route is via 10.0.2.15@eth0
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Attempting to connect to local meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Retrieving local pod information from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Setting pod alive status on meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Starting to traverse all links
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Creating Veth struct with NetNS:/proc/24134/ns/net and intfName: eth1, IP:12.12.12.2/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Retrieving peer pod r1 information from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Is peer pod r1 alive?: true
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Peer pod r1 is alive
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 r2 and r1 are on the same host
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Creating Veth struct with NetNS:/proc/6702/ns/net and intfName: eth1, IP:12.12.12.1/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Does the link already exist? Local:false, Peer:false
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Neither link exists. Checking if we've been skipped
Mar 20 04:43:24 kworker2.example.com kernel: IPv6: ADDRCONF(NETDEV_UP): koko4004429188: link is not ready
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Have we been skipped by our peer r1? &{true {} [] %!t(int32=0)}
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 DO we have a higher priority? true
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Peer POD has skipped us or we have a higher priority
Mar 20 04:43:24 kworker2.example.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 20 04:43:24 kworker2.example.com NetworkManager[4518]: <info>  [1584679404.4756] manager: (koko2028365937): new Veth device (/org/freedesktop/NetworkManager/Devices/146)
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Error when creating a new VEth pair with koko: failed to Statfs "/proc/6702/ns/net": no such file or directory
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 MY VETH STRUCT: (*api.VEth)(0xc00033c4e0)({
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: NsName: (string) (len=18) "/proc/24134/ns/net",
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: LinkName: (string) (len=4) "eth1",
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: IPAddr: ([]net.IPNet) (len=1 cap=1) {
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: (net.IPNet) 12.12.12.2/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: },
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: MirrorEgress: (string) "",
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: MirrorIngress: (string) ""
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: })
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 PEER STRUCT: (*api.VEth)(0xc00033c600)({
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: NsName: (string) (len=17) "/proc/6702/ns/net",
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: LinkName: (string) (len=4) "eth1",
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: IPAddr: ([]net.IPNet) (len=1 cap=1) {
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: (net.IPNet) 12.12.12.1/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: },
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: MirrorEgress: (string) "",
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: MirrorIngress: (string) ""
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: })
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: E0320 04:43:24.511239    6084 cni.go:364] Error adding default_r2/6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1 to network meshnet/meshnet_network: failed to Statfs "/proc/6702/ns/net": no such file or directory
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Processing DEL request
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Retrieving pod's metadata from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Topology data still exists in CRs, cleaning up it's status
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Iterating over each link for clean-up
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Creating Veth struct with NetNS:/proc/24134/ns/net and intfName: eth1, IP:12.12.12.2/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Removing link eth1
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: time="2020-03-20T04:43:24Z" level=info msg="koko: remove veth link eth1"
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Error removing Veth link: failed to lookup "eth1" in "/proc/24134/ns/net": Link not found
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:24 Setting skip-reverse flag on peer r1
Mar 20 04:43:25 kworker2.example.com kubelet[6084]: DEL | r2 |==> 2020/03/20 04:43:25 Calling delegateDel for flannel
Mar 20 04:43:25 kworker2.example.com kernel: cni0: port 1(vethefc8bc7a) entered disabled state
Mar 20 04:43:25 kworker2.example.com kernel: device vethefc8bc7a left promiscuous mode
Mar 20 04:43:25 kworker2.example.com kernel: cni0: port 1(vethefc8bc7a) entered disabled state
Mar 20 04:43:25 kworker2.example.com NetworkManager[4518]: <info>  [1584679405.6315] device (vethefc8bc7a): released from master device cni0
Mar 20 04:43:25 kworker2.example.com containerd[5547]: time="2020-03-20T04:43:25.779911710Z" level=info msg="shim reaped" id=6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1
Mar 20 04:43:25 kworker2.example.com dockerd[5548]: time="2020-03-20T04:43:25.790773644Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 20 04:43:25 kworker2.example.com kubelet[6084]: E0320 04:43:25.878963    6084 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
Mar 20 04:43:25 kworker2.example.com kubelet[6084]: E0320 04:43:25.879003    6084 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "r2_default(b422aa7a-3959-40f2-8a61-62c2108f1811)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
Mar 20 04:43:25 kworker2.example.com kubelet[6084]: E0320 04:43:25.879015    6084 kuberuntime_manager.go:729] createPodSandbox for pod "r2_default(b422aa7a-3959-40f2-8a61-62c2108f1811)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1" network for pod "r2": networkPlugin cni failed to set up pod "r2_default" network: failed to Statfs "/proc/6702/ns/net": no such file or directory
Mar 20 04:43:25 kworker2.example.com kubelet[6084]: E0320 04:43:25.879049    6084 pod_workers.go:191] Error syncing pod b422aa7a-3959-40f2-8a61-62c2108f1811 ("r2_default(b422aa7a-3959-40f2-8a61-62c2108f1811)"), skipping: failed to "CreatePodSandbox" for "r2_default(b422aa7a-3959-40f2-8a61-62c2108f1811)" with CreatePodSandboxError: "CreatePodSandbox for pod \"r2_default(b422aa7a-3959-40f2-8a61-62c2108f1811)\" failed: rpc error: code = Unknown desc = failed to set up sandbox container \"6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1\" network for pod \"r2\": networkPlugin cni failed to set up pod \"r2_default\" network: failed to Statfs \"/proc/6702/ns/net\": no such file or directory"
Mar 20 04:43:26 kworker2.example.com kubelet[6084]: W0320 04:43:26.031197    6084 docker_sandbox.go:394] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "r2_default": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1"
Mar 20 04:43:26 kworker2.example.com kubelet[6084]: W0320 04:43:26.036593    6084 pod_container_deletor.go:75] Container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1" not found in pod's containers
Mar 20 04:43:26 kworker2.example.com kubelet[6084]: W0320 04:43:26.044131    6084 cni.go:331] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6df5a24cda367c04f46af4e9f2b07b7f3759b7aa44e35bc74ad6b71903626dd1"

Besides, I noticed a few things that seem inconsistent in the README documentation. It mentions using "kubectl apply -f manifests/meshnet.yml", but that path seems incorrect; I loaded this one instead: "kubectl apply -f manifests/base/meshnet.yml".

Second, tests/2node.yml seems to contain more than 2 nodes, so I removed the extra r3. (I tried other topologies as well and got the same error, though.)

jack@ubuntu:~/meshnet-cni/tests$ cat 2node.yml
---
apiVersion: v1
kind: List
items:
- apiVersion: networkop.co.uk/v1beta1
  kind: Topology
  metadata:
    name: r1
  spec:
    links:
    - uid: 1
      peer_pod: r2
      local_intf: eth1
      local_ip: 12.12.12.1/24
      peer_intf: eth1
      peer_ip: 12.12.12.2/24
- apiVersion: networkop.co.uk/v1beta1
  kind: Topology
  metadata:
    name: r2
  spec:
    links:
    - uid: 1
      peer_pod: r1
      local_intf: eth1
      local_ip: 12.12.12.2/24
      peer_intf: eth1
      peer_ip: 12.12.12.1/24
- apiVersion: v1
  kind: Pod
  metadata:
    name: r1
  spec:
    containers:
    - image: alpine
      name: r1
      command:  ["/bin/sh", "-c", "sleep 2000000000000"]
- apiVersion: v1
  kind: Pod
  metadata:
    name: r2
  spec:
    containers:
    - image: alpine
      name: r2
      command:  ["/bin/sh", "-c", "sleep 2000000000000"]
hyson007 commented 4 years ago

Just realized this is not an issue with the veth itself: if I use a nodeSelector to force both pods onto one node, then the veth pair can be created successfully.

The issue seems to be more that kworker2 somehow thinks r1 and r2 are on the same node when they are not (they should be using VXLAN rather than veth).

Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Looking up a default route to get the intf and IP for vxlan
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Default route is via 10.0.2.15@eth0
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Attempting to connect to local meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Retrieving local pod information from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Setting pod alive status on meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Starting to traverse all links
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Creating Veth struct with NetNS:/proc/24134/ns/net and intfName: eth1, IP:12.12.12.2/24
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Retrieving peer pod r1 information from meshnet daemon
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Is peer pod r1 alive?: true
Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 Peer pod r1 is alive
**Mar 20 04:43:24 kworker2.example.com kubelet[6084]: ADD| r2 |==> 2020/03/20 04:43:24 r2 and r1 are on the same host**

jack@ubuntu:~$ k get pod -o wide
NAME   READY   STATUS              RESTARTS   AGE     IP             NODE                   NOMINATED NODE   READINESS GATES
r1     1/1     Running             0          3m12s   10.244.1.131   kworker1.example.com   <none>           <none>
r2     0/1     ContainerCreating   0          3m12s   <none>         kworker2.example.com   <none>           <none>
networkop commented 4 years ago

This looks strange. The logic to determine whether to use veth or vxlan is quite simple:

  1. Each pod tries to find the local interface holding the default gateway with the getVxlanSource() function
  2. Each pod then compares its own srcIP with the peer pod's srcIP: if they are equal, it uses veth; if they are different, it uses vxlan

The only place where I can see this going wrong is in getVxlanSource, and looking at your outputs it appears that both kworker1 and kworker2 have the same IP assigned to eth0 - "Default route is via 10.0.2.15@eth0". Do you have any idea why this happens? What are you using to create your cluster? Can you collect the output of ip addr and ip route from each of the kworkers?
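For illustration, here is a minimal sketch of that two-step logic in Go, assuming the vishvananda/netlink library; the function names and structure are hypothetical, not meshnet-cni's actual code:

// Sketch of the node-locality check described above (hypothetical helpers).
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// getVxlanSource finds the IPv4 address owning the default route,
// mirroring step 1 above.
func getVxlanSource() (net.IP, string, error) {
	routes, err := netlink.RouteList(nil, netlink.FAMILY_V4)
	if err != nil {
		return nil, "", err
	}
	for _, r := range routes {
		// A default route has no destination prefix (or 0.0.0.0/0).
		if r.Dst != nil && !r.Dst.IP.IsUnspecified() {
			continue
		}
		link, err := netlink.LinkByIndex(r.LinkIndex)
		if err != nil {
			return nil, "", err
		}
		addrs, err := netlink.AddrList(link, netlink.FAMILY_V4)
		if err != nil || len(addrs) == 0 {
			return nil, "", fmt.Errorf("no IPv4 address on %s", link.Attrs().Name)
		}
		return addrs[0].IP, link.Attrs().Name, nil
	}
	return nil, "", fmt.Errorf("no default route found")
}

// decideLinkType mirrors step 2: identical source IPs imply the peers are
// on the same host (veth); different source IPs imply separate hosts (vxlan).
func decideLinkType(localSrc, peerSrc net.IP) string {
	if localSrc.Equal(peerSrc) {
		return "veth"
	}
	return "vxlan"
}

func main() {
	srcIP, intf, err := getVxlanSource()
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("vxlan source: %s via %s\n", srcIP, intf)
	// Peer IP hard-coded to the address both kworkers report in the logs above.
	fmt.Println("link type:", decideLinkType(srcIP, net.ParseIP("10.0.2.15")))
}

Run on either kworker in this setup, such a check would return 10.0.2.15 for both nodes, which is exactly the condition that makes the plugin conclude the peers share a host and fall back to a local veth pair instead of vxlan.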

hyson007 commented 4 years ago

Thanks, that makes sense.

I'm using a Vagrant provisioning file to create the cluster: https://github.com/justmeandopensource/kubernetes/tree/master/vagrant-provisioning

Somehow eth0 on all three nodes (kmaster, kworker1, kworker2) has the exact same IP; this seems to be related to the Vagrant/VirtualBox default IP assignment on eth0.

I have manually edited the eth0 IPs (the eth0 interfaces still couldn't ping each other even after being changed to different IPs) and added two static host routes on kworker1/kworker2 to route via eth1, which at least gets the interfaces created.

BEFORE:

[root@kworker1 ~]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
       valid_lft 86026sec preferred_lft 86026sec
    inet6 fe80::5054:ff:fe8a:fee6/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:7c:d1:a5 brd ff:ff:ff:ff:ff:ff
    inet 172.42.42.101/24 brd 172.42.42.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe7c:d1a5/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:2a:7f:0d:4f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether f6:1c:72:b4:a0:4b brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::f41c:72ff:feb4:a04b/64 scope link
       valid_lft forever preferred_lft forever

[root@kworker2 ~]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
       valid_lft 86170sec preferred_lft 86170sec
    inet6 fe80::5054:ff:fe8a:fee6/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:1b:d9:22 brd ff:ff:ff:ff:ff:ff
    inet 172.42.42.102/24 brd 172.42.42.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe1b:d922/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:d2:68:ae:f9 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 42:08:b1:93:05:78 brd ff:ff:ff:ff:ff:ff
    inet 10.244.2.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4008:b1ff:fe93:578/64 scope link
       valid_lft forever preferred_lft forever

AFTER:

[root@kworker1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.16/24 brd 10.0.2.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe8a:fee6/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:7c:d1:a5 brd ff:ff:ff:ff:ff:ff
    inet 172.42.42.101/24 brd 172.42.42.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe7c:d1a5/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:2a:7f:0d:4f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether f6:1c:72:b4:a0:4b brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::f41c:72ff:feb4:a04b/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 66:0b:43:48:b0:a0 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::640b:43ff:fe48:b0a0/64 scope link
       valid_lft forever preferred_lft forever
7: vethf11f630e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 8a:0a:4e:6e:1a:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::880a:4eff:fe6e:1ab2/64 scope link
       valid_lft forever preferred_lft forever

[root@kworker2 ~]# ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.17/24 brd 10.0.2.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe8a:fee6/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:1b:d9:22 brd ff:ff:ff:ff:ff:ff
    inet 172.42.42.102/24 brd 172.42.42.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe1b:d922/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:d2:68:ae:f9 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 42:08:b1:93:05:78 brd ff:ff:ff:ff:ff:ff
    inet 10.244.2.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4008:b1ff:fe93:578/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 12:df:b8:bd:d9:81 brd ff:ff:ff:ff:ff:ff
    inet 10.244.2.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::10df:b8ff:febd:d981/64 scope link
       valid_lft forever preferred_lft forever
475: vethf47a0bc8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 16:67:ee:e2:18:15 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::1467:eeff:fee2:1815/64 scope link
       valid_lft forever preferred_lft forever

[root@kworker1 ~]# ip route
default via 10.0.2.2 dev eth0 proto static metric 100
10.0.0.0/8 via 172.42.42.102 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.16 metric 100
10.0.2.17 via 172.42.42.102 dev eth1
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
12.0.0.0/8 via 172.42.42.102 dev eth1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.42.42.0/24 dev eth1 proto kernel scope link src 172.42.42.101 metric 101

[root@kworker2 ~]# ip route
default via 10.0.2.2 dev eth0 proto static metric 100
10.0.0.0/8 via 172.42.42.101 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.17 metric 100
10.0.2.16 via 172.42.42.101 dev eth1
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1
12.0.0.0/8 via 172.42.42.101 dev eth1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.42.42.0/24 dev eth1 proto kernel scope link src 172.42.42.102 metric 101

However, I'm still unable to get ping working between r1 and r2 over eth1.

jack@ubuntu:~/meshnet-cni/tests$ k get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES
r1     1/1     Running   0          30m   10.244.1.2     kworker1.example.com   <none>           <none>
r2     1/1     Running   0          30m   10.244.2.236   kworker2.example.com   <none>           <none>

jack@ubuntu:~/meshnet-cni/tests$ k exec -it r1 sh
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 0E:5E:0B:82:CB:E0
          inet addr:10.244.1.2  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1264 (1.2 KiB)  TX bytes:42 (42.0 B)

eth1      Link encap:Ethernet  HWaddr FE:06:D3:1B:E5:6B
          inet addr:12.12.12.1  Bcast:12.12.12.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # ping 12.12.12.2
PING 12.12.12.2 (12.12.12.2): 56 data bytes
^C
--- 12.12.12.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

I did a tcpdump on kworker2 for the VXLAN traffic, and there still seems to be a reachability issue somewhere, even though the nodes can ping each other using their eth0 IPs.


[root@kworker1 ~]# ping 10.0.2.17 -S 10.0.2.16
PING 10.0.2.17 (10.0.2.17) 56(84) bytes of data.
64 bytes from 10.0.2.17: icmp_seq=1 ttl=64 time=2.17 ms
64 bytes from 10.0.2.17: icmp_seq=2 ttl=64 time=1.15 ms

[root@kworker2 ~]# ping 10.0.2.16 -S 10.0.2.17
PING 10.0.2.16 (10.0.2.16) 56(84) bytes of data.
64 bytes from 10.0.2.16: icmp_seq=1 ttl=64 time=2.41 ms

[root@kworker2 ~]# sudo tcpdump -nnni any icmp -v
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10:27:11.622446 IP (tos 0xc0, ttl 64, id 56868, offset 0, flags [none], proto ICMP (1), length 106)
    10.0.2.17 > 10.0.2.17: ICMP host 10.0.2.16 unreachable, length 86
        IP (tos 0x0, ttl 64, id 669, offset 0, flags [none], proto UDP (17), length 78)
    10.0.2.17.37698 > 10.0.2.16.4789: VXLAN, flags [I] (0x08), vni 5001
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 12.12.12.1 tell 12.12.12.2, length 28
10:27:11.622450 IP (tos 0xc0, ttl 64, id 56869, offset 0, flags [none], proto ICMP (1), length 106)
    10.0.2.17 > 10.0.2.17: ICMP host 10.0.2.16 unreachable, length 86
        IP (tos 0x0, ttl 64, id 913, offset 0, flags [none], proto UDP (17), length 78)
    10.0.2.17.37698 > 10.0.2.16.4789: VXLAN, flags [I] (0x08), vni 5001
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 12.12.12.1 tell 12.12.12.2, length 28
10:27:11.622453 IP (tos 0xc0, ttl 64, id 56870, offset 0, flags [none], proto ICMP (1), length 106)
    10.0.2.17 > 10.0.2.17: ICMP host 10.0.2.16 unreachable, length 86
        IP (tos 0x0, ttl 64, id 1113, offset 0, flags [none], proto UDP (17), length 78)
    10.0.2.17.37698 > 10.0.2.16.4789: VXLAN, flags [I] (0x08), vni 5001

BTW, I'm quite new to this and just wondering: do you suggest testing meshnet-cni and k8s-topo on kind? (I followed your post, which mentions dind, but that seems to be EOL; I couldn't get it working on the latest kind version, hence I switched to this Vagrant setup.)

networkop commented 4 years ago

It looks like Vagrant is trying to use QEMU user networking (slirp), which doesn't have proper support for ICMP. My suggestion would be to try it with kind, which seems to be the default option now for all k8s testing and local development.

hyson007 commented 4 years ago

Thanks much.