networkop / k8s-topo

Topology builder for network simulations inside K8S
BSD 3-Clause "New" or "Revised" License

networkop/init-wait questions #13

Closed. qlyoung closed this issue 4 years ago.

qlyoung commented 4 years ago
  1. networkop/init-wait is only available for amd64, so I built it myself. Dockerfile:
    FROM alpine
    COPY entrypoint.sh /
    ENTRYPOINT ["/entrypoint.sh"]

    entrypoint.sh (copied from your amd64 image):

    
    #!/bin/sh

    INTFS=${1:-1}
    SLEEP=${2:-0}

    int_calc () {
        index=0
        for i in $(ls -1v /sys/class/net/ | grep 'eth|ens|eno'); do
            let index=index+1
        done
        MYINT=$index
    }

    int_calc

    echo "Waiting for all $INTFS interfaces to be connected"
    while [ "$MYINT" -lt "$INTFS" ]; do
        echo "Connected $MYINT interfaces out of $INTFS"
        sleep 1
        int_calc
    done

    echo "Sleeping $SLEEP seconds before boot"
    sleep $SLEEP


Have I got this close enough? Or is there source for it somewhere that I missed?

It seems to be working sometimes, but not others:

root@clusterpi-69 ~# kubectl get pod
NAME                        READY   STATUS     RESTARTS   AGE
k8s-topo-86cbbdbddb-5ks5m   1/1     Running    0          164m
frr-192-0-2-1               0/1     Init:0/1   0          2m26s
frr-192-0-2-7               0/1     Init:0/1   0          2m27s
frr-192-0-2-3               0/1     Init:0/1   0          2m27s
frr-192-0-2-9               0/1     Init:0/1   0          2m27s
frr-192-0-2-4               0/1     Init:0/1   0          2m27s
frr-192-0-2-5               0/1     Init:0/1   0          2m26s
frr-192-0-2-8               1/1     Running    0          2m26s
frr-192-0-2-0               1/1     Running    0          2m27s
frr-192-0-2-6               1/1     Running    0          2m28s
frr-192-0-2-2               1/1     Running    0          2m26s


E.g., `frr-192-0-2-1`:

root@clusterpi-69 /v/l/r/k/a/e/c/net.d# kubectl exec -it frr-192-0-2-1 --container init-frr-192-0-2-1 -- ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if65: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether ea:5b:bf:92:f6:9c brd ff:ff:ff:ff:ff:ff
68: eth1@if67: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether c6:0e:85:41:be:4d brd ff:ff:ff:ff:ff:ff



Per the topology, this pod should have `eth1` and `eth2`, but only `eth1` has been created. I guess this is a `meshnet-cni` bug or a misconfiguration, though.
networkop commented 4 years ago

yeah, could be a meshnet-cni bug. does this happen on a smaller topology as well? you can use stern to view the logs from all meshnet pods to see what the issue is. i think the command is `stern -n meshnet -l k8s-app=meshnet`
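for example (the label selector may need adjusting; the meshnet pods later in this thread show app=meshnet and name=meshnet labels):

    # follow logs from every meshnet daemon pod across the cluster
    stern -n meshnet -l name=meshnet

    # roughly equivalent without stern, one pod at a time
    kubectl -n meshnet get pods -o name | xargs -n1 kubectl -n meshnet logs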

qlyoung commented 4 years ago

Nah, I don't see it on ~6 node topologies. I'll install stern and get those logs

(btw, thanks for the 1.18 update, works like a charm)

qlyoung commented 4 years ago

Alright, got some logs to dump.

Generated random topology with builder 10 3:

/k8s-topo/examples/builder # cat random.yml 
conf_dir: /k8s-topo/examples/builder/config-random
etcd_port: 32379
links:
- endpoints:
  - frr-192-0-2-7:eth1:10.0.0.1/30
  - frr-192-0-2-9:eth1:10.0.0.2/30
- endpoints:
  - frr-192-0-2-3:eth1:10.0.0.6/30
  - frr-192-0-2-7:eth2:10.0.0.5/30
- endpoints:
  - frr-192-0-2-8:eth1:10.0.0.10/30
  - frr-192-0-2-9:eth2:10.0.0.9/30
- endpoints:
  - frr-192-0-2-3:eth2:10.0.0.13/30
  - frr-192-0-2-4:eth1:10.0.0.14/30
- endpoints:
  - frr-192-0-2-1:eth1:10.0.0.18/30
  - frr-192-0-2-4:eth2:10.0.0.17/30
- endpoints:
  - frr-192-0-2-6:eth1:10.0.0.22/30
  - frr-192-0-2-8:eth2:10.0.0.21/30
- endpoints:
  - frr-192-0-2-1:eth2:10.0.0.25/30
  - frr-192-0-2-2:eth1:10.0.0.26/30
- endpoints:
  - frr-192-0-2-0:eth1:10.0.0.30/30
  - frr-192-0-2-2:eth2:10.0.0.29/30
- endpoints:
  - frr-192-0-2-2:eth3:10.0.0.33/30
  - frr-192-0-2-5:eth1:10.0.0.34/30
- endpoints:
  - frr-192-0-2-2:eth4:10.0.0.37/30
  - frr-192-0-2-6:eth2:10.0.0.38/30
- endpoints:
  - frr-192-0-2-4:eth3:10.0.0.41/30
  - frr-192-0-2-9:eth3:10.0.0.42/30
- endpoints:
  - frr-192-0-2-7:eth3:10.0.0.46/30
  - frr-192-0-2-9:eth4:10.0.0.45/30
publish_base:
  22: 30001

Applying with k8s-topo --create random.yml yields:

# kubectl get all
NAME                            READY   STATUS     RESTARTS   AGE
pod/k8s-topo-86cbbdbddb-5ks5m   1/1     Running    1          27h
pod/frr-192-0-2-6               0/1     Init:0/1   0          49s
pod/frr-192-0-2-8               0/1     Init:0/1   0          50s
pod/frr-192-0-2-2               0/1     Init:0/1   0          49s
pod/frr-192-0-2-7               0/1     Init:0/1   0          50s
pod/frr-192-0-2-3               1/1     Running    0          50s
pod/frr-192-0-2-1               1/1     Running    0          50s
pod/frr-192-0-2-9               0/1     Init:0/1   0          50s
pod/frr-192-0-2-0               1/1     Running    0          49s
pod/frr-192-0-2-4               1/1     Running    0          50s
pod/frr-192-0-2-5               1/1     Running    0          49s

Note that the command hangs, like so:

INFO:__main__:All topology data has been uploaded
INFO:__main__:All pods have been created successfully
INFO:__main__:
 alias frr-192-0-2-7='kubectl exec -it frr-192-0-2-7 sh'
 alias frr-192-0-2-9='kubectl exec -it frr-192-0-2-9 sh'
 alias frr-192-0-2-3='kubectl exec -it frr-192-0-2-3 sh'
 alias frr-192-0-2-8='kubectl exec -it frr-192-0-2-8 sh'
 alias frr-192-0-2-4='kubectl exec -it frr-192-0-2-4 sh'
 alias frr-192-0-2-1='kubectl exec -it frr-192-0-2-1 sh'
 alias frr-192-0-2-6='kubectl exec -it frr-192-0-2-6 sh'
 alias frr-192-0-2-2='kubectl exec -it frr-192-0-2-2 sh'
 alias frr-192-0-2-0='kubectl exec -it frr-192-0-2-0 sh'
 alias frr-192-0-2-5='kubectl exec -it frr-192-0-2-5 sh'
<hangs here>

Logs attached rather than pasted inline, for brevity. Going off the meshnet-cni README, I used `stern meshnet -n meshnet > logs.txt`: logs.txt

networkop commented 4 years ago

The issue occurs when meshnetd tries to update a local interface of a pod. First, it gets a message from a remote node's CNI and builds a veth struct:

meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Created koko Veth struct {NsName:/var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49 LinkName:eth3 IPAddr:[{IP:10.0.0.33 Mask:fffffffc}] MirrorEgress: MirrorIngress:}"

Then it tries to read the interface attributes from netlink, gets nil, and then fails to update the link with the remote node's IP and VNI:

meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Created koko Veth struct {NsName:/var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49 LinkName:eth3 IPAddr:[{IP:10.0.0.33 Mask:fffffffc}] MirrorEgress: MirrorIngress:}"
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Retrieved eth3 link from /var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49 Netns: <nil>"
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Is link <nil> a VXLAN?: false"
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Link <nil> we've found isn't a vxlan or doesn't exist"

So I think the issue is in the getLinkFromNS function, specifically:

    // If namespace doesn't exist, do nothing and return empty result
    if vethNs, err = ns.GetNS(nsName); err != nil {
        return
    }
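A quick way to test that theory is to check whether the namespace path from the log is actually visible both on the host and from inside the meshnet pod; if ns.GetNS fails there, the function silently returns an empty link (sketch, pod name is a placeholder):

    # on the node where the affected pod landed
    ls -l /var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49

    # and from inside the meshnet daemon pod on that node
    kubectl -n meshnet exec <meshnet-pod> -- \
        ls -l /var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49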

I remember there was a PR recently that had something to do with alternative CRIs: https://github.com/networkop/meshnet-cni/pull/17 Could this be the issue you're seeing? I've only merged it recently so maybe the version you're using can be updated? Also, there was a comment in the PR about mounting /run, which wasn't part of the PR itself. So maybe it's worth adding that to the meshnet manifest as well?

qlyoung commented 4 years ago

Thanks for the analysis.

Could this be the issue you're seeing? I've only merged it recently so maybe the version you're using can be updated?

No; I pulled your 1.18 changes and rebuilt, and those were after that PR was merged, so that patch is in my images. Reading that PR, it looks like without it meshnet would not have worked at all in my environment, which the PR is indeed applicable to: this is k3s with containerd.

Incidentally, meshnet-cni on k3s requires some other patches to support the cascading CNI model meshnet uses. k3s only reads /etc/cni/net.d if you disable its default CNI (flannel), in which case meshnet-cni has no master plugin to delegate to; if you leave flannel enabled, the path k3s uses is /var/lib/rancher/k3s/agent/etc/cni/net.d, so I had to patch that in. I'm not sure how the project could handle this other than with some build switch that tweaks the path, and even that won't be sufficient, because the CNI bin directory lives at a similar path but with a random GUID in it.
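For reference, the paths on this cluster look like this (the hash in the k3s data dir differs per build, hence the glob):

    # k3s keeps its CNI config and binaries under its own directories
    ls /var/lib/rancher/k3s/agent/etc/cni/net.d/
    ls -d /var/lib/rancher/k3s/data/*/bin/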

I'll try mounting host /run into the meshnet pods and see what happens.

networkop commented 4 years ago

I think handling an arbitrary CNI dir should be fairly easy. The entrypoint script can check how kubelet was started and, if --cni-conf-dir was passed, use that instead of /etc/cni/net.d.
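Something like this in docker/entrypoint.sh could do it (untested sketch; assumes the kubelet or k3s process is visible, e.g. via hostPID, and falls back to the default path):

    # pull --cni-conf-dir from the kubelet (or embedded k3s) command line,
    # defaulting to /etc/cni/net.d if the flag isn't set or can't be found
    PID=$(pidof kubelet || pidof k3s)
    CNI_CONF_DIR=$(tr '\0' ' ' < /proc/${PID%% *}/cmdline \
        | sed -n 's/.*--cni-conf-dir[= ]\([^ ]*\).*/\1/p')
    CNI_CONF_DIR=${CNI_CONF_DIR:-/etc/cni/net.d}
    echo "Using CNI conf dir: $CNI_CONF_DIR"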

Let me know if you have any luck with /run. If you're still having problems, can you send me the list of steps to reproduce this locally?

qlyoung commented 4 years ago

Nope, no luck. Check me:

root@clusterpi-69 /h/p/meshnet-cni# kubectl -n meshnet describe pod/meshnet-xsm4j
Name:         meshnet-xsm4j
Namespace:    meshnet
Priority:     0
Node:         clusterpi-67/192.168.0.229
Start Time:   Fri, 05 Jun 2020 18:06:14 +0100
Labels:       app=meshnet
              controller-revision-hash=6d4cb8df7d
              name=meshnet
              pod-template-generation=1
Annotations:  <none>
Status:       Running
IP:           192.168.0.229
IPs:
  IP:           192.168.0.229
Controlled By:  DaemonSet/meshnet
Containers:
  meshnet:
    Container ID:   containerd://2bb5c852fc0c45878f2ae591e44cb507338bf79b4648c9c4f76a7b7d7f2c5ba4
    Image:          qlyoung/meshnet:latest
    Image ID:       docker.io/qlyoung/meshnet@sha256:0c49cc930512cb77532044cc5ff0340f6a4054ae66bb3f7fe829b55c2dcfbf1e
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 05 Jun 2020 18:06:27 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  200Mi
    Requests:
      cpu:        100m
      memory:     200Mi
    Environment:  <none>
    Mounts:
      /etc/cni/net.d from cni-cfg (rw)
      /opt/cni/bin from cni-bin (rw)
      /run from run (rw)
      /var/run/netns from var-run-netns (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from meshnet-token-7zsvb (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rancher/k3s/data/ec54df8c1938fe49660230d16334b4c7e83888a93e6f037fd8552893e2f67383/bin
    HostPathType:  
  cni-cfg:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rancher/k3s/agent/etc/cni/net.d
    HostPathType:  
  var-run-netns:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/netns
    HostPathType:  
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run
    HostPathType:

Unfortunately I think it'll be quite difficult for you to repro, as I'm using a real hardware cluster with k3s; you would have to set it up like mine and apply my patches. Maybe there's something wrong with those?

meshnet-cni patches:

diff --git a/.mk/kustomize.mk b/.mk/kustomize.mk
index 734bbf0..41ca18e 100644
--- a/.mk/kustomize.mk
+++ b/.mk/kustomize.mk
@@ -14,8 +14,8 @@ kust-ensure:
 .PHONY: kustomize
 kustomize: kust-ensure 
    @cd manifests/base/ && $(GOPATH)/kustomize edit set image $(DOCKERID)/meshnet:$(VERSION)
-   kubectl apply -k manifests/base/
+   kubectl apply --kubeconfig /etc/rancher/k3s/k3s.yaml -k manifests/base/

 .PHONY: kustomize-kops
 kustomize-kops: kust-ensure 
-   kubectl apply -k manifests/overlays/kops/ 
\ No newline at end of file
+   kubectl apply -k manifests/overlays/kops/ 
diff --git a/docker/Dockerfile b/docker/Dockerfile
index 1d52402..d12fc50 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -1,8 +1,8 @@
 FROM golang:1.12.7 AS proto_base
 ENV GO111MODULE=off
 RUN apt-get update && apt-get -y install curl unzip
-RUN curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v3.9.1/protoc-3.9.1-linux-x86_64.zip && \
-    unzip protoc-3.9.1-linux-x86_64.zip
+RUN curl -LO https://github.com/asteris-llc/protocol-buffers-arm/releases/download/3.1.0-binary-1/protoc-3.1.0-linux-arm.zip && \
+    unzip protoc-3.1.0-linux-arm.zip && cp lib/*.so.11 /lib
 RUN go get -u github.com/golang/protobuf/protoc-gen-go
 COPY daemon/ daemon/
 COPY Makefile .
diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index 439ae3a..7035074 100644
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -1,5 +1,19 @@
 #!/bin/sh

+# cat << EOF > /meshnet.conf
+# {
+#   "cniVersion": "0.2.0",
+#   "name": "meshnet_network",
+#   "type": "meshnet",
+#   "delegate": {
+#     "type": "flannel",
+#     "forceAddress": true,
+#     "hairpinMode": true,
+#     "isDefaultGateway": true
+#   }
+# }
+# EOF
+
 echo "Distributing files"
 if [ -d "/opt/cni/bin/" ] && [ -f "./meshnet" ]; then
   cp ./meshnet /opt/cni/bin/
@@ -25,5 +39,6 @@ fi
 echo 'Making sure the name is set for the master plugin'
 jq '.delegate.name = "masterplugin"' /etc/cni/net.d/00-meshnet.conf > /tmp/cni.conf && mv /tmp/cni.conf /etc/cni/net.d/00-meshnet.conf  

+
 echo "Starting meshnetd daemon"
 /meshnetd
diff --git a/etc/cni/net.d/meshnet.conf b/etc/cni/net.d/meshnet.conf
index 5febbef..560d750 100644
--- a/etc/cni/net.d/meshnet.conf
+++ b/etc/cni/net.d/meshnet.conf
@@ -3,15 +3,9 @@
   "name": "meshnet_network",
   "type": "meshnet",
   "delegate": {
-    "name": "dind0",
-    "bridge": "dind0",
-    "type": "bridge",
-    "isDefaultGateway": true,
-    "ipMasq": true,
-    "ipam": {
-      "type": "host-local",
-      "subnet": "10.244.1.0/24",
-      "gateway": "10.244.1.1"
-    }
+    "type": "flannel",
+    "forceAddress": true,
+    "hairpinMode": true,
+    "isDefaultGateway": true
   }
 }
diff --git a/kustomize b/kustomize
deleted file mode 100755
index 064ad12..0000000
Binary files a/kustomize and /dev/null differ
diff --git a/kustomize_v3.5.4_linux_amd64.tar.gz b/kustomize_v3.5.4_linux_amd64.tar.gz
deleted file mode 100644
index e1ae391..0000000
Binary files a/kustomize_v3.5.4_linux_amd64.tar.gz and /dev/null differ
diff --git a/manifests/base/kustomization.yaml b/manifests/base/kustomization.yaml
index 96a054d..3a354c4 100644
--- a/manifests/base/kustomization.yaml
+++ b/manifests/base/kustomization.yaml
@@ -5,5 +5,7 @@ commonLabels:
 images:
 - name: networkop/meshnet
   newTag: latest
+- name: qlyoung/meshnet:armhf
+  newTag: latest
 resources:
 - meshnet.yml
diff --git a/manifests/base/meshnet.yml b/manifests/base/meshnet.yml
index 4e1a16d..0af773d 100644
--- a/manifests/base/meshnet.yml
+++ b/manifests/base/meshnet.yml
@@ -121,8 +121,8 @@ items:
         hostPID: true
         hostIPC: true
         serviceAccountName: meshnet
-        nodeSelector:
-          beta.kubernetes.io/arch: amd64
+        # nodeSelector:
+        #   beta.kubernetes.io/arch: amd64
         tolerations:
         - operator: Exists
           effect: NoSchedule
@@ -130,8 +130,8 @@ items:
         - name: meshnet
           securityContext:
             privileged: true
-          image: networkop/meshnet:latest
-          imagePullPolicy: IfNotPresent
+          image: qlyoung/meshnet:armhf
+          imagePullPolicy: Always
           resources:
             limits:
               memory: 200Mi
@@ -143,6 +143,8 @@ items:
             mountPath: /etc/cni/net.d
           - name: cni-bin
             mountPath: /opt/cni/bin
+          - name: run
+            mountPath: /run
           - name: var-run-netns
             mountPath: /var/run/netns
             mountPropagation: Bidirectional
@@ -150,11 +152,14 @@ items:
         volumes:
         - name: cni-bin
           hostPath:
-            path: /opt/cni/bin
+            path: /var/lib/rancher/k3s/data/ec54df8c1938fe49660230d16334b4c7e83888a93e6f037fd8552893e2f67383/bin
         - name: cni-cfg
           hostPath:
-            path: /etc/cni/net.d
+            path: /var/lib/rancher/k3s/agent/etc/cni/net.d
         - name: var-run-netns
           hostPath:
             path: /var/run/netns
+        - name: run
+          hostPath:
+            path: /run

diff --git a/tests/3node.yml b/tests/3node.yml
index 3de93b7..e414180 100644
--- a/tests/3node.yml
+++ b/tests/3node.yml
@@ -66,7 +66,7 @@ items:
     containers: 
     - image: alpine
       name: pod
-      command:  ["/bin/sh", "-c", "sleep 2000000000000"]
+      command:  ["/bin/sh", "-c", "sleep 20000000"]
 - apiVersion: v1
   kind: Pod
   metadata:
@@ -77,7 +77,7 @@ items:
     containers: 
     - image: alpine
       name: pod
-      command:  ["/bin/sh", "-c", "sleep 2000000000000"]
+      command:  ["/bin/sh", "-c", "sleep 20000000"]
 - apiVersion: v1
   kind: Pod
   metadata:
@@ -88,4 +88,4 @@ items:
     containers: 
     - image: alpine
       name: pod
-      command:  ["/bin/sh", "-c", "sleep 2000000000000"]
\ No newline at end of file
+      command:  ["/bin/sh", "-c", "sleep 20000000"]

k8s-topo patches:

diff --git a/Dockerfile b/Dockerfile
index 63f5d7e..64c7582 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -19,9 +19,9 @@ COPY web/nginx.conf /etc/nginx/conf.d/default.conf

 RUN mkdir -p /run/nginx

-RUN mkdir /lib64 && ln -s /lib/libc.musl-x86_64.so.1 /lib64/ld-linux-x86-64.so.2
+# RUN mkdir /lib64 && ln -s /lib/libc.musl-x86_64.so.1 /lib64/ld-linux-x86-64.so.2

-RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl
+RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/arm/kubectl
 RUN chmod +x kubectl

 ENV PATH="/k8s-topo:/k8s-topo/bin:${PATH}"
diff --git a/bin/k8s-topo b/bin/k8s-topo
index b1c3acd..3d60d06 100755
--- a/bin/k8s-topo
+++ b/bin/k8s-topo
@@ -122,6 +122,8 @@ def parse_endpoints(devices, endpoints, link, idx):
             device = devices.get(device_name, Host(device_name))
         elif "qrtr" in device_name.lower():
             device = devices.get(device_name, Quagga(device_name))
+        elif "frr" in device_name.lower():
+            device = devices.get(device_name, FRR(device_name))
         elif "xrv" in device_name.lower():
             device = devices.get(device_name, XRV(device_name))
         elif "vmx" in device_name.lower():
@@ -449,8 +451,8 @@ class Device(object):

             # nsc-sidecar container
             nsc_container = client.V1Container(name="nsc-sidecar")
-            nsc_container.image = "networkservicemesh/topology-sidecar-nsc:master"
-            nsc_container.image_pull_policy = "IfNotPresent"
+            nsc_container.image = "qlyoung/topology-sidecar-nsc:armhf"
+            nsc_container.image_pull_policy = "Always"

             env_nsm_io = client.V1EnvVar(
                 name="NS_NETWORKSERVICEMESH_IO", value=",".join(self._build_nsurls())
@@ -464,8 +466,8 @@ class Device(object):

             # nse-sidecar container
             nse_container = client.V1Container(name="nse-sidecar")
-            nse_container.image = "networkservicemesh/topology-sidecar-nse:master"
-            nse_container.image_pull_policy = "IfNotPresent"
+            nse_container.image = "qlyoung/topology-sidecar-nse:armhf"
+            nse_container.image_pull_policy = "Always"
             nse_container.resources = resource_requirements

             env_nse_name = client.V1EnvVar(name="ENDPOINT_NETWORK_SERVICE", value=self.topo)
@@ -516,11 +518,11 @@ class Device(object):
             container.command = self.command
             container.args = self.args
             container.env = self.environment
-            container.image_pull_policy = "IfNotPresent"
+            container.image_pull_policy = "Always"

             init_container = client.V1Container(name=f"init-{self.name}")
-            init_container.image = "networkop/init-wait:latest"
-            init_container.image_pull_policy = "IfNotPresent"
+            init_container.image = "qlyoung/init-wait:armhf"
+            init_container.image_pull_policy = "Always"
             init_container.args = [f"{len(self.interfaces)+1}", f"{self.sleep}"]

             # Setting resource requests
@@ -645,10 +647,16 @@ class Host(Device):
 class Quagga(Device):
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
-        self.image = "networkop/qrtr"
+        self.image = "qlyoung/frr:armhf"
         self.conf_path = "/etc/quagga"
         self.startup_file = "Quagga.conf"

+class FRR(Device):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.image = "qlyoung/frr:armhf"
+        self.conf_path = "/etc/frr"
+        self.startup_file = "frr.conf"

 class XRV(Device):
     def __init__(self, *args, **kwargs):
diff --git a/manifest.yml b/manifest.yml
index f34db64..4d620c2 100644
--- a/manifest.yml
+++ b/manifest.yml
@@ -85,8 +85,8 @@ items:
         serviceAccountName: k8s-topo
         hostNetwork: true
         containers:
-        - image: networkop/k8s-topo:0.2.0
+        - image: qlyoung/k8s-topo:armhf
           imagePullPolicy: Always
           name: k8s-topo
           ports:
-          - containerPort: 80
\ No newline at end of file
+          - containerPort: 80

BTW, pinged you on CNCF slack, maybe we can chat there?

qlyoung commented 4 years ago

Here's a much smaller repro (turns out it does reproduce with minimal topologies). Note that I can destroy and recreate this repeatedly, and sometimes it will work, sometimes it will not. Whether or not the pods are scheduled on the same node seems to have no bearing on whether it works.

I'm almost wondering if there is a race condition somewhere, because these nodes are Raspberry Pis and therefore quite slow.

random.yml:

conf_dir: /k8s-topo/examples/builder/config-random
etcd_port: 32379
links:
- endpoints:
  - frr-192-0-2-7:eth1:10.0.0.1/30
  - frr-192-0-2-9:eth2:10.0.0.2/30

Meshnet logs when applying this with k8s-topo:

meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-9's metadata from K8s..."
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Setting frr-192-0-2-9's SrcIp=192.168.0.230 and NetNs=/var/run/netns/cni-29f8fcc8-1b01-2429-1794-e3806fee1a77"                                                       
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Update pod status frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-7's metadata from K8s..."
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Skipping of pod frr-192-0-2-7 by pod frr-192-0-2-9"                                                                                                                  
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Update pod status frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-7's metadata from K8s..."
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Setting frr-192-0-2-7's SrcIp=192.168.0.234 and NetNs=/var/run/netns/cni-22853c58-7996-8e69-3db2-2c823bafc23b"                                                       
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Update pod status frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-7's metadata from K8s..."
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Checking if frr-192-0-2-7 is skipped by frr-192-0-2-7"                                                                                                               
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Retrieving frr-192-0-2-9's metadata from K8s..."
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Setting frr-192-0-2-9's SrcIp= and NetNs="
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Update pod status frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reverse-skipping of pod frr-192-0-2-7 by pod frr-192-0-2-9"                                                                                                          
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Updating peer skipped list"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Update pod status frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="THIS SKIPPED:" thisSkipped="[frr-192-0-2-7]"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="NEW THIS SKIPPED:" newThisSkipped="[]"

Pod status:

NAME                            READY   STATUS     RESTARTS   AGE
pod/k8s-topo-86cbbdbddb-5ks5m   1/1     Running    1          42h
pod/frr-192-0-2-9               0/1     Init:0/1   0          77s
pod/frr-192-0-2-7               0/1     Init:0/1   0          77s

The two are on different nodes. Here's a look at pod/frr-192-0-2-9, running on clusterpi-42:

kubectl exec -it frr-192-0-2-9 -c init-frr-192-0-2-9 ip link
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.                                                                                               
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 3a:8a:3a:34:34:5c brd ff:ff:ff:ff:ff:ff
root@clusterpi-42:/home/pi# ip netns exec cni-29f8fcc8-1b01-2429-1794-e3806fee1a77 ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 3a:8a:3a:34:34:5c brd ff:ff:ff:ff:ff:ff link-netnsid 0

Hopefully this is a bit more digestible than the large topo :)

networkop commented 4 years ago

thanks. can you share how you're building your cluster?

qlyoung commented 4 years ago

It is 4 Raspberry Pis, each running Raspbian 10 (kernel 4.19). k3s was installed with the following playbook:

- hosts:    
  - all    
  gather_facts: False    
  vars:    
    random_number: "{{ 100 | random }}"    
  tasks:    
  - name: set hostname    
    hostname:    
      name: "clusterpi-{{ random_number }}"    
    become: true    
  - name: reboot pi    
    become: true    
    reboot:    
  - name: update system    
    apt:    
      name: "*"    
      state: latest    
      update_cache: yes    
    become: true    
  - name: download k3s    
    get_url:    
      url: https://get.k3s.io    
      dest: /tmp/getk3s.sh    
- hosts:    
  - master    
  tasks:    
  - name: install k3s for master    
    command: sh /tmp/getk3s.sh    
    become: true    
  - name: retrieve k8s join key    
    slurp:    
      src: "/var/lib/rancher/k3s/server/node-token"    
    register: slurped_user_data    
    become: true    
  - name: Decode data and store as fact    
    set_fact:    
      master_key: "{{ slurped_user_data.content | b64decode }}"    
- hosts:    
  - slaves    
  tasks:    
  - name: install k3s for slave    
    command: sh /tmp/getk3s.sh    
    environment:    
      K3S_URL: "https://{{ k3s_master }}:6443"    
      K3S_TOKEN: "{{ hostvars['%s' | format(k3s_master) ]['master_key'] }}"    
    become: true    

In other words, a totally vanilla k3s cluster. I then disabled traefik (the ingress controller k3s bundles) because it binds all the HTTP ports.
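For completeness, disabling traefik at install time looks roughly like this (the flag was --no-deploy on k3s releases of that era; newer releases use --disable):

    # install the k3s server without the bundled traefik
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik" sh -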

meshnet-cni was installed by applying the patches above for ARM support and to change the CNI config paths for k3s, then rebuilding all the images directly on one of the Pis and pushing them to my Docker Hub registry. I updated all the image references to point at this new registry and deployed meshnet with make install. I applied the above patches to k8s-topo and did the same.

qlyoung commented 4 years ago

I tried switching the container runtime to Docker, though I don't see how that could influence it. No change in behavior. I should also note that the default backend for flannel on k3s is vxlan, which I've kept.

networkop commented 4 years ago

cool, i'll try to reproduce this on a k3d cluster at home. btw, i've recovered my password to cncf slack, so you can ping me there as well.

networkop commented 4 years ago

I've tested with k3d and couldn't reproduce the issue. I've pushed a k3d-test branch to the meshnet-cni repo with some extra make targets to help build the environment. I've done a few dozen tests with k8s-topo using different topologies and got a 100% success rate.

Also, I've had another look at the logs and I think my initial analysis of the problem was wrong. When the code gets to `Link <nil> we've found isn't a vxlan or doesn't exist`, it means that the vxlan interface doesn't exist yet and will be created. It's a totally normal situation: a pod doesn't create its side of the vxlan link when it doesn't yet know the destination IP of its peer.

There's another place where logs can be collected: the host OS itself. Since it's the kubelet's job to invoke a CNI plugin, this will be done by the process running in the host OS (e.g. kubelet or the k3s agent), so the logs from the meshnet CNI plugin will be written to the standard logging destination (e.g. /var/log/messages or journald). Would you be able to collect those logs for the failed scenario?
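On this setup that probably means journald for the k3s / k3s-agent systemd units, plus Raspbian's syslog, e.g.:

    # journal for the k3s agent on the node hosting the stuck pod
    journalctl -u k3s-agent --since "15 min ago" | grep -i meshnet

    # or the classic syslog files
    grep -i meshnet /var/log/syslog /var/log/messages 2>/dev/null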

qlyoung commented 4 years ago

Also, I've had another look at the logs and I think my initial analysis of the problem was wrong. When the code gets to the Link we've found isn't a vxlan or doesn't exist, it means that the vxlan interface doesn't exist and will be created. it's a totally normal situation when a pod doesn't create its side of the vxlan link because it doesn't yet know the destination IP of its peer.

Yeah, that was my assessment as well; when I see those messages that typically indicates things are going well since it's creating the vxlan netdevices.

I've been trying to debug by looking at the namespaces being created to see what is missing, but the correlations between the meshnetd logs, the netns's that should be created, and the devices I should be seeing in them aren't clear in my head yet. It's confusing because most of the time the pods that get stuck on init-wait get at least 1 or 2 of the interfaces they're waiting on, but are missing the rest; so the namespace is getting created and the code that creates interfaces is functioning, but for some reason isn't making all of them. It definitely seems like some logs are missing, because I'm not seeing anything really out of the ordinary in the logs I've given you; if you have some ideas for where we could add extra logging to get more insight, I can give that a shot.
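A rough way to do that comparison from a node (run as root on the host) is to dump every netns and its links and line them up against what meshnetd logged:

    # list every network namespace and its interfaces in brief form
    for ns in $(ip netns list | awk '{print $1}'); do
        echo "== $ns =="
        ip netns exec "$ns" ip -br link
    done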

Question though: are you testing on a single physical host, and if so, have you tested on a hardware cluster with multiple physical hosts before?

I'm a bit tempted to just blow this whole cluster away and set it up from scratch to see if the problem magically disappears :joy: I think I'll spend some more time debugging before going nuclear though.

I'll get those host logs for you for a node that has a stuck pod on it.

qlyoung commented 4 years ago

For anyone following the saga, the fix is here: https://github.com/networkop/k8s-topo/commit/0e258b8574571732ed2aa5c154230de35b247078