Closed: qlyoung closed this issue 4 years ago
yeah, could be a meshnet-cni bug. does this happen on a smaller topology as well?
you can use stern to view the logs from all meshnet pods to see what the issue is. i think the command is stern -n meshnet -l k8s-app=meshnet
Nah, I don't see it on ~6 node topologies. I'll install stern and get those logs
(btw, thanks for the 1.18 update, works like a charm)
Alright, got some logs to dump.
Generated a random topology with builder 10 3:
/k8s-topo/examples/builder # cat random.yml
conf_dir: /k8s-topo/examples/builder/config-random
etcd_port: 32379
links:
- endpoints:
  - frr-192-0-2-7:eth1:10.0.0.1/30
  - frr-192-0-2-9:eth1:10.0.0.2/30
- endpoints:
  - frr-192-0-2-3:eth1:10.0.0.6/30
  - frr-192-0-2-7:eth2:10.0.0.5/30
- endpoints:
  - frr-192-0-2-8:eth1:10.0.0.10/30
  - frr-192-0-2-9:eth2:10.0.0.9/30
- endpoints:
  - frr-192-0-2-3:eth2:10.0.0.13/30
  - frr-192-0-2-4:eth1:10.0.0.14/30
- endpoints:
  - frr-192-0-2-1:eth1:10.0.0.18/30
  - frr-192-0-2-4:eth2:10.0.0.17/30
- endpoints:
  - frr-192-0-2-6:eth1:10.0.0.22/30
  - frr-192-0-2-8:eth2:10.0.0.21/30
- endpoints:
  - frr-192-0-2-1:eth2:10.0.0.25/30
  - frr-192-0-2-2:eth1:10.0.0.26/30
- endpoints:
  - frr-192-0-2-0:eth1:10.0.0.30/30
  - frr-192-0-2-2:eth2:10.0.0.29/30
- endpoints:
  - frr-192-0-2-2:eth3:10.0.0.33/30
  - frr-192-0-2-5:eth1:10.0.0.34/30
- endpoints:
  - frr-192-0-2-2:eth4:10.0.0.37/30
  - frr-192-0-2-6:eth2:10.0.0.38/30
- endpoints:
  - frr-192-0-2-4:eth3:10.0.0.41/30
  - frr-192-0-2-9:eth3:10.0.0.42/30
- endpoints:
  - frr-192-0-2-7:eth3:10.0.0.46/30
  - frr-192-0-2-9:eth4:10.0.0.45/30
publish_base:
  22: 30001
Applying with k8s-topo --create random.yml yields:
# kubectl get all
NAME READY STATUS RESTARTS AGE
pod/k8s-topo-86cbbdbddb-5ks5m 1/1 Running 1 27h
pod/frr-192-0-2-6 0/1 Init:0/1 0 49s
pod/frr-192-0-2-8 0/1 Init:0/1 0 50s
pod/frr-192-0-2-2 0/1 Init:0/1 0 49s
pod/frr-192-0-2-7 0/1 Init:0/1 0 50s
pod/frr-192-0-2-3 1/1 Running 0 50s
pod/frr-192-0-2-1 1/1 Running 0 50s
pod/frr-192-0-2-9 0/1 Init:0/1 0 50s
pod/frr-192-0-2-0 1/1 Running 0 49s
pod/frr-192-0-2-4 1/1 Running 0 50s
pod/frr-192-0-2-5 1/1 Running 0 49s
Note that the command hangs, like so:
INFO:__main__:All topology data has been uploaded
INFO:__main__:All pods have been created successfully
INFO:__main__:
alias frr-192-0-2-7='kubectl exec -it frr-192-0-2-7 sh'
alias frr-192-0-2-9='kubectl exec -it frr-192-0-2-9 sh'
alias frr-192-0-2-3='kubectl exec -it frr-192-0-2-3 sh'
alias frr-192-0-2-8='kubectl exec -it frr-192-0-2-8 sh'
alias frr-192-0-2-4='kubectl exec -it frr-192-0-2-4 sh'
alias frr-192-0-2-1='kubectl exec -it frr-192-0-2-1 sh'
alias frr-192-0-2-6='kubectl exec -it frr-192-0-2-6 sh'
alias frr-192-0-2-2='kubectl exec -it frr-192-0-2-2 sh'
alias frr-192-0-2-0='kubectl exec -it frr-192-0-2-0 sh'
alias frr-192-0-2-5='kubectl exec -it frr-192-0-2-5 sh'
<hangs here>
Logs attached as a file for brevity. Going off the meshnet-cni readme, I used stern meshnet -n meshnet > logs.txt.
logs.txt
The issue occurs when meshnetd tries to update a local interface of a pod. First, it gets a message from a remote node's CNI and builds a veth struct
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Created koko Veth struct {NsName:/var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49 LinkName:eth3 IPAddr:[{IP:10.0.0.33 Mask:fffffffc}] MirrorEgress: MirrorIngress:}"
then it tries to read interface attributes from netlink and gets nil
and then fails to update the link with the remote node's IP and VNI:
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Created koko Veth struct {NsName:/var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49 LinkName:eth3 IPAddr:[{IP:10.0.0.33 Mask:fffffffc}] MirrorEgress: MirrorIngress:}"
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Retrieved eth3 link from /var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49 Netns: <nil>"
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Is link <nil> a VXLAN?: false"
meshnet-r46xl meshnet time="2020-06-05T05:51:38Z" level=info msg="Link <nil> we've found isn't a vxlan or doesn't exist"
So I think the issue is in the getLinkFromNS function, specifically:
// If namespace doesn't exist, do nothing and return empty result
if vethNs, err = ns.GetNS(nsName); err != nil {
return
}
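As a quick host-side sanity check (my own sketch, not part of meshnet), you can verify whether the netns path from the log above is actually visible where the daemon expects it; per the comment in the snippet, ns.GetNS fails and the function returns an empty result exactly when this path is absent:

```shell
# Hypothetical helper mirroring getLinkFromNS's early return: if the netns
# bind-mount path doesn't exist, the daemon sees no link at all ("<nil>").
check_netns() {
  path="$1"
  if [ -e "$path" ]; then
    echo "netns present: $path"
  else
    echo "netns missing: $path"
  fi
}

# netns name taken from the log line above
check_netns /var/run/netns/cni-977a5b39-45ea-97c4-d978-083127332d49
```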
I remember there was a PR recently that had something to do with alternative CRIs:
https://github.com/networkop/meshnet-cni/pull/17
Could this be the issue you're seeing? I've only merged it recently so maybe the version you're using can be updated?
Also, there was a comment in the PR about mounting /run, which wasn't part of the PR itself. So maybe it's worth adding that to the meshnet manifest as well?
Thanks for the analysis.
> Could this be the issue you're seeing? I've only merged it recently so maybe the version you're using can be updated?
No; I pulled your 1.18 changes and rebuilt, and those came after that PR was merged, so that patch is in my images. Reading that PR, it looks like without it meshnet would not have worked at all in my environment, which the PR is indeed applicable to: this is k3s with containerd.
Incidentally, meshnet-cni on k3s requires some other patches to support the cascading CNI model meshnet uses: it only reads /etc/cni/net.d if you disable k3s's default CNI, flannel, in which case meshnet-cni has no master plugin to delegate to; but if you leave flannel enabled, the path it uses is /var/lib/rancher/k3s/agent/etc/cni/net.d, so I had to patch that in. I'm not sure how I could make the project handle this other than some build switch that tweaks the path; and even that won't be sufficient, because the CNI bin directory is located at a similar path, but with a random guid in it.
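For what it's worth, one way around the randomized path would be to glob for it instead of hard-coding it. This is just a sketch of mine; the helper name is made up, and it assumes the /var/lib/rancher/k3s/data/<hash>/bin layout k3s uses:

```shell
# Hypothetical helper: locate the k3s-managed CNI bin directory, whose
# parent is a content-hash directory under /var/lib/rancher/k3s/data.
find_k3s_cni_bin() {
  base="${1:-/var/lib/rancher/k3s/data}"
  # newest hash dir first; there is normally only one
  ls -td "$base"/*/bin 2>/dev/null | head -n 1
}

find_k3s_cni_bin
```

The same idea could feed the hostPath in the manifest via a small templating step, rather than patching a literal hash in.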
I'll try mounting the host's /run into the meshnet pods and see what happens.
I think handling an arbitrary CNI dir should be fairly easy. The entrypoint script can see how kubelet was started and, if --cni-conf-dir was passed, use that instead of /etc/cni/net.d.
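A rough sketch of what that detection could look like (the helper name and the fallback default are my assumptions, not existing meshnet code):

```shell
# Hypothetical sketch: pull --cni-conf-dir out of the kubelet/k3s command
# line, falling back to the conventional default when the flag is absent.
cni_conf_dir_from_cmdline() {
  cmdline="$1"   # e.g. obtained via: tr '\0' ' ' < /proc/$(pidof kubelet)/cmdline
  dir=$(printf '%s\n' "$cmdline" | sed -n 's/.*--cni-conf-dir[= ]*\([^ ]*\).*/\1/p')
  echo "${dir:-/etc/cni/net.d}"
}

cni_conf_dir_from_cmdline "kubelet --v=2 --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d"
cni_conf_dir_from_cmdline "k3s agent"   # no flag, falls back to the default
```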
Let me know if you have any luck with /run. If you're still having problems, can you send me the list of steps to reproduce this locally?
Nope, no luck. Check me:
root@clusterpi-69 /h/p/meshnet-cni# kubectl -n meshnet describe pod/meshnet-xsm4j
Name: meshnet-xsm4j
Namespace: meshnet
Priority: 0
Node: clusterpi-67/192.168.0.229
Start Time: Fri, 05 Jun 2020 18:06:14 +0100
Labels: app=meshnet
controller-revision-hash=6d4cb8df7d
name=meshnet
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.0.229
IPs:
IP: 192.168.0.229
Controlled By: DaemonSet/meshnet
Containers:
meshnet:
Container ID: containerd://2bb5c852fc0c45878f2ae591e44cb507338bf79b4648c9c4f76a7b7d7f2c5ba4
Image: qlyoung/meshnet:latest
Image ID: docker.io/qlyoung/meshnet@sha256:0c49cc930512cb77532044cc5ff0340f6a4054ae66bb3f7fe829b55c2dcfbf1e
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 05 Jun 2020 18:06:27 +0100
Ready: True
Restart Count: 0
Limits:
memory: 200Mi
Requests:
cpu: 100m
memory: 200Mi
Environment: <none>
Mounts:
/etc/cni/net.d from cni-cfg (rw)
/opt/cni/bin from cni-bin (rw)
/run from run (rw)
/var/run/netns from var-run-netns (rw)
/var/run/secrets/kubernetes.io/serviceaccount from meshnet-token-7zsvb (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /var/lib/rancher/k3s/data/ec54df8c1938fe49660230d16334b4c7e83888a93e6f037fd8552893e2f67383/bin
HostPathType:
cni-cfg:
Type: HostPath (bare host directory volume)
Path: /var/lib/rancher/k3s/agent/etc/cni/net.d
HostPathType:
var-run-netns:
Type: HostPath (bare host directory volume)
Path: /var/run/netns
HostPathType:
run:
Type: HostPath (bare host directory volume)
Path: /run
HostPathType:
Unfortunately I think it'll be quite difficult for you to repro, as I'm using a real hardware cluster with k3s, you would have to set it up like me and apply my patches. Maybe there's something wrong with those?
meshnet-cni patches:
diff --git a/.mk/kustomize.mk b/.mk/kustomize.mk
index 734bbf0..41ca18e 100644
--- a/.mk/kustomize.mk
+++ b/.mk/kustomize.mk
@@ -14,8 +14,8 @@ kust-ensure:
.PHONY: kustomize
kustomize: kust-ensure
@cd manifests/base/ && $(GOPATH)/kustomize edit set image $(DOCKERID)/meshnet:$(VERSION)
- kubectl apply -k manifests/base/
+ kubectl apply --kubeconfig /etc/rancher/k3s/k3s.yaml -k manifests/base/
.PHONY: kustomize-kops
kustomize-kops: kust-ensure
- kubectl apply -k manifests/overlays/kops/
\ No newline at end of file
+ kubectl apply -k manifests/overlays/kops/
diff --git a/docker/Dockerfile b/docker/Dockerfile
index 1d52402..d12fc50 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -1,8 +1,8 @@
FROM golang:1.12.7 AS proto_base
ENV GO111MODULE=off
RUN apt-get update && apt-get -y install curl unzip
-RUN curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v3.9.1/protoc-3.9.1-linux-x86_64.zip && \
- unzip protoc-3.9.1-linux-x86_64.zip
+RUN curl -LO https://github.com/asteris-llc/protocol-buffers-arm/releases/download/3.1.0-binary-1/protoc-3.1.0-linux-arm.zip && \
+ unzip protoc-3.1.0-linux-arm.zip && cp lib/*.so.11 /lib
RUN go get -u github.com/golang/protobuf/protoc-gen-go
COPY daemon/ daemon/
COPY Makefile .
diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index 439ae3a..7035074 100644
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -1,5 +1,19 @@
#!/bin/sh
+# cat << EOF > /meshnet.conf
+# {
+# "cniVersion": "0.2.0",
+# "name": "meshnet_network",
+# "type": "meshnet",
+# "delegate": {
+# "type": "flannel",
+# "forceAddress": true,
+# "hairpinMode": true,
+# "isDefaultGateway": true
+# }
+# }
+# EOF
+
echo "Distributing files"
if [ -d "/opt/cni/bin/" ] && [ -f "./meshnet" ]; then
cp ./meshnet /opt/cni/bin/
@@ -25,5 +39,6 @@ fi
echo 'Making sure the name is set for the master plugin'
jq '.delegate.name = "masterplugin"' /etc/cni/net.d/00-meshnet.conf > /tmp/cni.conf && mv /tmp/cni.conf /etc/cni/net.d/00-meshnet.conf
+
echo "Starting meshnetd daemon"
/meshnetd
diff --git a/etc/cni/net.d/meshnet.conf b/etc/cni/net.d/meshnet.conf
index 5febbef..560d750 100644
--- a/etc/cni/net.d/meshnet.conf
+++ b/etc/cni/net.d/meshnet.conf
@@ -3,15 +3,9 @@
"name": "meshnet_network",
"type": "meshnet",
"delegate": {
- "name": "dind0",
- "bridge": "dind0",
- "type": "bridge",
- "isDefaultGateway": true,
- "ipMasq": true,
- "ipam": {
- "type": "host-local",
- "subnet": "10.244.1.0/24",
- "gateway": "10.244.1.1"
- }
+ "type": "flannel",
+ "forceAddress": true,
+ "hairpinMode": true,
+ "isDefaultGateway": true
}
}
diff --git a/kustomize b/kustomize
deleted file mode 100755
index 064ad12..0000000
Binary files a/kustomize and /dev/null differ
diff --git a/kustomize_v3.5.4_linux_amd64.tar.gz b/kustomize_v3.5.4_linux_amd64.tar.gz
deleted file mode 100644
index e1ae391..0000000
Binary files a/kustomize_v3.5.4_linux_amd64.tar.gz and /dev/null differ
diff --git a/manifests/base/kustomization.yaml b/manifests/base/kustomization.yaml
index 96a054d..3a354c4 100644
--- a/manifests/base/kustomization.yaml
+++ b/manifests/base/kustomization.yaml
@@ -5,5 +5,7 @@ commonLabels:
images:
- name: networkop/meshnet
newTag: latest
+- name: qlyoung/meshnet:armhf
+ newTag: latest
resources:
- meshnet.yml
diff --git a/manifests/base/meshnet.yml b/manifests/base/meshnet.yml
index 4e1a16d..0af773d 100644
--- a/manifests/base/meshnet.yml
+++ b/manifests/base/meshnet.yml
@@ -121,8 +121,8 @@ items:
hostPID: true
hostIPC: true
serviceAccountName: meshnet
- nodeSelector:
- beta.kubernetes.io/arch: amd64
+ # nodeSelector:
+ # beta.kubernetes.io/arch: amd64
tolerations:
- operator: Exists
effect: NoSchedule
@@ -130,8 +130,8 @@ items:
- name: meshnet
securityContext:
privileged: true
- image: networkop/meshnet:latest
- imagePullPolicy: IfNotPresent
+ image: qlyoung/meshnet:armhf
+ imagePullPolicy: Always
resources:
limits:
memory: 200Mi
@@ -143,6 +143,8 @@ items:
mountPath: /etc/cni/net.d
- name: cni-bin
mountPath: /opt/cni/bin
+ - name: run
+ mountPath: /run
- name: var-run-netns
mountPath: /var/run/netns
mountPropagation: Bidirectional
@@ -150,11 +152,14 @@ items:
volumes:
- name: cni-bin
hostPath:
- path: /opt/cni/bin
+ path: /var/lib/rancher/k3s/data/ec54df8c1938fe49660230d16334b4c7e83888a93e6f037fd8552893e2f67383/bin
- name: cni-cfg
hostPath:
- path: /etc/cni/net.d
+ path: /var/lib/rancher/k3s/agent/etc/cni/net.d
- name: var-run-netns
hostPath:
path: /var/run/netns
+ - name: run
+ hostPath:
+ path: /run
diff --git a/tests/3node.yml b/tests/3node.yml
index 3de93b7..e414180 100644
--- a/tests/3node.yml
+++ b/tests/3node.yml
@@ -66,7 +66,7 @@ items:
containers:
- image: alpine
name: pod
- command: ["/bin/sh", "-c", "sleep 2000000000000"]
+ command: ["/bin/sh", "-c", "sleep 20000000"]
- apiVersion: v1
kind: Pod
metadata:
@@ -77,7 +77,7 @@ items:
containers:
- image: alpine
name: pod
- command: ["/bin/sh", "-c", "sleep 2000000000000"]
+ command: ["/bin/sh", "-c", "sleep 20000000"]
- apiVersion: v1
kind: Pod
metadata:
@@ -88,4 +88,4 @@ items:
containers:
- image: alpine
name: pod
- command: ["/bin/sh", "-c", "sleep 2000000000000"]
\ No newline at end of file
+ command: ["/bin/sh", "-c", "sleep 20000000"]
k8s-topo patches:
diff --git a/Dockerfile b/Dockerfile
index 63f5d7e..64c7582 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -19,9 +19,9 @@ COPY web/nginx.conf /etc/nginx/conf.d/default.conf
RUN mkdir -p /run/nginx
-RUN mkdir /lib64 && ln -s /lib/libc.musl-x86_64.so.1 /lib64/ld-linux-x86-64.so.2
+# RUN mkdir /lib64 && ln -s /lib/libc.musl-x86_64.so.1 /lib64/ld-linux-x86-64.so.2
-RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl
+RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/arm/kubectl
RUN chmod +x kubectl
ENV PATH="/k8s-topo:/k8s-topo/bin:${PATH}"
diff --git a/bin/k8s-topo b/bin/k8s-topo
index b1c3acd..3d60d06 100755
--- a/bin/k8s-topo
+++ b/bin/k8s-topo
@@ -122,6 +122,8 @@ def parse_endpoints(devices, endpoints, link, idx):
device = devices.get(device_name, Host(device_name))
elif "qrtr" in device_name.lower():
device = devices.get(device_name, Quagga(device_name))
+ elif "frr" in device_name.lower():
+ device = devices.get(device_name, FRR(device_name))
elif "xrv" in device_name.lower():
device = devices.get(device_name, XRV(device_name))
elif "vmx" in device_name.lower():
@@ -449,8 +451,8 @@ class Device(object):
# nsc-sidecar container
nsc_container = client.V1Container(name="nsc-sidecar")
- nsc_container.image = "networkservicemesh/topology-sidecar-nsc:master"
- nsc_container.image_pull_policy = "IfNotPresent"
+ nsc_container.image = "qlyoung/topology-sidecar-nsc:armhf"
+ nsc_container.image_pull_policy = "Always"
env_nsm_io = client.V1EnvVar(
name="NS_NETWORKSERVICEMESH_IO", value=",".join(self._build_nsurls())
@@ -464,8 +466,8 @@ class Device(object):
# nse-sidecar container
nse_container = client.V1Container(name="nse-sidecar")
- nse_container.image = "networkservicemesh/topology-sidecar-nse:master"
- nse_container.image_pull_policy = "IfNotPresent"
+ nse_container.image = "qlyoung/topology-sidecar-nse:armhf"
+ nse_container.image_pull_policy = "Always"
nse_container.resources = resource_requirements
env_nse_name = client.V1EnvVar(name="ENDPOINT_NETWORK_SERVICE", value=self.topo)
@@ -516,11 +518,11 @@ class Device(object):
container.command = self.command
container.args = self.args
container.env = self.environment
- container.image_pull_policy = "IfNotPresent"
+ container.image_pull_policy = "Always"
init_container = client.V1Container(name=f"init-{self.name}")
- init_container.image = "networkop/init-wait:latest"
- init_container.image_pull_policy = "IfNotPresent"
+ init_container.image = "qlyoung/init-wait:armhf"
+ init_container.image_pull_policy = "Always"
init_container.args = [f"{len(self.interfaces)+1}", f"{self.sleep}"]
# Setting resource requests
@@ -645,10 +647,16 @@ class Host(Device):
class Quagga(Device):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
- self.image = "networkop/qrtr"
+ self.image = "qlyoung/frr:armhf"
self.conf_path = "/etc/quagga"
self.startup_file = "Quagga.conf"
+class FRR(Device):
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+ self.image = "qlyoung/frr:armhf"
+ self.conf_path = "/etc/frr"
+ self.startup_file = "frr.conf"
class XRV(Device):
def __init__(self, *args, **kwargs):
diff --git a/manifest.yml b/manifest.yml
index f34db64..4d620c2 100644
--- a/manifest.yml
+++ b/manifest.yml
@@ -85,8 +85,8 @@ items:
serviceAccountName: k8s-topo
hostNetwork: true
containers:
- - image: networkop/k8s-topo:0.2.0
+ - image: qlyoung/k8s-topo:armhf
imagePullPolicy: Always
name: k8s-topo
ports:
- - containerPort: 80
\ No newline at end of file
+ - containerPort: 80
BTW, pinged you on CNCF slack, maybe we can chat there?
Here's a much smaller repro (turns out it does reproduce with minimal topologies). Note that I can destroy and recreate this repeatedly, and sometimes it will work, sometimes it will not. Whether or not the pods are scheduled on the same node seems to have no bearing on whether it works.
I am almost wondering if there is a race condition somewhere because these nodes are raspis and therefore quite slow?
random.yml:
conf_dir: /k8s-topo/examples/builder/config-random
etcd_port: 32379
links:
- endpoints:
  - frr-192-0-2-7:eth1:10.0.0.1/30
  - frr-192-0-2-9:eth2:10.0.0.2/30
Meshnet logs when applying this with k8s-topo:
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-9's metadata from K8s..."
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Setting frr-192-0-2-9's SrcIp=192.168.0.230 and NetNs=/var/run/netns/cni-29f8fcc8-1b01-2429-1794-e3806fee1a77"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Update pod status frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-7's metadata from K8s..."
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Skipping of pod frr-192-0-2-7 by pod frr-192-0-2-9"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-qgbpl meshnet time="2020-06-05T20:52:07Z" level=info msg="Update pod status frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-7's metadata from K8s..."
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Setting frr-192-0-2-7's SrcIp=192.168.0.234 and NetNs=/var/run/netns/cni-22853c58-7996-8e69-3db2-2c823bafc23b"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Update pod status frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Retrieving frr-192-0-2-7's metadata from K8s..."
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Checking if frr-192-0-2-7 is skipped by frr-192-0-2-7"
meshnet-f65ld meshnet time="2020-06-05T20:52:07Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Retrieving frr-192-0-2-9's metadata from K8s..."
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Setting frr-192-0-2-9's SrcIp= and NetNs="
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Update pod status frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reverse-skipping of pod frr-192-0-2-7 by pod frr-192-0-2-9"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Updating peer skipped list"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Update pod status frr-192-0-2-7 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="Reading pod frr-192-0-2-9 from K8s"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="THIS SKIPPED:" thisSkipped="[frr-192-0-2-7]"
meshnet-f65ld meshnet time="2020-06-05T20:52:15Z" level=info msg="NEW THIS SKIPPED:" newThisSkipped="[]"
Pod status:
NAME READY STATUS RESTARTS AGE
pod/k8s-topo-86cbbdbddb-5ks5m 1/1 Running 1 42h
pod/frr-192-0-2-9 0/1 Init:0/1 0 77s
pod/frr-192-0-2-7 0/1 Init:0/1 0 77s
The two are on different nodes. Here's a look at pod/frr-192-0-2-9, running on clusterpi-42:
kubectl exec -it frr-192-0-2-9 -c init-frr-192-0-2-9 ip link
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
link/ether 3a:8a:3a:34:34:5c brd ff:ff:ff:ff:ff:ff
root@clusterpi-42:/home/pi# ip netns exec cni-29f8fcc8-1b01-2429-1794-e3806fee1a77 ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
link/ether 3a:8a:3a:34:34:5c brd ff:ff:ff:ff:ff:ff link-netnsid 0
Hopefully this is a bit more digestible than the large topo :)
thanks. can you share how you're building your cluster?
It's 4 Raspberry Pis, each running Raspbian 10 (kernel 4.19). k3s was installed with the following playbook:
- hosts:
  - all
  gather_facts: False
  vars:
    random_number: "{{ 100 | random }}"
  tasks:
    - name: set hostname
      hostname:
        name: "clusterpi-{{ random_number }}"
      become: true
    - name: reboot pi
      become: true
      reboot:
    - name: update system
      apt:
        name: "*"
        state: latest
        update_cache: yes
      become: true
    - name: download k3s
      get_url:
        url: https://get.k3s.io
        dest: /tmp/getk3s.sh
- hosts:
  - master
  tasks:
    - name: install k3s for master
      command: sh /tmp/getk3s.sh
      become: true
    - name: retrieve k8s join key
      slurp:
        src: "/var/lib/rancher/k3s/server/node-token"
      register: slurped_user_data
      become: true
    - name: Decode data and store as fact
      set_fact:
        master_key: "{{ slurped_user_data.content | b64decode }}"
- hosts:
  - slaves
  tasks:
    - name: install k3s for slave
      command: sh /tmp/getk3s.sh
      environment:
        K3S_URL: "https://{{ k3s_master }}:6443"
        K3S_TOKEN: "{{ hostvars['%s' | format(k3s_master) ]['master_key'] }}"
      become: true
I.e. a totally vanilla k3s cluster. I then disabled k3s's bundled ingress controller, traefik, because it binds all the HTTP ports.
meshnet-cni was installed by applying the patches above (for ARM support and to change the CNI config paths for k3s), rebuilding all the images directly on one of the pis, and pushing them to my dockerhub registry. I updated all the image references to point at this new registry and deployed meshnet with make install. I applied the above patches for k8s-topo and did the same.
I tried switching the container runtime to Docker, though I don't see how that could influence it; no change in behavior. I should also note that the default backend for flannel on k3s is vxlan, which I've kept.
cool, i'll try to reproduce this on a k3d cluster at home. btw, i've recovered my password to cncf slack, so you can ping me there as well.
I've tested with k3d and couldn't reproduce the issue. I've pushed a k3d-test branch to the meshnet-cni repo with some extra make targets to help build the environment. I've done a few dozen tests with k8s-topo using different topologies and got a 100% success rate.
Also, I've had another look at the logs and I think my initial analysis of the problem was wrong. When the code gets to Link <nil> we've found isn't a vxlan or doesn't exist, it means that the vxlan interface doesn't exist yet and will be created. It's a totally normal situation when a pod hasn't created its side of the vxlan link because it doesn't yet know the destination IP of its peer.
There's another place where logs can be collected - host OS itself. Since it's the job of a kubelet to invoke a CNI plugin, this will be done by the process running in the host OS (e.g. kubelet or k3s agent), so the logs from the meshnet CNI plugin will be written to standard logging destination (e.g. /var/log/messages or journald). Would you be able to collect those logs for the failed scenario?
> Also, I've had another look at the logs and I think my initial analysis of the problem was wrong. When the code gets to the Link we've found isn't a vxlan or doesn't exist, it means that the vxlan interface doesn't exist and will be created. It's a totally normal situation when a pod doesn't create its side of the vxlan link because it doesn't yet know the destination IP of its peer.
Yeah, that was my assessment as well; when I see those messages that typically indicates things are going well since it's creating the vxlan netdevices.
I've been trying to debug by having a look at the namespaces being created to see what is missing, but the correlations between the meshnetd logs, the netns's that should be created, and the devices I should be seeing in them aren't clear in my head yet. It's confusing because most of the time the pods that get stuck on init-wait get at least 1 or 2 of the interfaces they're waiting on, but are missing the rest; so the namespace is getting created and the code to create interfaces is functioning, but for some reason it isn't making all of them. It definitely seems that there are some missing logs, because I'm not seeing anything really out of the ordinary in the logs I've given you; if you have some ideas for where we could add extra logging to get insight, I can give that a shot.
Question though - are you testing on a single physical host, and if so, have you tested on a hardware cluster with multiple physical hosts before?
I'm a bit tempted to just blow this whole cluster away and set it up from scratch to see if the problem magically disappears :joy: I think I'll spend some more time debugging before going nuclear though.
I'll get those host logs for you for a node that has a stuck pod on it.
Anyone following the saga: the fix is here https://github.com/networkop/k8s-topo/commit/0e258b8574571732ed2aa5c154230de35b247078
networkop/init-wait is only available for amd64, so I built it myself. entrypoint.sh (copied from your amd64 image):
INTFS=${1:-1}
SLEEP=${2:-0}

int_calc () {
  index=0
  for i in $(ls -1v /sys/class/net/ | grep 'eth|ens|eno'); do
    let index=index+1
  done
  MYINT=$index
}

int_calc

echo "Waiting for all $INTFS interfaces to be connected"
while [ "$MYINT" -lt "$INTFS" ]; do
  echo "Connected $MYINT interfaces out of $INTFS"
  sleep 1
  int_calc
done

echo "Sleeping $SLEEP seconds before boot"
sleep $SLEEP
root@clusterpi-69 ~# kubectl get pod
NAME                        READY   STATUS     RESTARTS   AGE
k8s-topo-86cbbdbddb-5ks5m   1/1     Running    0          164m
frr-192-0-2-1               0/1     Init:0/1   0          2m26s
frr-192-0-2-7               0/1     Init:0/1   0          2m27s
frr-192-0-2-3               0/1     Init:0/1   0          2m27s
frr-192-0-2-9               0/1     Init:0/1   0          2m27s
frr-192-0-2-4               0/1     Init:0/1   0          2m27s
frr-192-0-2-5               0/1     Init:0/1   0          2m26s
frr-192-0-2-8               1/1     Running    0          2m26s
frr-192-0-2-0               1/1     Running    0          2m27s
frr-192-0-2-6               1/1     Running    0          2m28s
frr-192-0-2-2               1/1     Running    0          2m26s
root@clusterpi-69 /v/l/r/k/a/e/c/net.d# kubectl exec -it frr-192-0-2-1 --container init-frr-192-0-2-1 -- ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if65: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether ea:5b:bf:92:f6:9c brd ff:ff:ff:ff:ff:ff
68: eth1@if67: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether c6:0e:85:41:be:4d brd ff:ff:ff:ff:ff:ff