networkop / meshnet-cni

a (K8s) CNI plugin to create arbitrary virtual network topologies
BSD 3-Clause "New" or "Revised" License

core DNS goes to Error state after master node restart #4

Closed vparames86 closed 5 years ago

vparames86 commented 5 years ago

I am facing an issue while using the meshnet CNI. I am running my cluster in a single Linux VM. After restarting the node, CoreDNS goes into the Error state. I have to remove the config under /etc/cni/net.d/00-meshnet and delete the meshnet CNI YAML to bring it back to the Running state. I don't see any specific errors in the pod describe output or logs. My guess is that CoreDNS cannot find a CNI and is stuck. We have the delegate statement in the conf to use flannel, but for some reason it is being ignored for CoreDNS.

Attached the logs.

vparames86@UbuntuBionic:~$ journalctl -l -u kubelet -n 100
-- Logs begin at Sat 2019-07-06 04:59:33 UTC, end at Sat 2019-07-06 21:05:49 UTC. --
Jul 06 21:05:33 UbuntuBionic kubelet[1269]: E0706 21:05:33.542137 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "0445295849a36ccdf41f2b0fa507858c9cf1898e7fbd29159e39c4bc2ba89bd8"}
Jul 06 21:05:33 UbuntuBionic kubelet[1269]: W0706 21:05:33.544430 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "2850d872054
Jul 06 21:05:33 UbuntuBionic kubelet[1269]: xxx| etcd1 |===> 2019/07/06 21:05:33 Processing Del for POD etcd1
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: W0706 21:05:34.533182 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "40cac378fff
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: W0706 21:05:34.533182 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0b3601b16bb
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: xxx| coredns-5c98db65d4-5mtqr |===> 2019/07/06 21:05:34 Processing Del for POD coredns-5c98db65d4-5mtqr
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: xxx| qrtr-3 |===> 2019/07/06 21:05:34 Processing Del for POD qrtr-3
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: E0706 21:05:34.549119 1269 cni.go:352] Error deleting default_qrtr-4/1837e6fadeca3cdbcb8fcd4f47e91d3caa084a87b0dcb80fb5f8d0d6462daf56 from network meshnet/me
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: E0706 21:05:34.549787 1269 remote_runtime.go:128] StopPodSandbox "1837e6fadeca3cdbcb8fcd4f47e91d3caa084a87b0dcb80fb5f8d0d6462daf56" from runtime service fail
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: E0706 21:05:34.549832 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "1837e6fadeca3cdbcb8fcd4f47e91d3caa084a87b0dcb80fb5f8d0d6462daf56"}
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: E0706 21:05:34.549886 1269 kuberuntime_manager.go:636] killPodWithSyncResult failed: failed to "KillPodSandbox" for "efc9e914-e6de-40b9-bbdb-1568dae34283" wi
Jul 06 21:05:34 UbuntuBionic kubelet[1269]: E0706 21:05:34.549911 1269 pod_workers.go:190] Error syncing pod efc9e914-e6de-40b9-bbdb-1568dae34283 ("qrtr-4_default(efc9e914-e6de-40b9-bbdb-1568dae34283)"
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: W0706 21:05:36.533649 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0b6350f0c3a
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: W0706 21:05:36.533647 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "b6b515b648d
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: xxx| etcd2 |===> 2019/07/06 21:05:36 Processing Del for POD etcd2
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: xxx| internal-docker-registry-8df88bdd9-rwsxn |===> 2019/07/06 21:05:36 Processing Del for POD internal-docker-registry-8df88bdd9-rwsxn
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: E0706 21:05:36.549682 1269 cni.go:352] Error deleting default_qrtr-1/b206128cf2355e3833b80141fdcf1d87aaccf3b574786f1883fe7f5dedc8364d from network meshnet/me
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: E0706 21:05:36.550566 1269 remote_runtime.go:128] StopPodSandbox "b206128cf2355e3833b80141fdcf1d87aaccf3b574786f1883fe7f5dedc8364d" from runtime service fail
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: E0706 21:05:36.550623 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "b206128cf2355e3833b80141fdcf1d87aaccf3b574786f1883fe7f5dedc8364d"}
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: E0706 21:05:36.550692 1269 kuberuntime_manager.go:636] killPodWithSyncResult failed: failed to "KillPodSandbox" for "6817cdac-dfb1-43af-bb61-4b793e2ea0e1" wi
Jul 06 21:05:36 UbuntuBionic kubelet[1269]: E0706 21:05:36.550714 1269 pod_workers.go:190] Error syncing pod 6817cdac-dfb1-43af-bb61-4b793e2ea0e1 ("qrtr-1_default(6817cdac-dfb1-43af-bb61-4b793e2ea0e1)"
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: W0706 21:05:37.533058 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "f5469e027c3
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: xxx| qrtr-5 |===> 2019/07/06 21:05:37 Processing Del for POD qrtr-5
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: E0706 21:05:37.552226 1269 cni.go:352] Error deleting default_etcd1/2850d872054bc1c1734c3b3c44cef0ec23b43013bd98d0451e96829e442c7521 from network meshnet/mes
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: E0706 21:05:37.552902 1269 remote_runtime.go:128] StopPodSandbox "2850d872054bc1c1734c3b3c44cef0ec23b43013bd98d0451e96829e442c7521" from runtime service fail
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: E0706 21:05:37.552954 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "2850d872054bc1c1734c3b3c44cef0ec23b43013bd98d0451e96829e442c7521"}
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: E0706 21:05:37.553028 1269 kuberuntime_manager.go:636] killPodWithSyncResult failed: failed to "KillPodSandbox" for "159436c1-25af-4bef-9509-70e138ebe018" wi
Jul 06 21:05:37 UbuntuBionic kubelet[1269]: E0706 21:05:37.553060 1269 pod_workers.go:190] Error syncing pod 159436c1-25af-4bef-9509-70e138ebe018 ("etcd1_default(159436c1-25af-4bef-9509-70e138ebe018)")
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.543511 1269 cni.go:352] Error deleting kube-system_coredns-5c98db65d4-5mtqr/40cac378fff9b9b2baa93ebf8c036a1d0955932df180588ba57601739d0fd4c9 f
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.545913 1269 remote_runtime.go:128] StopPodSandbox "40cac378fff9b9b2baa93ebf8c036a1d0955932df180588ba57601739d0fd4c9" from runtime service fail
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.545963 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "40cac378fff9b9b2baa93ebf8c036a1d0955932df180588ba57601739d0fd4c9"}
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: W0706 21:05:38.548362 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "82b65dc0b65
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.549936 1269 cni.go:352] Error deleting default_qrtr-3/0b3601b16bb1da68ee7dc8b3f98c9ea26b7b264b2903321fd5ead6a17bea3b96 from network meshnet/me
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.550530 1269 remote_runtime.go:128] StopPodSandbox "0b3601b16bb1da68ee7dc8b3f98c9ea26b7b264b2903321fd5ead6a17bea3b96" from runtime service fail
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.550572 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "0b3601b16bb1da68ee7dc8b3f98c9ea26b7b264b2903321fd5ead6a17bea3b96"}
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.550701 1269 kuberuntime_manager.go:636] killPodWithSyncResult failed: failed to "KillPodSandbox" for "e3262060-b8dc-496c-8435-8da15b677684" wi
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: E0706 21:05:38.550733 1269 pod_workers.go:190] Error syncing pod e3262060-b8dc-496c-8435-8da15b677684 ("qrtr-3_default(e3262060-b8dc-496c-8435-8da15b677684)"
Jul 06 21:05:38 UbuntuBionic kubelet[1269]: xxx| coredns-5c98db65d4-5mtqr |===> 2019/07/06 21:05:38 Processing Del for POD coredns-5c98db65d4-5mtqr
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: W0706 21:05:40.533114 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "84a1a67acdb
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: xxx| etcd0 |===> 2019/07/06 21:05:40 Processing Del for POD etcd0
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: E0706 21:05:40.543101 1269 cni.go:352] Error deleting default_etcd2/0b6350f0c3acd929ea1b6a95eddfb24358b25d33a200db1b742e8a8d82be81a1 from network meshnet/mes
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: E0706 21:05:40.543700 1269 remote_runtime.go:128] StopPodSandbox "0b6350f0c3acd929ea1b6a95eddfb24358b25d33a200db1b742e8a8d82be81a1" from runtime service fail
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: E0706 21:05:40.543741 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "0b6350f0c3acd929ea1b6a95eddfb24358b25d33a200db1b742e8a8d82be81a1"}
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: W0706 21:05:40.545533 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6a19063eeec
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: E0706 21:05:40.547020 1269 cni.go:352] Error deleting default_internal-docker-registry-8df88bdd9-rwsxn/b6b515b648db344e493db673da53da7619b99b42b55d84662b505c
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: E0706 21:05:40.547768 1269 remote_runtime.go:128] StopPodSandbox "b6b515b648db344e493db673da53da7619b99b42b55d84662b505c728ff42ff2" from runtime service fail
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: E0706 21:05:40.547803 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "b6b515b648db344e493db673da53da7619b99b42b55d84662b505c728ff42ff2"}
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: W0706 21:05:40.550593 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "9cdf59d9555
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: xxx| etcd2 |===> 2019/07/06 21:05:40 Processing Del for POD etcd2
Jul 06 21:05:40 UbuntuBionic kubelet[1269]: xxx| internal-docker-registry-8df88bdd9-rwsxn |===> 2019/07/06 21:05:40 Processing Del for POD internal-docker-registry-8df88bdd9-rwsxn
Jul 06 21:05:41 UbuntuBionic kubelet[1269]: E0706 21:05:41.541071 1269 cni.go:352] Error deleting default_qrtr-5/f5469e027c389390a1aee55ef90a11be27ef12d14fbcbed9a30c2d80f78b2515 from network meshnet/me
Jul 06 21:05:41 UbuntuBionic kubelet[1269]: E0706 21:05:41.542058 1269 remote_runtime.go:128] StopPodSandbox "f5469e027c389390a1aee55ef90a11be27ef12d14fbcbed9a30c2d80f78b2515" from runtime service fail
Jul 06 21:05:41 UbuntuBionic kubelet[1269]: E0706 21:05:41.542113 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "f5469e027c389390a1aee55ef90a11be27ef12d14fbcbed9a30c2d80f78b2515"}
Jul 06 21:05:41 UbuntuBionic kubelet[1269]: E0706 21:05:41.542197 1269 kuberuntime_manager.go:636] killPodWithSyncResult failed: failed to "KillPodSandbox" for "14296cb2-be2e-40f9-a170-4ec2afbf02fb" wi
Jul 06 21:05:41 UbuntuBionic kubelet[1269]: E0706 21:05:41.542229 1269 pod_workers.go:190] Error syncing pod 14296cb2-be2e-40f9-a170-4ec2afbf02fb ("qrtr-5_default(14296cb2-be2e-40f9-a170-4ec2afbf02fb)"
Jul 06 21:05:42 UbuntuBionic kubelet[1269]: E0706 21:05:42.556580 1269 cni.go:352] Error deleting kube-system_coredns-5c98db65d4-5mtqr/82b65dc0b65ca62eccb96061eb6f6016f0a6d5dcaa251f80864b018b4c50da86 f
Jul 06 21:05:42 UbuntuBionic kubelet[1269]: E0706 21:05:42.557242 1269 remote_runtime.go:128] StopPodSandbox "82b65dc0b65ca62eccb96061eb6f6016f0a6d5dcaa251f80864b018b4c50da86" from runtime service fail
Jul 06 21:05:42 UbuntuBionic kubelet[1269]: E0706 21:05:42.557360 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "82b65dc0b65ca62eccb96061eb6f6016f0a6d5dcaa251f80864b018b4c50da86"}
Jul 06 21:05:42 UbuntuBionic kubelet[1269]: E0706 21:05:42.557438 1269 kuberuntime_manager.go:636] killPodWithSyncResult failed: failed to "KillPodSandbox" for "a1af1182-4082-43ac-8c50-47e4008b80c2" wi
Jul 06 21:05:42 UbuntuBionic kubelet[1269]: E0706 21:05:42.557478 1269 pod_workers.go:190] Error syncing pod a1af1182-4082-43ac-8c50-47e4008b80c2 ("coredns-5c98db65d4-5mtqr_kube-system(a1af1182-4082-43
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: W0706 21:05:44.533229 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "1f2a871eb93
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: E0706 21:05:44.541883 1269 cni.go:352] Error deleting default_etcd0/84a1a67acdb87980d16b05a9daa4c22fbf8dafea5b4db85a57487ae224843824 from network meshnet/mes
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: xxx| qrtr-2 |===> 2019/07/06 21:05:44 Processing Del for POD qrtr-2
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: E0706 21:05:44.543065 1269 remote_runtime.go:128] StopPodSandbox "84a1a67acdb87980d16b05a9daa4c22fbf8dafea5b4db85a57487ae224843824" from runtime service fail
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: E0706 21:05:44.543108 1269 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "84a1a67acdb87980d16b05a9daa4c22fbf8dafea5b4db85a57487ae224843824"}
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: W0706 21:05:44.545217 1269 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "8e5885ec7cf
Jul 06 21:05:44 UbuntuBionic kubelet[1269]: xxx| etcd0 |===> 2019/07/06 21:05:44 Processing Del for POD etcd0

vparames86@UbuntuBionic:~$ kubectl logs coredns-5c98db65d4-79b6h --namespace kube-system
.:53
2019-07-05T23:15:33.957Z [INFO] CoreDNS-1.3.1
2019-07-05T23:15:33.957Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-07-05T23:15:33.957Z [INFO] plugin/reload: Running configuration MD5 = 5d5369fbc12f985709b924e721217843
[INFO] SIGTERM: Shutting down servers then terminating

vparames86@UbuntuBionic:~$ kubectl describe pod coredns-5c98db65d4-79b6h --namespace kube-system
Name: coredns-5c98db65d4-79b6h
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: ubuntubionic/10.0.0.4
Start Time: Wed, 03 Jul 2019 23:19:08 +0000
Labels: k8s-app=kube-dns
        pod-template-hash=5c98db65d4
Annotations:
Status: Running
IP:
Controlled By: ReplicaSet/coredns-5c98db65d4
Containers:
  coredns:
    Container ID: docker://b9847e634ecb4bfb353d89ee40ed8572b702acd94f2110b658e4b2ccfe2f8d25
    Image: k8s.gcr.io/coredns:1.3.1
    Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:02382353821b12c21b062c59184e227e001079bb13ebd01f9d3270ba0fcbf1e4
    Ports: 53/UDP, 53/TCP, 9153/TCP
    Host Ports: 0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State: Terminated
      Reason: Error
      Exit Code: 255
      Started: Fri, 05 Jul 2019 23:15:33 +0000
      Finished: Sat, 06 Jul 2019 03:08:21 +0000
    Ready: False
    Restart Count: 3
    Limits:
      memory: 170Mi
    Requests:
      cpu: 100m
      memory: 70Mi
    Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-8d5cc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: coredns
    Optional: false
  coredns-token-8d5cc:
    Type: Secret (a volume populated by a Secret)
    SecretName: coredns-token-8d5cc
    Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
             node-role.kubernetes.io/master:NoSchedule
             node.kubernetes.io/not-ready:NoExecute for 300s
             node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason          Age                     From                   Message
  Normal  SandboxChanged  98m (x3369 over 17h)    kubelet, ubuntubionic  Pod sandbox changed, it will be killed and re-created.
  Normal  SandboxChanged  18s (x20 over 6m56s)    kubelet, ubuntubionic  Pod sandbox changed, it will be killed and re-created.

vparames86@UbuntuBionic:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS       RESTARTS   AGE
default       etcd0                                      0/1     Error        3          2d22h
default       etcd1                                      0/1     Error        3          2d22h
default       etcd2                                      0/1     Error        3          2d22h
default       internal-docker-registry-8df88bdd9-rwsxn   0/1     Error        0          22h
default       k8s-topo                                   1/1     Running      2          21h
default       qrtr-1                                     0/1     Init:Error   0          20h
default       qrtr-2                                     0/1     Init:Error   0          20h
default       qrtr-3                                     0/1     Init:Error   0          20h
default       qrtr-4                                     0/1     Init:Error   0          20h
default       qrtr-5                                     0/1     Init:Error   0          20h
kube-system   coredns-5c98db65d4-5mtqr                   0/1     Error        3          2d21h
kube-system   coredns-5c98db65d4-79b6h                   0/1     Error        3          2d21h
kube-system   etcd-ubuntubionic                          1/1     Running      7          3d2h
kube-system   kube-apiserver-ubuntubionic                1/1     Running      7          3d2h
kube-system   kube-controller-manager-ubuntubionic       1/1     Running      10         3d2h
kube-system   kube-flannel-ds-amd64-z2vvm                1/1     Running      5          2d21h
kube-system   kube-meshnet-wrz8n                         1/1     Running      3          25h
kube-system   kube-proxy-vw8v9                           1/1     Running      7          3d2h
kube-system   kube-scheduler-ubuntubionic                1/1     Running      10         3d2h

vparames86 commented 5 years ago

I figured out the issue. In your Docker entrypoint script you are replacing the 00-meshnet CNI config every time.

echo "Mergin existing CNI configuration with meshnet"
existing=$(ls -1 /etc/cni/net.d/ | egrep "flannel|weave|bridge|calico|contiv|cilium|cni" | head -n1)
jq -s '.[1].delegate = (.[0].plugins[0])' /etc/cni/net.d/$existing /etc/cni/net.d/meshnet.conf | jq .[1] > /etc/cni/net.d/00-meshnet.conf

But this replaces the ETCD host setting with the dummy value again.

networkop commented 5 years ago

Oh wow, I wonder how this ever worked then. However, I think the problem is much bigger than this. I deploy the external etcd cluster using a simple deployment, not an operator, so you run into issues when etcd pods cannot rejoin the cluster on restart. As you can see in your output, all 3 etcd nodes are in the Error state. Normally, in these situations, I simply redeploy meshnet-cni from scratch, and I think this is still the easiest solution so far. I've got a plan for meshnet-cni V2 to get rid of the external etcd and reuse the k8s etcd via CRDs instead. This should resolve a lot of the etcd-related issues people have been seeing. But that will take a few weeks, so the ETA is roughly July-Aug 2019.

vparames86 commented 5 years ago

I found the real issue. If you look at the journalctl logs, the delegate Add failed due to a missing network name.

Jul 07 05:50:59 UbuntuBionic kubelet[1269]: ---| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Processing ADD for POD qrtr-192-0-2-2
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: ---| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 About to delegate Add to flannel
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: E0707 05:50:59.587380 1269 cni.go:331] Error adding default_qrtr-192-0-2-2/38c11d6eb3762a420d9172b62f8a1df7f969000145785a523281773b34ae3c99 to network meshnet/meshnet_network: Error invoking Delegate Add missing network name
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Processing Del for POD qrtr-192-0-2-2
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Setting srcIP: and NetNS: for POD:qrtr-192-0-2-2
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Creating Veth struct with NetNS:/proc/106180/ns/net and intfName: eth1, IP:10.0.0.2/30
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Error removing Veth link: failed to lookup "eth1" in "/proc/106180/ns/net": Link not found
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Setting skipped reversed veth flag for qrtr-192-0-2-2 and peer qrtr-192-0-2-3
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Creating Veth struct with NetNS:/proc/106180/ns/net and intfName: eth2, IP:10.0.0.5/30
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Error removing Veth link: failed to lookup "eth2" in "/proc/106180/ns/net": Link not found
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Setting skipped reversed veth flag for qrtr-192-0-2-2 and peer qrtr-192-0-2-0
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Creating Veth struct with NetNS:/proc/106180/ns/net and intfName: eth3, IP:10.0.0.9/30
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Error removing Veth link: failed to lookup "eth3" in "/proc/106180/ns/net": Link not found
Jul 07 05:50:59 UbuntuBionic kubelet[1269]: xxx| qrtr-192-0-2-2 |===> 2019/07/07 05:50:59 Setting skipped reversed veth flag for qrtr-192-0-2-2 and peer qrtr-192-0-2-1
Jul 07 05:51:00 UbuntuBionic kubelet[1269]: E0707 05:51:00.030603 1269 cni.go:352] Error deleting default_qrtr-192-0-2-2/38c11d6eb3762a420d9172b62f8a1df7f969000145785a523281773b34ae3c99 from network meshnet/meshnet_network: Error invoking Delegate Del missing network name

I had to add the network name to the config as follows. It was missing from the config mentioned in your blog, and it now seems to be a mandatory field for the delegate ADD to work.

{
  "cniVersion": "0.1.0",
  "name": "meshnet_network",
  "type": "meshnet",
  "etcd_host": "10.102.160.152",
  "etcd_port": "2379",
  "delegate": {
    "type": "flannel",
    "name": "flannel_network",
    "delegate": {
      "forceAddress": true,
      "hairpinMode": true,
      "isDefaultGateway": true
    }
  }
}
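
A missing name in the delegate block, which is what the "missing network name" errors above appear to point at, can be caught before kubelet ever invokes the plugin. The following is only a minimal sketch: it assumes jq is available (the entrypoint script quoted earlier already relies on it), and the function name check_delegate_name is hypothetical, not part of meshnet-cni.

```shell
# Hypothetical pre-flight check, not part of meshnet-cni.
# Succeeds only if the config has a non-empty string at .delegate.name.
check_delegate_name() {
    conf="$1"   # path to a CNI config file, e.g. /etc/cni/net.d/00-meshnet.conf
    # jq -e sets a non-zero exit status when the last output is false or null.
    if jq -e '.delegate.name | (type == "string" and length > 0)' "$conf" >/dev/null; then
        return 0
    fi
    echo "delegate.name is missing or empty in $conf" >&2
    return 1
}
```

Running check_delegate_name against the generated config right after the merge step would turn a silent pod sandbox failure into an explicit message.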

vparames86 commented 5 years ago

Also, the entrypoint script should copy "00-meshnet.conf" only when it is not already present. This will preserve the changes we made even after a node restart.
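
The "only when not already present" idea can be sketched as a small guard. This is an illustration, not the project's actual fix: the function name install_conf_once and its two arguments are hypothetical, and the real entrypoint script may be structured differently.

```shell
# Hypothetical helper, not the project's actual entrypoint code.
# Installs the freshly merged config only if the target does not exist yet,
# so manual edits (for example a real etcd_host) survive a node restart.
install_conf_once() {
    src="$1"   # freshly merged config
    dst="$2"   # e.g. /etc/cni/net.d/00-meshnet.conf
    if [ -e "$dst" ]; then
        echo "Keeping existing $dst"
        return 0
    fi
    cp "$src" "$dst"
}
```

The trade-off is that a stale or broken config is also preserved, so a deliberate reinstall would need to delete the target file first.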

vparames86 commented 5 years ago

It seems like a chicken and egg problem in the etcd cluster.

vparames86@UbuntuBionic:~$ journalctl -l -u kubelet -n 100 --no-pager
-- Logs begin at Sun 2019-07-07 03:36:56 UTC, end at Sun 2019-07-07 21:37:38 UTC. --
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: E0707 21:37:21.083403 1074 cni.go:352] Error deleting kube-system_coredns-5c98db65d4-79b6h/9427572fb87df1cac3652ee108380e7e215cd40e00596bd5cf8fe0d0d5795f46 from network meshnet/meshnet_network: dial tcp 10.102.160.152:2379: connect: connection refused
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: E0707 21:37:21.084030 1074 remote_runtime.go:128] StopPodSandbox "9427572fb87df1cac3652ee108380e7e215cd40e00596bd5cf8fe0d0d5795f46" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-5c98db65d4-79b6h_kube-system" network: dial tcp 10.102.160.152:2379: connect: connection refused
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: E0707 21:37:21.084067 1074 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "9427572fb87df1cac3652ee108380e7e215cd40e00596bd5cf8fe0d0d5795f46"}
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: W0707 21:37:21.087617 1074 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "2ee3ba9b8c7b7333a744c0ac2eb88babd40902dd6222ff22cb5c9a23bd161d8f"
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: xxx| coredns-5c98db65d4-79b6h |===> 2019/07/07 21:37:21 Processing Del for POD coredns-5c98db65d4-79b6h
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: E0707 21:37:21.124628 1074 cni.go:352] Error deleting default_etcd0/ffd55da716aaa7fefd8426da423bf8275886e30c73fd957f67e684d7bfe24c30 from network meshnet/meshnet_network: dial tcp 10.102.160.152:2379: connect: connection refused
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: E0707 21:37:21.125086 1074 remote_runtime.go:128] StopPodSandbox "ffd55da716aaa7fefd8426da423bf8275886e30c73fd957f67e684d7bfe24c30" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "etcd0_default" network: dial tcp 10.102.160.152:2379: connect: connection refused
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: E0707 21:37:21.125117 1074 kuberuntime_manager.go:841] Failed to stop sandbox {"docker" "ffd55da716aaa7fefd8426da423bf8275886e30c73fd957f67e684d7bfe24c30"}
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: W0707 21:37:21.126816 1074 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "e26e8669384f5f8c78e0a465bedc04ffb46251499ef81acd4c0df83adea68845"
Jul 07 21:37:21 UbuntuBionic kubelet[1074]: xxx| etcd0 |===> 2019/07/07 21:37:21 Processing Del for POD etcd0
Jul 07 21:37:23 UbuntuBionic kubelet[1074]: E0707 21:37:23.086303 1074 cni.go:352] Error deleting default_etcd2/45307bc8a291aee15489dac859ea32d1c8f7ac6d061d0d02d912ee1d1ee4d64c from network meshnet/meshnet_network: dial tcp 10.102.160.152:2379: connect: connection refused
Jul 07 21:37:23 UbuntuBionic kubelet[1074]: E0707 21:37:23.089039 1074 remote_runtime.go:128] StopPodSandbox "45307bc8a291aee15489dac859ea32d1c8f7ac6d061d0d02d912ee1d1ee4d64c" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "etcd2_default" network: dial tcp 10.102.160.152:2379: connect: connection refused

I am seeing a connection refused error as well. I guess using the k8s etcd will resolve a lot of these problems.

networkop commented 5 years ago

ok, cool. so let's address these issues one by one.

networkop commented 5 years ago

this should fix the incorrect flannel config

networkop commented 5 years ago

and this should not re-build CNI config on restart

networkop commented 5 years ago

and etcd issues will be fixed in the next V2 version in a few weeks.

networkop commented 5 years ago

etcd issue solved in #6