Closed: fr-Pursuit closed this issue 1 year ago
@edwarnicke
@denis-tingaikin Could we have someone look at this? What information might they collect to help us get to the bottom of it?
@fr-Pursuit Hello!
Seems like you need to change the limits for your Docker ;)
This might be useful for you: https://github.com/kubernetes-sigs/kind/issues/2586
@denis-tingaikin Thanks! I changed the following sysctl variables on the VMs, and `nsmgr` is not crashing anymore:

```
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512
```
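For reference, a minimal sketch of how those limits can be applied and persisted on each VM (the file name under `/etc/sysctl.d/` is an arbitrary choice):

```sh
# Raise the inotify limits immediately (takes effect without a reboot)
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=512

# Persist them across reboots
cat <<'EOF' | sudo tee /etc/sysctl.d/99-kind-inotify.conf
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512
EOF
sudo sysctl --system
```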
My two NSCs still can't ping each other, but that's probably a different issue... I'm still looking into it.
Could you also attach this info?
`kubectl describe nodes` for cluster1
`kubectl describe nodes` for cluster2

Here they are:
Although I have configured the VMs' routing tables to route each `192.168.x.0/24` prefix to the correct VM, the Docker subnets (`172.18.0.0/16` and `172.17.1.0/24`) are not accessible outside of the local VM. The fact that they are different is probably due to a configuration file left over from a previous test I made.
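For illustration, such per-VM routes would look roughly like this (the `10.0.0.x` addresses are placeholders for the real VM addresses):

```sh
# On VM 0: route the other clusters' MetalLB prefixes via the other VMs.
# 10.0.0.1 and 10.0.0.2 stand in for the actual addresses of VM 1 and VM 2.
sudo ip route add 192.168.1.0/24 via 10.0.0.1
sudo ip route add 192.168.2.0/24 via 10.0.0.2
```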
@denis-tingaikin Having chatted with @fr-Pursuit a bit offline, it seems that the underlying issue is that in his setup Nodes do not have an ExternalIP, and the InternalIPs are unreachable between clusters.
I think we can probably fix this by implementing https://github.com/networkservicemesh/cmd-nsmgr-proxy/issues/407.
Could we get that going quickly?
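As a quick sanity check, the node addresses each cluster exposes can be listed like this (the `kind-cluster1` context name is an assumption; substitute your own):

```sh
# Show the addresses registered for each node (the EXTERNAL-IP column is
# usually <none> on plain kind clusters)
kubectl --context kind-cluster1 get nodes -o wide

# Or print just the InternalIP of every node
kubectl --context kind-cluster1 get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
```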
> Having chatted with @fr-Pursuit a bit offline, it seems that the underlying issue is that in his setup Nodes do not have an ExternalIP, and the InternalIPs are unreachable between clusters.
Indeed, that was exactly the problem. I tweaked my setup to allow inter-cluster communication between nodes using their InternalIPs, and the example worked.
However, I'm not sure how standard it is to use InternalIPs for inter-cluster communication. And since kind doesn't appear to let you assign ExternalIPs to nodes, it would be great if we could use a `Service` of type `LoadBalancer` to get an ExternalIP for inter-cluster communication (which is possible on kind using MetalLB), as @edwarnicke described in his issue.
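As a rough sketch of that idea, a `LoadBalancer` Service in front of the proxy could look like this; the name, namespace, selector and port below are assumptions and would need to match the actual nsmgr-proxy deployment:

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nsmgr-proxy-lb        # hypothetical name
  namespace: nsm-system       # assumed NSM namespace
spec:
  type: LoadBalancer
  selector:
    app: nsmgr-proxy          # assumed label on the nsmgr-proxy pods
  ports:
  - name: tcp
    port: 5004                # placeholder; use the real nsmgr-proxy port
    targetPort: 5004
EOF

# MetalLB then assigns an address from its pool; it shows up under EXTERNAL-IP:
kubectl -n nsm-system get svc nsmgr-proxy-lb
```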
Hello @fr-Pursuit
Is this issue still relevant for you?
Hi @denis-tingaikin
Everything was fixed, thanks!
@fr-Pursuit Perfect!
Feel free to open new issues if you see any problems :)
Hello!
I've tried to deploy the NSM over interdomain vL3 network example on three kind clusters running on three separate VMs.
Unfortunately, on the first two clusters (i.e. the ones containing pods that should connect to each other), one of the two replicas of `nsmgr` constantly crashes and is then stuck in the `CrashLoopBackOff` state. The logs indicate an error about too many open files (`can not create node poller: too many open files`) followed by a segfault. You can view the full logs here.

I've tried to change the system's soft `nofile` limit, but it didn't fix the problem. By inspecting the other instance of `nsmgr`, I then found out that the soft limit was already raised inside the pod to match the system's hard limit (which is `1048576`... I doubt this limit of file descriptors should be exceeded under normal working conditions).

Do you have any ideas about what may cause this issue? Thanks!
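For anyone debugging the same symptom, the limits can be inspected roughly like this (the namespace, label and availability of a shell inside the image are assumptions):

```sh
# Find the nsmgr pods (namespace and label are assumed; adjust to your deployment)
kubectl -n nsm-system get pods -l app=nsmgr -o wide

# Check the soft and hard nofile limits inside a running nsmgr container
kubectl -n nsm-system exec <nsmgr-pod> -- sh -c 'ulimit -Sn; ulimit -Hn'

# A "too many open files" error from a node poller often points at the host's
# inotify limits rather than nofile; check them on each VM:
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
```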
PS: I'm using MetalLB as a `LoadBalancer` provider. I gave each instance of MetalLB the `192.168.x.0/24` prefix, where `x` is the VM number (0, 1 or 2). I then manually updated the VMs' routing tables to ensure packets were correctly routed (I confirmed this setup works by verifying that the SPIFFE federation was successful, and by checking that the NSE (un)registrations appeared in the registry's logs on the third cluster).
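For completeness, the per-cluster MetalLB configuration looks roughly like this, shown for `x = 1` and assuming a MetalLB version that uses the `IPAddressPool`/`L2Advertisement` CRDs in the standard `metallb-system` namespace:

```sh
kubectl apply -f - <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: vm1-pool               # hypothetical name
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.0/24             # 192.168.x.0/24, with x = VM number
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: vm1-l2                 # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
  - vm1-pool
EOF
```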