networkop / meshnet-cni

a (K8s) CNI plugin to create arbitrary virtual network topologies
BSD 3-Clause "New" or "Revised" License

IP conflicts with k8s coredns pod #24

Closed (kongyanye closed 2 years ago)

kongyanye commented 3 years ago

Problem:

I'm using meshnet-cni with flannel as the base network. After installing flannel and meshnet-cni, I found that the IPs allocated to new pods can conflict with the IPs of the k8s system coredns pods. Each k8s cluster has two coredns pods whose IPs are allocated by flannel, usually ending in .2 or .3. When I deploy new pods, their IPs are also allocated starting from .2, which causes IP conflicts; the coredns pod then fails its liveness probe and is restarted. At the same time, the user-created pod does not work properly either. I'm not sure whether this problem is due to CNI chaining. Please let me know if you know how to solve it. Thanks!

Reproduce the bug:

  1. Create a k8s cluster with kubeadm, pod cidr is 10.244.0.0/16
  2. Deploy the flannel cni
  3. Deploy the meshnet-cni
  4. Deploy new pods and check their IPs; they can conflict with the coredns IPs (not every time, but those IPs are not excluded from allocation).

networkop commented 3 years ago

meshnet-cni does not manage IP address assignment. Nor does flannel, afair. there should be an IPAM plugin configuration in the CNI configuration file which should specify the range of IPs for the pods. you should check to make sure each node has a unique non-overlapping range. for more details see this and this
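
The non-overlap check suggested here can be sketched in a few lines. This is only an illustration: the node names and per-node ranges below are hypothetical values carved out of the 10.244.0.0/16 pod CIDR mentioned in the report, and `overlapping_subnets` is a helper made up for this example, not part of meshnet-cni or flannel.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(node_cidrs):
    """Return pairs of node names whose pod subnets overlap."""
    nets = {node: ipaddress.ip_network(cidr) for node, cidr in node_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical per-node ranges; node-c deliberately overlaps node-b.
node_cidrs = {
    "node-a": "10.244.0.0/24",
    "node-b": "10.244.1.0/24",
    "node-c": "10.244.1.128/25",
}

print(overlapping_subnets(node_cidrs))  # [('node-b', 'node-c')]
```

An empty list means every node has its own range, in which case any conflict has to come from allocation within a single node's range.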

kongyanye commented 3 years ago

In flannel’s config file subnet.env, there’s a variable set as FLANNEL_IPMASQ=true. I think this means flannel manages the IP allocation. Am I understanding that right?

Each node is indeed allocated a subnet with mask 255.255.255.0, so IPs on different nodes won’t conflict. The problem is on a node that runs both coredns and custom pods: when they share a node, there’s a chance that a custom pod’s IP conflicts with coredns. It seems the IPAM is not aware of the IP addresses used by coredns.
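
A clash like this can be spotted by grouping pods by IP. The following is a rough sketch, assuming the pod list has been exported with `kubectl get pod -A -o json`; `find_duplicate_pod_ips` is a hypothetical helper, and the sample data reuses the pod names reported later in this thread, with the `default` namespace for t48 being an assumption.

```python
from collections import defaultdict


def find_duplicate_pod_ips(pod_list):
    """Map each pod IP used by more than one running pod to the pods using it."""
    by_ip = defaultdict(list)
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        ip = status.get("podIP")
        # hostNetwork pods share the node IP, and finished pods may still
        # report an IP that has since been reused; skip both.
        if not ip or spec.get("hostNetwork"):
            continue
        if status.get("phase") in ("Succeeded", "Failed"):
            continue
        meta = pod["metadata"]
        by_ip[ip].append(f'{meta["namespace"]}/{meta["name"]}')
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


# Sample data shaped like `kubectl get pod -A -o json` output (trimmed).
sample = {
    "items": [
        {
            "metadata": {"namespace": "kube-system", "name": "coredns-54d67798b7-mnzrx"},
            "spec": {"nodeName": "net123"},
            "status": {"podIP": "10.244.2.2", "phase": "Running"},
        },
        {
            # Namespace "default" is an assumption for illustration.
            "metadata": {"namespace": "default", "name": "t48"},
            "spec": {"nodeName": "net123"},
            "status": {"podIP": "10.244.2.2", "phase": "Running"},
        },
    ]
}

print(find_duplicate_pod_ips(sample))
```

On a real cluster the input would come from `json.load` on the saved kubectl output instead of the inline sample.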

networkop commented 3 years ago

I don't see how this can be possible. flannel or the host-local IPAM plugin would manage IP allocations for all pods of the system. unless something has corrupted the IPAM allocation DB, this should not happen. can you screenshot the output of kubectl get pod -A -owide and paste it here?

networkop commented 3 years ago

can you also paste the output of cat /etc/cni/net.d/* before you install meshnet-cni?

kongyanye commented 3 years ago

1. kubectl version

image

2. cni config

image image image image image

3. k8s

I have 4 nodes. The CIDRs are as below: image

I ran a lot of pods, so the allocated IPs kept increasing until allocation eventually started again from 1. After deploying the pods: image

You can see that pod t48 is using the same address (10.244.2.2) as coredns-54d67798b7-mnzrx on node net123, and that the restart count for coredns-54d67798b7-mnzrx increased from 2 to 3.

image

If you check the details of the pods, you can see that the liveness and readiness probes failed: because the IP conflicts with another pod, it can't be pinged.

Please let me know if you need further information. Thanks a lot!

kongyanye commented 3 years ago

I found a solution to the problem. It seems the IP addresses allocated before meshnet-cni was installed are not tracked properly. I just need to manually delete the coredns pods, and then the IP addresses for both coredns and new custom pods are set correctly.

networkop commented 3 years ago

hm.. yeah, so it looks like the IPAM DB gets wiped out when meshnet is installed. I'll test it a bit over the weekend to see if I can reproduce it.
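
One way to test this theory on an affected node is to compare the IPs pods are actually using with the reservations on disk. The sketch below assumes the `host-local` IPAM plugin with its default store, where each reserved address is a file named after the IP under `/var/lib/cni/networks/<network-name>/`. Whether that is the store in use on a given cluster is an assumption, and both helper names are hypothetical.

```python
import ipaddress
from pathlib import Path


def reserved_ips(store_dir):
    """Collect the IPs that have a reservation file in the IPAM store directory."""
    reserved = set()
    for entry in Path(store_dir).iterdir():
        try:
            reserved.add(str(ipaddress.ip_address(entry.name)))
        except ValueError:
            # Not an IP-named file (for example a lock or bookkeeping file).
            continue
    return reserved


def unreserved_in_use(store_dir, pod_ips):
    """Return pod IPs that are in use but have no reservation in the store."""
    return sorted(set(pod_ips) - reserved_ips(store_dir))
```

If `unreserved_in_use` returns the coredns address for the node's store directory, the allocator no longer knows that address is taken and is free to hand it out again, which would match the behaviour reported above.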

mhines01 commented 2 years ago

Isn't this the same issue as what was just closed?

mhines01 commented 2 years ago

https://github.com/networkop/meshnet-cni/issues/28

networkop commented 2 years ago

ha, yeah, it looks like it. thanks @mhines01 @kongyanye feel free to re-open if it's still an issue with the latest meshnet version.