patnordstrom / lke-vlan

Provides an example for how to register worker nodes to a VLAN for LKE

Node in cluster ended up duplicating an IP address #1

Open justmark opened 3 days ago

justmark commented 3 days ago

I followed your article on Medium - great writeup.

When I applied the yaml to my cluster (there are 9 nodes), two of the nodes ended up with the same IP address. I could see this by using arp-scan:

192.*.*.* *:*:*:*:*:* (Unknown) (DUP: 2)

Any thoughts on what the issue might be?

Thanks, Mark

patnordstrom commented 1 day ago

@justmark the script makes a best-effort attempt to avoid collisions in these lines: https://github.com/patnordstrom/lke-vlan/blob/main/prod/main.sh#L49-L52

There is a possibility, especially with a smaller CIDR block, that two nodes coming online in a similar timeframe could generate the exact same IP address through the random selector in the code block above. I had thought about adding a check to the generate_ip() function that detects this collision and retries the IP selection if it finds a duplicate. If I get some time this week I can add that as an additional check. One workaround is to update the configuration profile on one of the LKE worker nodes that has a duplicate IP to remove the VLAN, and then recycle the node. It should come back online and generate a new VLAN IP.
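To illustrate the check-and-retry idea, here is a minimal sketch. It is not the code in main.sh: the function names random_ip, ip_in_use, and generate_unique_ip, the USED_IPS variable, and the /24 prefix handling are all hypothetical, and the collision check is a placeholder that only looks at a list you supply.

```shell
#!/bin/sh
# Hypothetical sketch: these function and variable names are illustrative
# and are not taken from main.sh.

# Build a candidate address from a /24 prefix (e.g. "10.0.0") and a random
# host octet in the range 1-254, read from /dev/urandom.
random_ip() {
  octet=$(( $(od -An -N2 -tu2 /dev/urandom) % 254 + 1 ))
  echo "$1.$octet"
}

# Placeholder collision check: returns 0 if the candidate is listed in the
# space-separated USED_IPS variable. A real check would need some way to
# learn which VLAN addresses are already assigned.
ip_in_use() {
  for used in ${USED_IPS:-}; do
    [ "$used" = "$1" ] && return 0
  done
  return 1
}

# Retry the random selection until a free address turns up or the attempt
# budget (default 20) is exhausted.
generate_unique_ip() {
  prefix="$1"
  max_attempts="${2:-20}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    candidate="$(random_ip "$prefix")"
    if ! ip_in_use "$candidate"; then
      echo "$candidate"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "no free address found after $max_attempts attempts" >&2
  return 1
}
```

With USED_IPS set to the addresses already taken, calling generate_unique_ip 10.0.0 prints a free address or returns non-zero once the attempts run out. A check like this narrows the window but does not close it: two nodes can still pass the check at the same moment and pick the same address.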

justmark commented 20 hours ago

@patnordstrom Thanks for the reply. I will recycle the node, as suggested.

Am I following this correctly? It appears that the individual pod is part of the VLAN rather than the node itself. If this is correct, does this mean that I need to use your example code in the /dev folder to build out an application that will join the VLAN, and then launch my application within the same pod?

Thanks, Mark

patnordstrom commented 19 hours ago

@justmark the deployment joins the worker nodes to the VLAN. The /dev folder is only a convenience for local development of the script. The only real difference is that it can pull in values from a configuration file instead of relying on the configuration objects in the Kubernetes cluster (e.g. the ConfigMap and Secret values). The main.sh script runs in a container as a DaemonSet so that it can register each node to the VLAN.

justmark commented 19 hours ago

@patnordstrom Ok. I will need to track this down further then. My app, which runs in its own pods, seems to be using the private IP address rather than the VLAN address that is picked up by the vlan-join-controller.

Mark

justmark commented 18 hours ago

@patnordstrom I connected to one of the vlan-join-controller pods directly and ran ip addr show, but I didn't see the VLAN listed there (just the internal IP). Clearly I am confused. I then ran kubectl describe node node-name and didn't see anything referencing the IP address for my VLAN. How or where can I see this level of detail?

Mark