kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Is there a way to change cluster nodes IP from one interface to another #11699

Closed Viste closed 1 week ago

Viste commented 1 week ago

We're attempting to change the primary IPs of our cluster nodes to a new network interface but are encountering issues with etcd consistently failing.

Steps We've Tried:

Modifying the node IP in the inventory and re-running cluster.yml:
    We updated the IP for one of the control plane nodes in the inventory and re-ran cluster.yml.
    Result: etcd fails.

Removing and Re-adding Node:
    Following the steps from the Adding/replacing a node section in the documentation:
        First, we removed one of the master nodes using remove-node.
        Then, we attempted to re-add it with the new IP via cluster.yml and upgrade-cluster.yml.
    Result: etcd still fails every time.

Question:

Is there an established way to change the primary IPs of cluster nodes from one interface to another without breaking etcd? Any guidance or workaround to achieve this would be greatly appreciated.

Thank you!

VannTen commented 1 week ago

Could you please use the bug report template? There is not much information to go on here.

Viste commented 1 week ago

> Could you please use the bug report template? There is not much information to go on here.

Tell me what information you need, I'll provide it.

Here is my scenario:

I have two interfaces on my nodes:

    eth1 (1Gbps)
    eth2 (10Gbps)

The control plane IPs are currently assigned as follows:

eth1 (1Gbps):
    10.10.20.11
    10.10.20.12
    10.10.20.13

We want to use eth2 (10Gbps):
    10.10.21.11
    10.10.21.12
    10.10.21.13
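The intended change keeps each node's host part and only swaps the network. A minimal Python sketch of that mapping follows, using the standard `ipaddress` module. The `/24` prefix length is an assumption (the thread lists only host addresses), and `to_new_interface` is a hypothetical helper name.

```python
import ipaddress

# Assumption: both networks are /24; the thread only gives host addresses.
OLD_NET = ipaddress.ip_network("10.10.20.0/24")  # eth1 (1Gbps)
NEW_NET = ipaddress.ip_network("10.10.21.0/24")  # eth2 (10Gbps)


def to_new_interface(old_ip: str) -> str:
    """Return the address with the same host offset on the new network."""
    addr = ipaddress.ip_address(old_ip)
    if addr not in OLD_NET:
        raise ValueError(f"{old_ip} is not in {OLD_NET}")
    offset = int(addr) - int(OLD_NET.network_address)
    return str(NEW_NET.network_address + offset)


old_ips = ["10.10.20.11", "10.10.20.12", "10.10.20.13"]
mapping = {ip: to_new_interface(ip) for ip in old_ips}
for old, new in mapping.items():
    print(f"{old} -> {new}")
```

A table like this is only a checklist of which inventory entries would change; it says nothing about whether etcd accepts the change.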

The issue appears when I try to update the inventory to replace the IPs on eth1 with the corresponding IPs on eth2, and then run either cluster.yml or upgrade-cluster.yml.

After making these changes, etcd fails to assemble as a cluster. Each time I try, a different node fails, and I receive errors such as:

Cluster ID mismatch

EOF when trying to get the version, preventing the node from joining the cluster.

If I only update the IP addresses for a single node (e.g., from 10.10.20.13 to 10.10.21.13), I still encounter the EOF error on etcd on this node.

When I attempted to remove and re-add a node as suggested in the documentation Adding/Replacing a Control Plane Node, the etcd node showed up as part of the etcd cluster but in a not started state. It consistently failed to start and returned EOF on version requests.
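The "not started" state described here presumably corresponds to what `etcdctl member list` reports as `unstarted`: a member that has been added to the cluster but has not yet joined, and therefore has no name or client URLs. A minimal Python sketch for spotting such members in the default comma-separated output follows. The sample text, member IDs, and member names are made up for illustration and are not output from the reporter's cluster; the field layout may also differ between etcd versions, so `-w json` is likely the more robust choice for real tooling.

```python
from typing import NamedTuple


class Member(NamedTuple):
    member_id: str
    status: str
    name: str
    peer_urls: str
    client_urls: str


def parse_member_list(output: str) -> list[Member]:
    """Parse the simple output format of `etcdctl member list`."""
    members = []
    for line in output.strip().splitlines():
        # Fields are separated by ", "; several URLs inside one field
        # are joined by "," without a space, so they stay together.
        fields = line.strip().split(", ")
        if len(fields) < 5:
            continue  # skip anything that is not a member row
        members.append(Member(*fields[:5]))
    return members


def not_started(members: list[Member]) -> list[Member]:
    """Return members that were added but have not joined the cluster."""
    return [m for m in members if m.status != "started"]


# Hypothetical sample: IDs and names are invented for this sketch.
SAMPLE_OUTPUT = """\
aaaaaaaaaaaaaaa1, started, etcd1, https://10.10.20.11:2380, https://10.10.20.11:2379, false
aaaaaaaaaaaaaaa2, started, etcd2, https://10.10.20.12:2380, https://10.10.20.12:2379, false
aaaaaaaaaaaaaaa3, unstarted, , https://10.10.21.13:2380, , false
"""

for member in not_started(parse_member_list(SAMPLE_OUTPUT)):
    print(f"{member.member_id} has not started (peer URLs: {member.peer_urls})")
```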

Additional context:

CNI: Cilium with kube-proxy replacement (strict mode)

It seems there might be other issues, but everything is currently failing at this stage. I would appreciate any guidance or suggestions on how to proceed.

Anyway, thank you!

VannTen commented 1 week ago

> Tell me what information you need, I'll provide it.

Could you please use the bug report template?

Best guess is that etcd treats the IP as node identity, so you'd need to remove the node and re-add it with another IP.

I'm going to close this because the issue tracker is not suited for support.

Feel free to open a bug report using the bug report template.

/close /kind support
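At the etcd level, "remove the node and re-add it with another IP" amounts to a `member remove` followed by a `member add` with the new peer URL. The Python sketch below only builds those two `etcdctl` command lines and runs nothing. The member ID and member name are hypothetical, the `https` scheme and peer port 2380 (etcd's default) are assumptions, and the thread does not establish that this sequence resolves the failure; the reporter describes the re-added member staying in a not started state.

```python
import shlex

ETCD_PEER_PORT = 2380  # etcd's default peer port; an assumption for this cluster


def replace_member_commands(member_id: str, member_name: str, new_ip: str) -> list[list[str]]:
    """Build the etcdctl commands to remove a member and re-add it on a new IP."""
    peer_url = f"https://{new_ip}:{ETCD_PEER_PORT}"
    return [
        ["etcdctl", "member", "remove", member_id],
        ["etcdctl", "member", "add", member_name, f"--peer-urls={peer_url}"],
    ]


# Hypothetical member ID and name, for illustration only.
commands = replace_member_commands("aaaaaaaaaaaaaaa3", "etcd3", "10.10.21.13")
for cmd in commands:
    print(shlex.join(cmd))
```

Connection options such as `--endpoints`, `--cacert`, `--cert`, and `--key` are omitted here. In a TLS-secured cluster the member's certificates would likely also need to cover the new address, which this sketch does not address.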

k8s-ci-robot commented 1 week ago

@VannTen: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/11699#issuecomment-2466696117):

> > Tell me what information you need, I'll provide it.
>
> > Could you please use the bug report template?
>
> Best guess is that etcd treats ip as node identity, so you'd need to remove the node and re-add it with another IP.
>
> I'm going to close this because the issue tracker is not suited for support.
>
> Feel free to open a bug report using the bug report template.
>
> /close
> /kind support

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

Viste commented 1 week ago

> Tell me what information you need, I'll provide it.
>
> Could you please use the bug report template?
>
> Best guess is that etcd treats ip as node identity, so you'd need to remove the node and re-add it with another IP.
>
> I'm going to close this because the issue tracker is not suited for support.
>
> Feel free to open a bug report using the bug report template.
>
> /close /kind support

When I do this, etcd fails with an unhealthy node, as I said.