siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Got panic on changing cluster NodeIP Addresses on multi-master cluster #9075

Closed iLemonra1n closed 3 weeks ago

iLemonra1n commented 1 month ago

Hello, I got a panic when changing cluster NodeIP addresses on a multi-master cluster (e.g. a 3-master k8s cluster).

For network reasons, I need to install the cluster on Network A (the provisioning network, with master1 192.168.1.101/24, master2 192.168.1.102/24, and master3 192.168.1.103/24) and deploy workloads on it.

After that work is finished, I need to move the cluster to Network B (the production network, with master1 192.168.10.1/24, master2 192.168.10.2/24, and master3 192.168.10.3/24). When I use "talosctl edit machineconfig" to change the machine IP directly, the network change appears to succeed, but etcd then goes down. From the serial console output and "talosctl logs etcd", I found that etcd is still trying to connect to the old IPs (e.g. master3's new IP is 192.168.10.3, but it still tries to peer with master1's old IP 192.168.1.101 and master2's old IP 192.168.1.102), and a reboot does not resolve the problem. Network A and Network B are not physically connected, and there is no Layer 2/Layer 3 route between them.

So is there a SAFE way to change the cluster network IPs without reinstalling the cluster or formatting the EPHEMERAL partition?


smira commented 1 month ago

Your message lacks detailed information: what kind of "panic" did you get?

You can't change the etcd member IPs all at once: the list of members (which records the peer IPs) is itself part of the etcd database, so updating it requires a quorum of nodes to be available at all times.
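To illustrate the quorum point, here is a minimal sketch of the majority arithmetic, assuming the usual etcd rule that a membership change needs more than half of the voting members. The function names `quorum` and `can_update_membership` are hypothetical helpers for illustration only; they are not part of etcd or talosctl.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def can_update_membership(total: int, reachable: int) -> bool:
    """True if enough members can reach each other to commit a member-list change.

    `reachable` counts members (including the local one) that can still talk
    to each other using the peer addresses recorded in the member list.
    """
    return reachable >= quorum(total)


# Three control-plane nodes, as in this issue.
total = 3

# All three IPs changed at once: every node still has the old peer IPs
# recorded, so each one can only "reach" itself.
print(can_update_membership(total, reachable=1))

# Only one node changed: the other two still reach each other on the old IPs.
print(can_update_membership(total, reachable=2))
```

With three members the quorum is two, so the member list can only be rewritten while at least two members can still reach each other.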

You still have options:

iLemonra1n commented 3 weeks ago

Sorry for the late reply.

Finished successfully with these steps:

smira commented 3 weeks ago

Yes, this is the way it should be done with etcd as I explained above.