Closed: junaru closed this issue 1 year ago
The IPs shouldn't matter that much, as it should be the INTERNAL IP from `kubectl get nodes -o wide`.
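To see which address counts as the INTERNAL-IP, you can inspect the node status directly. A minimal sketch of that lookup, run against a hypothetical trimmed sample of `kubectl get nodes -o json` output (the node name and addresses below are placeholders, not from this cluster):

```python
import json

# Hypothetical, trimmed sample of `kubectl get nodes -o json` output;
# a real cluster returns far more metadata per node.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "cp-1"},
      "status": {"addresses": [
        {"type": "InternalIP", "address": "10.99.0.1"},
        {"type": "ExternalIP", "address": "203.0.113.10"},
        {"type": "Hostname",   "address": "cp-1"}
      ]}
    }
  ]
}
""")

def internal_ips(nodes_json):
    """Map node name -> InternalIP, mirroring the INTERNAL-IP column
    of `kubectl get nodes -o wide`."""
    out = {}
    for item in nodes_json["items"]:
        for addr in item["status"]["addresses"]:
            if addr["type"] == "InternalIP":
                out[item["metadata"]["name"]] = addr["address"]
    return out

print(internal_ips(sample))  # {'cp-1': '10.99.0.1'}
```

If the wireguard addresses show up here as InternalIP, then `upgrade-k8s` reporting them is expected rather than a misdetection.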
What happened most probably, and that's what `upgrade-k8s` tried to warn you about, is that you still have Pod Security Policy enabled.
See https://www.talos.dev/v1.2/talos-guides/upgrading-talos/#podsecuritypolicy-removal; you just need to flip the value, and kube-apiserver should start back up.
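For reference, the value to flip lives in the machine config. A minimal sketch, assuming the `cluster.apiServer.disablePodSecurityPolicy` field described in the linked guide (applied with e.g. `talosctl edit machineconfig` on each control plane node):

```yaml
# Fragment of a Talos machine config (not a complete file).
cluster:
  apiServer:
    # false (or unset) keeps the legacy PodSecurityPolicy admission
    # plugin enabled; Kubernetes 1.25 removed PSP, so this must be
    # true before upgrading, otherwise kube-apiserver fails to start.
    disablePodSecurityPolicy: true
```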
It worked! You just made my day!
I assumed "This setting defaulted to true since Talos v1.0.0 release." meant the default value was applied even if the key was undefined, so I didn't even try toggling it.
Please feel free to delete this issue as it contains a wall of misinformation at this point.
Thank you again and have a great weekend!
Hello,
Recent versions of `talosctl upgrade-k8s` may be choosing the wrong IP for nodes if custom wireguard interfaces are present. I've been upgrading the cluster from v0.14 to the latest stable via the upgrade paths detailed in https://www.talos.dev/v1.3/talos-guides/upgrading-talos/#supported-upgrade-paths
The cluster consists of four nodes, two masters and two workers, with KubeSpan (probably irrelevant) and a network configuration where they get a public IPv4 via DHCP on eth0 plus a statically defined wireguard interface named storage0 for communication with the storage network.
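Such a layout might look like the following minimal sketch (placeholder addresses and keys, not the actual config from this cluster; assuming Talos's `machine.network.interfaces` wireguard schema):

```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true                 # public IPv4 via DHCP
      - interface: storage0        # statically defined wireguard link
        addresses:
          - 10.99.0.1/24           # placeholder storage-network address
        wireguard:
          privateKey: REDACTED
          peers:
            - publicKey: PEER_PUBLIC_KEY
              endpoint: 192.0.2.10:51820   # placeholder peer endpoint
              allowedIPs:
                - 10.99.0.0/24
```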
One of the control plane nodes' network config:
The upgrade path taken (retyped, with talosctl version up front):
And finally the call that broke k8s:
Notice the IPs in the above output: they are all wireguard IPs bound to the storage0 interfaces on the nodes. My guess is that talosctl misidentified them as public ones and pushed them somewhere deeper into the stack.
After this I immediately lost access to the control plane. All nodes are still accessible via talosctl (I can query the machine config, reboot, etc.), but anything involving kubectl fails.
Just before the `upgrade-k8s --to 1.25.5` the cluster was functioning normally. A dry run also completed successfully just before the upgrade, and the IPs it reported were the wireguard ones from storage0. In previous updates, public IPv4 addresses were present in the output, like the one below.
Not sure if it's wireguard related, but it's the only part of the `upgrade-k8s` output that looks weird. There's also the fact that I'm running two control plane nodes, so split-brain issues could be a contributing factor.
Are there any troubleshooting steps/guides I could follow to try and recover this? The cluster is pretty much a home test lab, but it's been running for a year and its purpose was exactly this: to see how k8s/talos can break and how to recover from that.
Any insight would be highly appreciated, thank you!
node_boot_log.txt