Closed mwheckmann closed 1 month ago
This is certainly not what is happening here; rather, it is a pretty complicated set of interactions.
So a scenario like "default hasn't changed" isn't even possible (an invalid config would never make its way into 1.6.7).
The only way this can happen is the following:
Long story short, we probably need to prevent automatic revert with Omni, as it only works well with more manual operations.
Bug Report
NOTE: We experienced this on Azure, but it could happen on other clouds or even on metal servers.
Talos GRUB default not changed after upgrade.
Description
A Talos Azure VM upgraded from 1.6.7 to 1.7.5 boots into the previous version's GRUB entry (1.6.7) after a reboot initiated by cluster destroy. This causes the error below in the logs, and the VM is stuck:
Logs
mlx5_core 91ec:00:02.0 eth1: Link up
 SUBSYSTEM=pci
 DEVICE=+pci:91ec:00:02.0
hv_netvsc 002248b1-7fdb-0022-48b1-7fdb002248b1 eth0: Data path switched to VF: eth1
 SUBSYSTEM=vmbus
 DEVICE=+vmbus:002248b1-7fdb-0022-48b1-7fdb002248b1
hv_netvsc 002248b1-7fdb-0022-48b1-7fdb002248b1 eth0: Data path switched from VF: eth1
 SUBSYSTEM=vmbus
 DEVICE=+vmbus:002248b1-7fdb-0022-48b1-7fdb002248b1
[talos] controller failed {"component": "controller-runtime", "controller": "network.LinkSpecController", "error": "1 error occurred:\n\t* error enslaving/unslaving link \"eth1\" under \"\": netlink receive: operation not supported\n\n"}
[talos] controller failed {"component": "controller-runtime", "controller": "config.AcquireController", "error": "failed to load config from STATE: unknown keys found during decoding:\nmachine:\n features:\n hostDNS:\n enabled: true # Enable host DNS caching resolver.\n"}
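To illustrate why the second controller error leaves the VM stuck, here is a minimal sketch of strict config decoding. It is not Talos's actual implementation: the schemas, the `find_unknown_keys` helper, and the idea that the older release simply lacks `hostDNS` in its schema are simplifying assumptions based on the log message. The point is only that a config written for the newer version contains a key the older version refuses to decode.

```python
# Hypothetical, heavily simplified schemas: the keys each release understands.
# None marks a scalar leaf; a dict marks a nested section.
SCHEMA_OLD = {"machine": {"features": {"rbac": None}}}
SCHEMA_NEW = {"machine": {"features": {"rbac": None, "hostDNS": {"enabled": None}}}}

# Config as it might be stored on disk after the upgrade (assumed shape).
stored_config = {"machine": {"features": {"rbac": True, "hostDNS": {"enabled": True}}}}


def find_unknown_keys(config, schema, prefix=""):
    """Return dotted paths of keys in config that the schema does not know."""
    unknown = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in schema:
            # Strict decoding: an unknown key is an error, not something to ignore.
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            unknown.extend(find_unknown_keys(value, schema[key], path))
    return unknown


# The newer release accepts the stored config; the older one rejects it.
print(find_unknown_keys(stored_config, SCHEMA_NEW))
print(find_unknown_keys(stored_config, SCHEMA_OLD))
```

In this sketch, booting the older entry against the newer config fails at load time, which matches the shape of the "unknown keys found during decoding" error above.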
Also see:
Environment