Closed rsmitty closed 2 weeks ago
Provide a page for downgrading the machines in maintenance mode:
Downgrade rules should be specified as a struct in a Go module, like the go-kubernetes one.
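A minimal sketch of what such a rules struct might look like. Every name here (`Version`, `DowngradeRule`, `Rules`, `Allowed`) is hypothetical and not taken from go-kubernetes or Omni, and the version numbers in `main` are illustrative only:

```go
package main

import "fmt"

// Version is a simplified major.minor Talos version (hypothetical type).
type Version struct {
	Major, Minor int
}

// Less reports whether v is older than o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// DowngradeRule describes which targets are reachable from one source
// version. The shape of this struct is an assumption for illustration.
type DowngradeRule struct {
	From      Version // version currently installed
	MinTarget Version // oldest version that may be installed directly from From
}

// Rules is a flat list of downgrade rules.
type Rules []DowngradeRule

// Allowed reports whether moving from "from" to "to" is a downgrade
// permitted by some rule. Upgrades and same-version moves return false.
func (r Rules) Allowed(from, to Version) bool {
	if !to.Less(from) {
		return false // not a downgrade
	}
	for _, rule := range r {
		if rule.From == from {
			return !to.Less(rule.MinTarget)
		}
	}
	return false // no rule for this source version
}

func main() {
	// Illustrative rule only: the real compatibility data would have to
	// come from Talos itself.
	rules := Rules{{From: Version{1, 8}, MinTarget: Version{1, 6}}}

	fmt.Println(rules.Allowed(Version{1, 8}, Version{1, 7}))
	fmt.Println(rules.Allowed(Version{1, 8}, Version{1, 4}))
}
```

Keeping the rules as plain data would let the UI page and any backend check share the same table.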
This shouldn't be something a user has to think about, so we don't want to give a page that makes them downgrade.
Ideally we just want to support a pool of Talos machines, of any version supported by Omni (currently 1.4 to 1.8).
You can add any machine, running any version of Talos, to any cluster running any version of Talos, or specify a cluster to be created with any version of Talos.
Omni will automatically downgrade or upgrade the Talos that is installed to the machine, so that it can join the cluster. So if the machine is running Talos off ISO, and has not been installed to disk, then it needs to install to disk, then do the upgrade/downgrade as necessary to join the cluster.
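The flow proposed above can be sketched roughly as follows. This is only an illustration of the proposal as described in the comment, not how Omni behaves; `Machine`, `Version`, and `PlanJoin` are hypothetical names, and the steps are plain strings rather than real API calls:

```go
package main

import "fmt"

// Version is a simplified major.minor Talos version (hypothetical type).
type Version struct {
	Major, Minor int
}

func (v Version) String() string { return fmt.Sprintf("%d.%d", v.Major, v.Minor) }

// compare returns -1, 0, or 1 if a is older than, equal to, or newer than b.
func compare(a, b Version) int {
	switch {
	case a.Major != b.Major && a.Major < b.Major:
		return -1
	case a.Major != b.Major:
		return 1
	case a.Minor < b.Minor:
		return -1
	case a.Minor > b.Minor:
		return 1
	}
	return 0
}

// Machine is a hypothetical view of a machine in the pool.
type Machine struct {
	Talos           Version // version the machine is currently running
	InstalledToDisk bool    // false when the machine is booted from ISO
}

// PlanJoin lists the steps the proposal implies for a machine to join a
// cluster running clusterVersion: install to disk first if needed, then
// upgrade or downgrade to match.
func PlanJoin(m Machine, clusterVersion Version) []string {
	var steps []string
	if !m.InstalledToDisk {
		steps = append(steps, "install to disk")
	}
	switch compare(m.Talos, clusterVersion) {
	case -1:
		steps = append(steps, "upgrade to "+clusterVersion.String())
	case 1:
		steps = append(steps, "downgrade to "+clusterVersion.String())
	}
	return steps
}

func main() {
	m := Machine{Talos: Version{1, 8}, InstalledToDisk: false}
	for _, step := range PlanJoin(m, Version{1, 6}) {
		fmt.Println(step)
	}
}
```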
I don't think installing from the ISO to some disk just to do an upgrade is a viable solution: you won't be able to change the install disk afterwards, and we can't pick the disk automatically.
But the biggest blocker here is that you have to apply the machine config to even start the upgrade, and if the config has new fields that the older Talos version doesn't support, the machine will never boot.
The maintenance-mode upgrade and ISO upgrade story will be addressed in Talos 1.8. I will think about how we can address this in Omni alone, but it's more of a Talos API issue, so we should fix it there first.
Problem Description
This issue is mostly a stub for a much longer discussion. We should brainstorm how we might offer the ability to reinstall Talos at any given version, upgrade or downgrade, when the box is in maintenance mode. I'd mostly like to scope it there, as I don't think it makes sense to discuss in the context of the machine being part of a cluster. But if the machine has been de-allocated from a cluster, we should have some way to go back to an older version of Talos and thus be able to easily reuse the node in question without the need for a manual reset, PXE boot, etc.
Solution
No response
Alternative Solutions
No response
Notes
No response