siderolabs / omni

SaaS-simple deployment of Kubernetes - on your own hardware.

[feature] provide a mechanism for reinstalling talos at a given version from maintenance mode #236

Closed by rsmitty 2 weeks ago

rsmitty commented 1 month ago

Problem Description

This issue is mostly a stub for a much longer discussion. We should brainstorm how we might offer the ability to reinstall Talos at any given version, upgrade or downgrade, when the box is in maintenance mode. I'd mostly like to scope it there, as I don't think it makes sense to discuss this in the context of a machine that is part of a cluster. But if the machine has been de-allocated from a cluster, we should have some way to go back to an older version of Talos, so the node can easily be reused without a manual reset, PXE boot, etc.

Unix4ever commented 1 month ago

Provide a page for downgrading machines that are in maintenance mode.

Downgrade rules should be specified as a struct in a Go module, similar to the go-kubernetes one.
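To make the "rules as a struct" idea concrete, here is a minimal sketch of what such a module could look like. Everything in it is an assumption made for illustration: the `Version` and `DowngradeRule` types, the `CheckDowngrade` function, and the rule values are hypothetical and are not taken from Talos, Omni, or go-kubernetes.

```go
package main

import "fmt"

// Version is a simplified major.minor Talos version (hypothetical type).
type Version struct {
	Major, Minor int
}

// Less reports whether v is older than o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

func (v Version) String() string { return fmt.Sprintf("%d.%d", v.Major, v.Minor) }

// DowngradeRule says how far back a machine running From may be downgraded.
// The shape of this struct is an assumption, not an existing API.
type DowngradeRule struct {
	From      Version // version currently installed on the machine
	MinTarget Version // oldest version that may be installed from From
}

// exampleRules holds made-up values for illustration only.
var exampleRules = []DowngradeRule{
	{From: Version{1, 8}, MinTarget: Version{1, 6}},
	{From: Version{1, 7}, MinTarget: Version{1, 5}},
}

// CheckDowngrade returns nil if moving from -> to is permitted by the rules.
func CheckDowngrade(rules []DowngradeRule, from, to Version) error {
	if !to.Less(from) {
		return fmt.Errorf("%s is not older than %s", to, from)
	}
	for _, r := range rules {
		if r.From != from {
			continue
		}
		if to.Less(r.MinTarget) {
			return fmt.Errorf("downgrade %s -> %s not allowed: oldest target is %s", from, to, r.MinTarget)
		}
		return nil
	}
	return fmt.Errorf("no downgrade rule for %s", from)
}

func main() {
	// Check one permitted and one rejected downgrade against the sample rules.
	fmt.Println(CheckDowngrade(exampleRules, Version{1, 8}, Version{1, 7}))
	fmt.Println(CheckDowngrade(exampleRules, Version{1, 8}, Version{1, 4}))
}
```

Keeping the rules as plain data would let a UI page and any backend validation share the same table, which seems to be the appeal of the go-kubernetes comparison.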

steverfrancis commented 3 weeks ago

This shouldn't be something a user has to think about, so we don't want to give a page that makes them downgrade.

Ideally, we just want to support a pool of Talos machines of any version supported by Omni (currently 1.4 to 1.8).

You can add any machine, running any version of Talos, to any cluster running any version of Talos, or specify a cluster to be created with any version of Talos.

Omni will automatically downgrade or upgrade the Talos that is installed to the machine, so that it can join the cluster. So if the machine is running Talos off ISO, and has not been installed to disk, then it needs to install to disk, then do the upgrade/downgrade as necessary to join the cluster.
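The flow proposed here can be sketched as a small planning function. This is only an illustration of the steps described in this comment, not how Omni behaves: the `Machine`, `Step`, and `PlanJoin` names are hypothetical, and versions are reduced to major.minor.

```go
package main

import "fmt"

// Version is a simplified major.minor Talos version (hypothetical type).
type Version struct {
	Major, Minor int
}

// Compare returns -1, 0, or 1 if v is older than, equal to, or newer than o.
func (v Version) Compare(o Version) int {
	switch {
	case v.Major != o.Major && v.Major < o.Major:
		return -1
	case v.Major != o.Major:
		return 1
	case v.Minor < o.Minor:
		return -1
	case v.Minor > o.Minor:
		return 1
	}
	return 0
}

// Machine is a hypothetical view of a machine in the pool.
type Machine struct {
	Version         Version
	InstalledToDisk bool // false when the machine is booted from an ISO
}

// Step is one action in the proposed join flow (names are illustrative).
type Step string

const (
	StepInstall   Step = "install-to-disk"
	StepUpgrade   Step = "upgrade"
	StepDowngrade Step = "downgrade"
	StepJoin      Step = "join-cluster"
)

// PlanJoin lists the steps needed before the machine can join a cluster
// running clusterVersion, following the flow proposed in the comment.
func PlanJoin(m Machine, clusterVersion Version) []Step {
	var steps []Step
	if !m.InstalledToDisk {
		steps = append(steps, StepInstall)
	}
	switch m.Version.Compare(clusterVersion) {
	case -1:
		steps = append(steps, StepUpgrade)
	case 1:
		steps = append(steps, StepDowngrade)
	}
	return append(steps, StepJoin)
}

func main() {
	// A machine booted from a 1.8 ISO being added to a 1.6 cluster.
	m := Machine{Version: Version{1, 8}, InstalledToDisk: false}
	fmt.Println(PlanJoin(m, Version{1, 6}))
}
```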

Unix4ever commented 3 weeks ago

> Omni will automatically downgrade or upgrade the Talos that is installed to the machine, so that it can join the cluster. So if the machine is running Talos off ISO, and has not been installed to disk, then it needs to install to disk, then do the upgrade/downgrade as necessary to join the cluster.

I don't think installing from the ISO to some disk just to do an upgrade is a viable solution: you won't be able to change the install disk afterwards, and we can't pick the disk automatically.

But the biggest blocker here is that you have to apply the machine config to even start the upgrade, and if the config has new fields that are not supported in the older Talos version, the machine will never boot.

The maintenance-mode upgrade and ISO upgrade story will be addressed in Talos 1.8. I will think about how we can address that in Omni alone, but it's more of a Talos API issue, so we should fix it there first.