siderolabs / terraform-provider-talos

Mozilla Public License 2.0
`talos_version` is not respected #157

Closed luis-guimaraes-exoawk closed 5 months ago

luis-guimaraes-exoawk commented 5 months ago

When I created my cluster, I used the Talos metal image for version 1.6.7 and passed the same version to the Talos resources that support it.

But after configuration and installation complete, the cluster is downgraded to version 1.6.0, which seems to be the Talos version packaged with the provider version I am using (0.4.0).

This seems like a bug to me. I think it would help either to drop talos_version from those resources and replace it with a provider-wide setting, or to add the talos_version argument to all resources and data sources.

smira commented 5 months ago

talos_version is the machine configuration compatibility setting; it is not the version of Talos to be installed.

It's supposed to be set once at cluster creation (e.g. v1.6) and never changed over the lifetime of the cluster (even if the cluster gets upgraded over time).
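
A minimal sketch of that advice in Terraform, assuming the provider's talos_machine_secrets resource and talos_machine_configuration data source, both of which accept talos_version. The local name talos_contract and the variable declarations are placeholders for illustration, not something taken from this thread:

```hcl
# Placeholder inputs for the sketch.
variable "cluster_name" {
  type = string
}

variable "cluster_endpoint" {
  type = string
}

locals {
  # Hypothetical local: the config contract, pinned to the minor release the
  # cluster was created with. It is not bumped when the cluster is upgraded.
  talos_contract = "v1.6"
}

resource "talos_machine_secrets" "this" {
  talos_version = local.talos_contract
}

data "talos_machine_configuration" "controlplane" {
  talos_version    = local.talos_contract
  cluster_name     = var.cluster_name
  cluster_endpoint = var.cluster_endpoint
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}
```

Keeping the value in one local makes it harder to accidentally tie the contract to a variable that changes with each Talos upgrade.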

luis-guimaraes-exoawk commented 5 months ago

@smira How do we configure the version of Talos to be installed then?

smira commented 5 months ago

Follow general guides on installing Talos: https://www.talos.dev/v1.6/introduction/getting-started/

It depends on whether you're booting from ISO/PXE, a disk image, etc.

https://github.com/siderolabs/contrib has examples

luis-guimaraes-exoawk commented 5 months ago

Could the talos_machine_configuration data source use talos_version in a different way? Or could another argument be added that works like kubernetes_version?

I understand that talos_version is the version used to generate the config, but the existence of kubernetes_version makes it a little misleading. It leads to behaviour like mine: the deployed image was for version 1.6.7 and the config was also generated for 1.6.7, but after the configuration is applied Talos gets downgraded to 1.6.0.

Maybe we could have something like the following:

data "talos_machine_configuration" "controlplane" {
  kubernetes_version     = var.kubernetes_version
  talos_version          = var.talos_version
  talos_contract_version = var.talos_version

  cluster_name     = var.cluster_name
  cluster_endpoint = var.cluster_endpoint
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}

This would make it clearer, in my view. What do you think, @smira?

smira commented 5 months ago

It is backwards-incompatible, it still won't work for disk images, and there are many small issues like that.

The contract should never change (your example is bad in that sense).

It might be that the argument needs to be renamed, and documentation has to improve, but I don't see an immediate solution right now.

luis-guimaraes-exoawk commented 5 months ago

Maybe the contract isn't being respected everywhere then? If I generate the configuration with talos_version set to 1.6.7, it should be generated with the installer set to 1.6.7 and not 1.6.0, no?

machine:
  install:
    image: ghcr.io/siderolabs/installer:1.6.0

smira commented 5 months ago

The contract is not about the installer version at all, and it's fine to set the contract to v1.6.

The contract can't enforce the installer version, because that would break future cluster upgrades (the contract should never be changed).

The machine.install section might change in future versions of Talos, and it is currently ignored for disk images.

You can do a config patch to set the version of the installer you need.
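
A hedged sketch of such a patch, assuming the config_patches argument of the talos_machine_configuration data source and Terraform's built-in yamlencode function. The variable talos_install_version and its default are hypothetical, and the v-prefixed image tag is an assumption about how installer images are tagged:

```hcl
# Placeholder inputs for the sketch.
variable "cluster_name" {
  type = string
}

variable "cluster_endpoint" {
  type = string
}

# Hypothetical variable: the Talos release the installer should write to disk.
variable "talos_install_version" {
  type    = string
  default = "v1.6.7"
}

resource "talos_machine_secrets" "this" {
  talos_version = "v1.6" # config contract, fixed at cluster creation
}

data "talos_machine_configuration" "controlplane" {
  talos_version    = "v1.6" # config contract, not the installed version
  cluster_name     = var.cluster_name
  cluster_endpoint = var.cluster_endpoint
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.this.machine_secrets

  # Override machine.install.image so the generated config no longer points
  # at the installer version bundled with the provider.
  config_patches = [
    yamlencode({
      machine = {
        install = {
          image = "ghcr.io/siderolabs/installer:${var.talos_install_version}"
        }
      }
    })
  ]
}
```

This keeps the contract and the installed version as two separate knobs: the contract stays at v1.6, while talos_install_version can move with each upgrade. As noted above, the install section is ignored when booting from a disk image.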