kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

terraform bug inverts boolean logic when extra_groups are defined #10763

Closed: rptaylor closed this issue 10 months ago

rptaylor commented 10 months ago

I have the following node definitions:

k8s_nodes = {
  b01 = {
    "az" = "Compute"
    "flavor" = "64a48dcb-1469-469d-b0f2-d00f6ca43707"
    "floating_ip" = false
  },
  b02 = {
    "az" = "Compute" 
    "flavor" = "64a48dcb-1469-469d-b0f2-d00f6ca43707"
    "floating_ip" = false
  },
  v01 = { # vGPU 
    "az" = "nova"
    "flavor" = "39d3041a-ace2-4166-9133-d78fe00190a9"
    "floating_ip" = false
    "extra_groups" = "gpu_node"  # New line recently added
  } 
}
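For comparison, here is a hypothetical sketch of how a node map like this could be declared with an explicit object type. With this type, floating_ip stays a bool whether or not a node sets extra_groups. The declaration is an assumption for illustration, not Kubespray's actual variables.tf. A real declaration would need to list every attribute the module reads. The optional() modifier with a default requires Terraform 1.3 or later.

```hcl
# Hypothetical declaration for illustration; not copied from Kubespray.
variable "k8s_nodes" {
  type = map(object({
    az          = string
    flavor      = string
    floating_ip = bool # always a bool, regardless of which other attributes a node sets
    # Nodes that omit extra_groups get "" instead of null, so string
    # templates that interpolate it do not fail on a null value.
    extra_groups = optional(string, "")
  }))
  default = {}
}
```

Every element is converted to the same declared object type, so each.value.floating_ip == false compares a bool with a bool for all three nodes.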

This worked fine until I recently added the "extra_groups" = "gpu_node" line (a feature from https://github.com/kubernetes-sigs/kubespray/pull/9211/files). This should only add "gpu_node" to the kubespray_groups of one node. However, when I run terraform apply, it tries to remove the "no_floating" group from all three nodes, even though floating_ip is still false!

Terraform will perform the following actions:

  # module.compute.openstack_compute_instance_v2.k8s_nodes["b01"] will be updated in-place
  ~ resource "openstack_compute_instance_v2" "k8s_nodes" {
        id                  = "c8d647bb-565a-44b4-ae78-bd6b0d5affb9"
      ~ metadata            = {
          ~ "kubespray_groups" = "kube_node,k8s_cluster,no_floating," -> "kube_node,k8s_cluster,,"
            # (3 unchanged elements hidden)
        }
    }
  # module.compute.openstack_compute_instance_v2.k8s_nodes["b02"] will be updated in-place
  ~ resource "openstack_compute_instance_v2" "k8s_nodes" {
        id                  = "037dac0d-a137-42f6-bd35-ef97cba0f5e3"
      ~ metadata            = {
          ~ "kubespray_groups" = "kube_node,k8s_cluster,no_floating," -> "kube_node,k8s_cluster,,"
            # (3 unchanged elements hidden)
        }
    }
  # module.compute.openstack_compute_instance_v2.k8s_nodes["v01"] will be updated in-place
  ~ resource "openstack_compute_instance_v2" "k8s_nodes" {
        id                  = "254f01e0-657d-483c-9197-30b4e8382822"
      ~ metadata            = {
          ~ "kubespray_groups" = "kube_node,k8s_cluster,no_floating," -> "kube_node,k8s_cluster,,gpu_node"
            # (3 unchanged elements hidden)
        }
    }
Plan: 0 to add, 3 to change, 0 to destroy.

I've spent hours trying to reproduce this potential Terraform bug in a simpler environment, independent of Kubespray, but have not been able to :( I tried Terraform versions 1.3.9, 1.4.7 and 1.6.6. I tried OpenStack provider version 1.48.0 (.terraform/providers/registry.terraform.io/terraform-provider-openstack/openstack/1.48.0) and also ran terraform init -upgrade, which upgraded the provider to 1.53.0. I tried creating OpenStack compute instances in a Terraform module using for_each, with the same variable definitions and the same metadata definition:

  metadata = {
    kubespray_groups = "kube_node,k8s_cluster,%{if each.value.floating_ip == false}no_floating,%{endif}${var.supplementary_node_groups},${try(each.value.extra_groups, "")}"
  }

but still could not reproduce it. There must be some extra complications in the Kubespray Terraform environment triggering this issue.
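One plausible explanation is offered here as an assumption, not a confirmed diagnosis. Suppose the node map passes through a loose type constraint such as map(any) somewhere along the way. Terraform then has to choose a single element type for the map. Objects with different attribute sets (v01 has extra_groups, while b01 and b02 do not) cannot share one object type, so Terraform may unify them to map(string) instead. That conversion turns floating_ip into the string "false". Terraform's == operator does not convert types, so "false" == false evaluates to false. The sketch below is a hypothetical, provider-free way to test this idea. The variable and output names are made up.

```hcl
# Hypothetical standalone reproduction; names are illustrative only.
variable "nodes" {
  # Loose constraint: all map values must be unified to one common type.
  type = map(any)
  default = {
    b01 = { az = "Compute", floating_ip = false }
    v01 = { az = "nova", floating_ip = false, extra_groups = "gpu_node" }
  }
}

output "kubespray_groups" {
  # If the values were unified to map(string), floating_ip is "false" here,
  # so the == false check fails and no_floating is omitted.
  value = {
    for name, node in var.nodes :
    name => "kube_node,k8s_cluster,%{if node.floating_ip == false}no_floating,%{endif}${try(node.extra_groups, "")}"
  }
}
```

If the explanation is right, running terraform apply in an empty directory containing only this file should produce strings without no_floating. Deleting extra_groups from v01 should bring no_floating back, because both objects then have the same attributes and keep their object type.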

I did, however, try modifying the code to:

    kubespray_groups = "kube_node,k8s_cluster,%{if each.value.floating_ip == false}no_floating,%{else}FLOATING,%{endif}${var.supplementary_node_groups},${try(each.value.extra_groups, "")}"

which resulted in "FLOATING" being added, confirming that the condition each.value.floating_ip == false was evaluating to the opposite of what it should!
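If floating_ip is in fact arriving as a string, a condition that normalizes the value first should behave the same in both cases. The sketch below makes that change to the metadata expression using Terraform's built-in tobool() function, which accepts both the bool false and the string "false". This is a hedged workaround idea, not the upstream fix.

```hcl
  metadata = {
    # tobool() maps both false and "false" to the bool false, so this check
    # no longer depends on how the node map was typed. Workaround sketch only.
    kubespray_groups = "kube_node,k8s_cluster,%{if !tobool(each.value.floating_ip)}no_floating,%{endif}${var.supplementary_node_groups},${try(each.value.extra_groups, "")}"
  }
```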

I also tried removing the "extra_groups" = "gpu_node" line and confirmed that this makes the bug go away again: the no_floating group comes back.

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): 7b936bc80

rptaylor commented 10 months ago

Related to https://github.com/kubernetes-sigs/kubespray/pull/9211

rptaylor commented 10 months ago

The output of the terraform provider upgrade, after which the bug was still present:

$ terraform init -upgrade

Initializing the backend...
Upgrading modules...
- compute in modules/compute
- ips in modules/ips
- network in modules/network

Initializing provider plugins...
- Finding latest version of hashicorp/template...
- Finding latest version of hashicorp/null...
- Finding terraform-provider-openstack/openstack versions matching "~> 1.17"...
- Using previously-installed hashicorp/template v2.2.0
- Installing hashicorp/null v3.2.2...
- Installed hashicorp/null v3.2.2 (signed by HashiCorp)
- Installing terraform-provider-openstack/openstack v1.53.0...
- Installed terraform-provider-openstack/openstack v1.53.0 (self-signed, key ID 4F80527A391BEFD2)