rancher / terraform-provider-rancher2

Terraform Rancher2 provider
https://www.terraform.io/docs/providers/rancher2/
Mozilla Public License 2.0

No way to configure vmAffinity for Harvester clusters #1009

Closed DillonN closed 1 year ago

DillonN commented 2 years ago

Maybe I'm missing the config somewhere, but I've looked all over and can't find a way to configure node scheduling rules for Harvester deployments. Here's a screenshot from the UI of what I'm trying to configure:

(screenshot of the node scheduling configuration in the UI)

I was expecting to find a spot to configure this in rancher2_machine_config_v2, since it seems this setting gets applied to the vmAffinity property of the resulting HarvesterConfig. For example, the value:

vmAffinity: eyJub2RlQWZmaW5pdHkiOnsicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6eyJub2RlU2VsZWN0b3JUZXJtcyI6W3sibWF0Y2hFeHByZXNzaW9ucyI6W3sia2V5IjoidG9wb2xvZ3kua3ViZXJuZXRlcy5pby96b25lIiwib3BlcmF0b3IiOiJJbiIsInZhbHVlcyI6WyJ5eXoyIl19XX1dfX19

which decodes to

{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["yyz2"]}]}]}}}

This is important so I can ensure nodes deploy across availability zones. Right now, they're all going into one. Thanks for any help!
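
For anyone who wants to inspect such a value, here is a small sketch using standard Terraform built-ins (the local and output names are made up; the blob is the one from above) that decodes it back into a readable object:

```
locals {
  # base64 blob captured from the HarvesterConfig above
  vm_affinity_b64 = "eyJub2RlQWZmaW5pdHkiOnsicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6eyJub2RlU2VsZWN0b3JUZXJtcyI6W3sibWF0Y2hFeHByZXNzaW9ucyI6W3sia2V5IjoidG9wb2xvZ3kua3ViZXJuZXRlcy5pby96b25lIiwib3BlcmF0b3IiOiJJbiIsInZhbHVlcyI6WyJ5eXoyIl19XX1dfX19"
}

output "vm_affinity_decoded" {
  # base64decode + jsondecode turn the blob back into the nodeAffinity object
  value = jsondecode(base64decode(local.vm_affinity_b64))
}
```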

SURE-5854

a-blender commented 1 year ago

I confirmed the vmAffinity options are the same in the UI for Rancher 2.6.11-rc2.

(screenshots of the vmAffinity options in the Rancher 2.6.11-rc2 UI)

So backend support can be added to the Terraform rancher2 provider. This setting is not listed in the registry docs, so it will also need to be documented.

The Terraform provider has also recently been branched into master and release/v2 branches that align with Rancher minor versions 2.7 and 2.6. Backend support for node affinity will have to be added to both branches, so PR https://github.com/rancher/terraform-provider-rancher2/pull/1024 will need a backport to release/v2.

a-blender commented 1 year ago

/backport v2.6.x release/v2

a-blender commented 1 year ago

Testing template

Root cause

No Terraform support for node affinity on Harvester clusters.

What was fixed, or what changes have occurred

Added a new field, vm_affinity, to the harvester_config block of rancher2_machine_config_v2 so that Terraform supports VM affinity for Harvester clusters via the Rancher backend.
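
A minimal sketch of how the new field can be used (this is an illustration, not the exact test template below; base64encode and jsonencode are standard Terraform functions, and the other harvester_config fields a real cluster needs are omitted):

```
resource "rancher2_machine_config_v2" "example" {
  generate_name = "example-harvester"
  harvester_config {
    vm_namespace = "default"
    cpu_count    = "2"
    memory_size  = "4"
    # Build the affinity rules in HCL and pass them as a base64-encoded JSON string
    vm_affinity = base64encode(jsonencode({
      nodeAffinity = {
        requiredDuringSchedulingIgnoredDuringExecution = {
          nodeSelectorTerms = [{
            matchExpressions = [{
              key      = "topology.kubernetes.io/zone"
              operator = "In"
              values   = ["yyz2"]
            }]
          }]
        }
      }
    }))
  }
}
```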

Areas or cases that should be tested

This is a community PR and I don't personally have Harvester credentials, so this is my best guess at how to test it.

Test steps

main.tf

```
# Create a new rancher2 machine config v2 using harvester node_driver
resource "rancher2_machine_config_v2" "foo-harvester-v2" {
  generate_name = "foo-harvester-v2"
  harvester_config {
    vm_namespace = "default"
    cpu_count    = "2"
    memory_size  = "4"
    disk_size    = "40"
    network_name = "harvester-public/vlan1"
    image_name   = "harvester-public/image-57hzg"
    ssh_user     = "ubuntu"
    vm_affinity  = <<EOF
{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["yyz2"]}]}]}}}
EOF
  }
}

resource "rancher2_cluster_v2" "foo-harvester-v2" {
  name = "foo-harvester-v2"
  rke_config {
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo-harvester.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo-harvester-v2.kind
        name = rancher2_machine_config_v2.foo-harvester-v2.name
      }
    }
    machine_selector_config {
      config = {
        cloud-provider-name = ""
      }
    }
    machine_global_config = <<EOF
cni: "calico"
disable-kube-proxy: false
etcd-expose-metrics: false
EOF
  }
}
```

What areas could experience regressions?

Terraform rancher2 provider, Harvester v2 provisioning.

Are the repro steps accurate/minimal?

Yes.

a-blender commented 1 year ago

Blocked -- waiting on Terraform 3.0.0 for Rancher v2.7.x.

snasovich commented 1 year ago

@annablender, we should be able to test before the release, so I don't think it's really "Blocked". Please correct me if I'm missing something.

a-blender commented 1 year ago

@snasovich This has been tested on Rancher 2.6.9 https://github.com/rancher/terraform-provider-rancher2/pull/1024#issuecomment-1415967940 in the community PR, but it has not otherwise been verified against the Rancher backend yet.

I believe QA needs to test this provider update using a released version of Terraform that exists on the registry. In this case, that would be Terraform 3.0.0 for Rancher v2.7.x, which will be released a few days after the Rancher v2.7.x release, so QA is blocked until then.

snasovich commented 1 year ago

Just to close the loop on the above, per offline discussions we're looking to cut RCs for TF providers to enable testing by QA.

a-blender commented 1 year ago

@sowmyav27 This is ready to test using Terraform rancher2 v3.0.0-rc1. Please set up local testing on the RC version of the provider with this command:

./setup-provider.sh rancher2 3.0.0-rc1

azimin-ex42 commented 1 year ago

Hello! How do you set labels for Harvester virtual machines to use VMAffinity?

a-blender commented 1 year ago

This was found to be broken on Rancher 2.6.11. I don't think the implemented changes are passing the values correctly to Rancher. Needs to be debugged and fixed.

a-blender commented 1 year ago

@irishgordo Moving this discussion here to debug TF vmAffinity

I noticed while investigating that you said here you were trying to pass this JSON blob to Rancher, encoded as base64 in your configuration file.

{
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {
                    "matchExpressions": [
                        {
                            "key": "topology.kubernetes.io/zone",
                            "operator": "In",
                            "values": [
                                "us-fremont-1a"
                            ]
                        }
                    ]
                }
            ]
        }
    }
}

But when I converted it to base64, I got this:

ewogICAgIm5vZGVBZmZpbml0eSI6IHsKICAgICAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgICAgICAgICAgewogICAgICAgICAgICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgICAgICAgICAgICAgICAia2V5IjogInRvcG9sb2d5Lmt1YmVybmV0ZXMuaW8vem9uZSIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgInZhbHVlcyI6IFsKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAidXMtZnJlbW9udC0xYSIKICAgICAgICAgICAgICAgICAgICAgICAgICAgIF0KICAgICAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgICAgIF0KICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgXQogICAgICAgIH0KICAgIH0KfQ==

Which is not what you had in your config file. Could this be the culprit?

irishgordo commented 1 year ago

@a-blender - that's a great call out, I actually had taken this:

{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["us-fremont-1a"]}]}]}}}

And pretty-printed that via JSON beautify to:

{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "topology.kubernetes.io/zone",
              "operator": "In",
              "values": [
                "us-fremont-1a"
              ]
            }
          ]
        }
      ]
    }
  }
}

Just for easier reading; I wasn't actually using that for the vm_affinity property on the Terraform resource.

But in the actual config:

    vm_affinity = "eyJub2RlQWZmaW5pdHkiOnsicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6eyJub2RlU2VsZWN0b3JUZXJtcyI6W3sibWF0Y2hFeHByZXNzaW9ucyI6W3sia2V5IjoidG9wb2xvZ3kua3ViZXJuZXRlcy5pby96b25lIiwib3BlcmF0b3IiOiJJbiIsInZhbHVlcyI6WyJ1cy1mcmVtb250LTFhIl19XX1dfX19"

I was using the Base64 Encoded version of:

{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["us-fremont-1a"]}]}]}}}

Not the pretty-printed JSON one.
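
To make the discrepancy concrete, here is a small sketch using standard Terraform built-ins (the local and output names are made up): the minified and pretty-printed renderings of the same document are different strings, so their base64 encodings differ byte for byte, even though they decode to equivalent JSON.

```
locals {
  # minified rendering -- this is what was actually base64-encoded in the config file
  compact = jsonencode({
    nodeAffinity = {
      requiredDuringSchedulingIgnoredDuringExecution = {
        nodeSelectorTerms = [{
          matchExpressions = [{
            key      = "topology.kubernetes.io/zone"
            operator = "In"
            values   = ["us-fremont-1a"]
          }]
        }]
      }
    }
  })

  # pretty-printed rendering of the same document, as pasted above for readability
  pretty = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "topology.kubernetes.io/zone",
              "operator": "In",
              "values": [
                "us-fremont-1a"
              ]
            }
          ]
        }
      ]
    }
  }
}
EOF
}

output "same_base64" {
  # false: the two strings differ, so their base64 encodings differ as well
  value = base64encode(local.compact) == base64encode(trimspace(local.pretty))
}
```

Building the value with base64encode(jsonencode(...)) avoids hand-encoding entirely and always produces the compact form.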


snasovich commented 1 year ago

This issue is being discussed offline with @irishgordo and @futuretea. So far it looks like the TFP changes are actually good and there may be an issue in Harvester itself.

futuretea commented 1 year ago

Test Plan

  1. Set up a two-node Harvester v1.1-head cluster; refer to https://docs.harvesterhci.io/v1.1/install/iso-install
  2. Add a cloud image and a VLAN network to Harvester by using the following Terraform config:

```
terraform {
  required_version = ">= 0.13"
  required_providers {
    harvester = {
      source  = "harvester/harvester"
      version = "0.6.1"
    }
  }
}

provider "harvester" {
  kubeconfig = ""
}

resource "harvester_image" "focal-server" {
  name      = "focal-server"
  namespace = "harvester-public"

  display_name = "focal-server-cloudimg-amd64.img"
  source_type  = "download"
  url          = "https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
}

data "harvester_clusternetwork" "mgmt" {
  name = "mgmt"
}

resource "harvester_network" "mgmt-vlan1" {
  name      = "mgmt-vlan1"
  namespace = "harvester-public"

  vlan_id = 1

  route_mode           = "auto"
  route_dhcp_server_ip = ""

  cluster_network_name = data.harvester_clusternetwork.mgmt.name
}
```

```bash
terraform init
terraform apply
```

  3. Set up a Rancher v2.7.2/v2.7-head cluster

  4. Import the Harvester cluster into the Rancher cluster in Virtualization Management, using the cluster name foo-harvester

  5. Install the v3.0.0-rc2 rancher2 provider

    wget https://raw.githubusercontent.com/rancher/terraform-provider-rancher2/master/setup-provider.sh
    chmod +x setup-provider.sh
    ./setup-provider.sh rancher2 v3.0.0-rc2
  6. Use the following test config:

```
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "3.0.0-rc2"
    }
  }
}

provider "rancher2" {
  api_url    = "<>"
  access_key = "<>"
  secret_key = "<>"
  insecure   = true
}

data "rancher2_cluster_v2" "foo-harvester" {
  name = "foo-harvester"
}

# Create a new Cloud Credential for an imported Harvester cluster
resource "rancher2_cloud_credential" "foo-harvester" {
  name = "foo-harvester"
  harvester_credential_config {
    cluster_id         = data.rancher2_cluster_v2.foo-harvester.cluster_v1_id
    cluster_type       = "imported"
    kubeconfig_content = data.rancher2_cluster_v2.foo-harvester.kube_config
  }
}

# Create a new rancher2 machine config v2 using harvester node_driver
resource "rancher2_machine_config_v2" "foo-harvester-v2" {
  generate_name = "foo-harvester-v2"
  harvester_config {
    vm_namespace = "default"
    cpu_count    = "2"
    memory_size  = "4"
    vm_affinity  = "ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJub2RlLXJvbGUua3ViZXJuZXRlcy5pby9jb250cm9sLXBsYW5lIiwKICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICJ2YWx1ZXMiOiBbCiAgICAgICAgICAgICAgICAidHJ1ZSIKICAgICAgICAgICAgICBdCiAgICAgICAgICAgIH0KICAgICAgICAgIF0KICAgICAgICB9CiAgICAgIF0KICAgIH0KICB9Cn0="
    disk_info    = <<EOF
    {
        "disks": [{
            "imageName": "harvester-public/focal-server",
            "size": 40,
            "bootOrder": 1
        }]
    }
    EOF
    network_info = <<EOF
    {
      "interfaces": [{
          "networkName": "harvester-public/mgmt-vlan1"
      }]
    }
    EOF
    ssh_user     = "ubuntu"
    user_data    = "I2Nsb3VkLWNvbmZpZwpwYWNrYWdlX3VwZGF0ZTogdHJ1ZQpwYWNrYWdlczoKICAtIHFlbXUtZ3Vlc3QtYWdlbnQKICAtIGlwdGFibGVzCnJ1bmNtZDoKICAtIC0gc3lzdGVtY3RsCiAgICAtIGVuYWJsZQogICAgLSAnLS1ub3cnCiAgICAtIHFlbXUtZ3Vlc3QtYWdlbnQuc2VydmljZQo="
  }
}

resource "rancher2_cluster_v2" "foo-harvester-v2" {
  name               = "foo-harvester-v2"
  kubernetes_version = "v1.24.11+rke2r1"
  rke_config {
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo-harvester.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo-harvester-v2.kind
        name = rancher2_machine_config_v2.foo-harvester-v2.name
      }
    }
    machine_selector_config {
      config = {
        cloud-provider-name = ""
      }
    }
    machine_global_config = <<EOF
cni: "calico"
disable-kube-proxy: false
etcd-expose-metrics: false
EOF
    upgrade_strategy {
      control_plane_concurrency = "10%"
      worker_concurrency        = "10%"
    }
    etcd {
      snapshot_schedule_cron = "0 */5 * * *"
      snapshot_retention     = 5
    }
    chart_values = ""
  }
}
```

```bash
terraform init
terraform apply
```

When I apply for the first time, the following error occurs, but it succeeds when I apply again. Is there anything wrong with my configuration file, or is this a known problem?

rancher2_cloud_credential.foo-harvester: Creation complete after 2s [id=cattle-global-data:cc-rqgkh]
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for rancher2_cluster_v2.foo-harvester-v2 to include new values learned so far during apply,
│ provider "registry.terraform.io/rancher/rancher2" produced an invalid new value for
│ .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now
│ cty.StringVal("cattle-global-data:cc-rqgkh").
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
  7. Clean up resources

    terraform destroy

The base64-encoded string

ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJub2RlLXJvbGUua3ViZXJuZXRlcy5pby9jb250cm9sLXBsYW5lIiwKICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICJ2YWx1ZXMiOiBbCiAgICAgICAgICAgICAgICAidHJ1ZSIKICAgICAgICAgICAgICBdCiAgICAgICAgICAgIH0KICAgICAgICAgIF0KICAgICAgICB9CiAgICAgIF0KICAgIH0KICB9Cn0=

decodes to

{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "node-role.kubernetes.io/control-plane",
              "operator": "In",
              "values": [
                "true"
              ]
            }
          ]
        }
      ]
    }
  }
}

I found that configuring vm_affinity with a JSON string causes the UI to get stuck on Edit Config. @lanfon72

lanfon72 commented 1 year ago

Tested that vm_affinity is applied correctly with terraform-provider-rancher2 v3.0.0-rc2 and Harvester v1.1-cff1d5b5-head, which includes the fix: https://github.com/harvester/harvester/issues/3816


irishgordo commented 1 year ago

@futuretea thank you for providing the testing steps and highlighting the TF resources that need to be created :smile: :+1:

Re-tested with Harvester v1.1.1, Rancher v2.7.2, Rancher2 Terraform v3.0.0-rc2 and was successful.

Validated that I was able to provision an RKE2 cluster with the rancher2_machine_config_v2's vm_affinity set as either a base64-encoded string or a plain JSON string.

With Harvester v1.1.1, Rancher v2.7.2, and Rancher2 TF provider v3.0.0-rc2 it looked good :+1:

a-blender commented 1 year ago

@futuretea @irishgordo Thank you for verifying this on the Harvester end! vmAffinity in TF appears to be working correctly; the earlier problem was an issue with how the values were formatted.

If this is done, can you please close it out?

irishgordo commented 1 year ago

Tested with Harvester v1.1.2, Rancher v2.7.2, Rancher2 TFProvider v3.0.0-rc2.

Was able to provision an RKE2 cluster with both a base64-encoded vm_affinity and a plain JSON vm_affinity.


But as mentioned in https://github.com/harvester/harvester/issues/3820, it seems that if the user had:

locals {
  vm_affinity_to_use = <<EOF
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "topology.kubernetes.io/zone",
              "operator": "In",
              "values": [
                "us-fremont-1a"
              ]
            },
            {
              "key": "network.harvesterhci.io/mgmt",
              "operator": "In",
              "values": [
                "true"
              ]
            }
          ]
        }
      ]
    }
  }
}
EOF
}

# Create a new rancher2 machine config v2 using harvester node_driver
resource "rancher2_machine_config_v2" "foo-harvester-v2-cloud-provider" {
  generate_name = "foo-harvester-v2-cloud-provider"
  harvester_config {
    vm_namespace = "default"
    cpu_count = "4"
    memory_size = "8"
    disk_info = <<EOF
        {
        "disks": [{
            "imageName": "default/image-666vl",
            "size": 40,
            "bootOrder": 1
        }]
    }
    EOF
    network_info = <<EOF
    {
      "interfaces": [{
          "networkName": "default/mgmt-1"
      }]
    }
    EOF
    ssh_user = "opensuse"
    #vm_affinity = "ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJ0b3BvbG9neS5rdWJlcm5ldGVzLmlvL3pvbmUiLAogICAgICAgICAgICAgICJvcGVyYXRvciI6ICJJbiIsCiAgICAgICAgICAgICAgInZhbHVlcyI6IFsKICAgICAgICAgICAgICAgICJ1cy1mcmVtb250LTFhIgogICAgICAgICAgICAgIF0KICAgICAgICAgICAgfSwKICAgICAgICAgICAgewogICAgICAgICAgICAgICJrZXkiOiAibmV0d29yay5oYXJ2ZXN0ZXJoY2kuaW8vbWdtdCIsCiAgICAgICAgICAgICAgIm9wZXJhdG9yIjogIkluIiwKICAgICAgICAgICAgICAidmFsdWVzIjogWwogICAgICAgICAgICAgICAgInRydWUiCiAgICAgICAgICAgICAgXQogICAgICAgICAgICB9CiAgICAgICAgICBdCiAgICAgICAgfQogICAgICBdCiAgICB9CiAgfQp9"
    vm_affinity = local.vm_affinity_to_use
  }
}

With the JSON-based local value for vm_affinity (instead of the base64-encoded one), the user cannot "edit" the config of the RKE2 cluster in Cluster Management on Rancher v2.7.2 (the Edit Config screen stays on "Loading").
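
Until https://github.com/harvester/harvester/issues/3820 is addressed, one possible stopgap (just a sketch, not an official workaround) is to keep the readable heredoc but wrap it with Terraform's built-in base64encode before passing it to the provider:

```
locals {
  # base64 form of the heredoc above; trimspace drops the trailing newline first
  vm_affinity_b64 = base64encode(trimspace(local.vm_affinity_to_use))
}
```

and then reference local.vm_affinity_b64 in the vm_affinity field instead of the raw JSON.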

@futuretea, is there possibly a workaround for https://github.com/harvester/harvester/issues/3820? If so, perhaps we could close this out?

futuretea commented 1 year ago

@irishgordo workaround: https://github.com/rancher/terraform-provider-rancher2/pull/1110

irishgordo commented 1 year ago

Awesome, thanks for that @futuretea :smile: :+1: Since the docs note that only base64 is supported, I'll go ahead and close this out :smile: