I confirmed the vmAffinity options are the same in the UI for Rancher 2.6.11-rc2.
So backend support can be added to the Terraform rancher2 provider. It's not listed in the registry docs, so it will also need to be added there.
The Terraform provider has also recently been branched into master and release/v2 branches that align with Rancher minor versions 2.7 and 2.6. Backend support for node affinity will have to be added to both branches, so this PR https://github.com/rancher/terraform-provider-rancher2/pull/1024 will need a backport to release/v2.
/backport v2.6.x release/v2
No Terraform support for node affinity on Harvester clusters. Add a new field, rancher2_cluster_v2.harvester_config.vm_affinity, so that Terraform supports VM affinity for Harvester clusters via the Rancher backend.
This is a community PR and I don't personally have Harvester credentials, but this is my best guess as to how to test it.
Test steps
Set up Rancher v2.7-head
Set vm_affinity in your Terraform main.tf (examples here: https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster_v2)
```
# Create a new rancher2 machine config v2 using harvester node_driver
resource "rancher2_machine_config_v2" "foo-harvester-v2" {
  generate_name = "foo-harvester-v2"
  harvester_config {
    vm_namespace = "default"
    cpu_count    = "2"
    memory_size  = "4"
    disk_size    = "40"
    network_name = "harvester-public/vlan1"
    image_name   = "harvester-public/image-57hzg"
    ssh_user     = "ubuntu"
    vm_affinity  = <<EOF
{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["yyz2"]}]}]}}}
EOF
  }
}

resource "rancher2_cluster_v2" "foo-harvester-v2" {
  name = "foo-harvester-v2"
  rke_config {
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo-harvester.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo-harvester-v2.kind
        name = rancher2_machine_config_v2.foo-harvester-v2.name
      }
    }
    machine_selector_config {
      config = {
        cloud-provider-name = ""
      }
    }
    machine_global_config = <<EOF
cni: "calico"
disable-kube-proxy: false
etcd-expose-metrics: false
EOF
  }
}
```
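Then run terraform init / apply as in the test plan further down. As an optional check (a sketch only; assumes kubectl access to the underlying Harvester cluster and the default VM namespace), the scheduled node and the affinity stanza on the created VM can be inspected:

```bash
# Hypothetical verification: list the VM instances with their nodes, then dump
# the affinity that was rendered onto the KubeVirt VirtualMachine spec.
kubectl get vmi -n default -o wide
kubectl get vm -n default -o jsonpath='{.items[*].spec.template.spec.affinity}'
```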
Terraform rancher2 provider, Harvester v2 provisioning
Yes.
Blocked -- waiting on Terraform 3.0.0 for Rancher v2.7.x.
@annablender , we should be able to test before the release, so I don't think it's really "Blocked". Please correct me if I'm missing something.
@snasovich This has been tested on Rancher 2.6.9 https://github.com/rancher/terraform-provider-rancher2/pull/1024#issuecomment-1415967940 in the community PR but otherwise has not been verified with Rancher backend yet.
I believe QA needs to test this provider update using a released version of Terraform that exists on the registry. In this case, it would be Terraform 3.0.0 for Rancher v2.7.x, which will be released a few days after the Rancher v2.7.x release so QA is blocked until then.
Just to close the loop on the above, per offline discussions we're looking to cut RCs for TF providers to enable testing by QA.
@sowmyav27 This is ready to test using Terraform rancher2 v3.0.0-rc1. Please set up local testing on the RC version of the provider with this command:
```bash
./setup-provider.sh rancher2 3.0.0-rc1
```
Hello! How do you set labels for Harvester virtual machines to use VMAffinity?
This was found to be broken on Rancher 2.6.11. I don't think the implemented changes are passing the values correctly to Rancher. Needs to be debugged and fixed.
@irishgordo Moving this discussion here to debug TF vmAffinity.
I noticed while investigating that you said here you were trying to pass this JSON blob to Rancher, encoded as base64 in your configuration file:
```
{
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "topology.kubernetes.io/zone",
"operator": "In",
"values": [
"us-fremont-1a"
]
}
]
}
]
}
}
}
```
But when I converted to base64, I got this:
```
ewogICAgIm5vZGVBZmZpbml0eSI6IHsKICAgICAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgICAgICAgICAgewogICAgICAgICAgICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgICAgICAgICAgICAgICAia2V5IjogInRvcG9sb2d5Lmt1YmVybmV0ZXMuaW8vem9uZSIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgInZhbHVlcyI6IFsKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAidXMtZnJlbW9udC0xYSIKICAgICAgICAgICAgICAgICAgICAgICAgICAgIF0KICAgICAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgICAgIF0KICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgXQogICAgICAgIH0KICAgIH0KfQ==
```
Which is not what you had in your config file. Could this be the culprit?
@a-blender - that's a great call out, I had actually taken this:
```
{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["us-fremont-1a"]}]}]}}}
```
and pretty-printed it via a JSON beautifier to:
```
{
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "topology.kubernetes.io/zone",
"operator": "In",
"values": [
"us-fremont-1a"
]
}
]
}
]
}
}
}
```
Just for easier reading - I wasn't actually using that for the vm_affinity property on the terraform resource.
But in:
```
vm_affinity = "eyJub2RlQWZmaW5pdHkiOnsicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6eyJub2RlU2VsZWN0b3JUZXJtcyI6W3sibWF0Y2hFeHByZXNzaW9ucyI6W3sia2V5IjoidG9wb2xvZ3kua3ViZXJuZXRlcy5pby96b25lIiwib3BlcmF0b3IiOiJJbiIsInZhbHVlcyI6WyJ1cy1mcmVtb250LTFhIl19XX1dfX19"
```
I was using the base64-encoded version of:
```
{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.kubernetes.io/zone","operator":"In","values":["us-fremont-1a"]}]}]}}}
```
not the pretty-printed JSON one.
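For reference, a small sketch (not from this thread; assumes jq and GNU coreutils base64 are installed) of how the compact one-liner maps to the base64 value, and why encoding the pretty-printed form yields a different string:

```bash
# affinity.json is a hypothetical file holding the pretty-printed JSON above.
# Compact it first, strip the trailing newline, then encode without wrapping;
# this reproduces the one-line base64 value used for vm_affinity. Encoding the
# pretty-printed file directly includes the whitespace and gives a different result.
jq -c . affinity.json | tr -d '\n' | base64 -w0
```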
This issue is being discussed offline with @irishgordo and @futuretea - so far it looks like TFP changes are actually good and there may be an issue in Harvester itself.
Test Plan
Prepare the Harvester side (an image and a VM network) with the harvester provider:
```
terraform {
  required_version = ">= 0.13"
  required_providers {
    harvester = {
      source  = "harvester/harvester"
      version = "0.6.1"
    }
  }
}

provider "harvester" {
  kubeconfig = "<>"
}

resource "harvester_image" "focal-server" {
  name         = "focal-server"
  namespace    = "harvester-public"
  display_name = "focal-server-cloudimg-amd64.img"
  source_type  = "download"
  url          = "https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
}

data "harvester_clusternetwork" "mgmt" {
  name = "mgmt"
}

resource "harvester_network" "mgmt-vlan1" {
  name      = "mgmt-vlan1"
  namespace = "harvester-public"

  vlan_id = 1

  route_mode           = "auto"
  route_dhcp_server_ip = ""

  cluster_network_name = data.harvester_clusternetwork.mgmt.name
}
```
```bash
terraform init
terraform apply
```
Set up a Rancher v2.7.2 / v2.7-head cluster
Import the Harvester cluster into Rancher in Virtualization Management (use cluster name foo-harvester)
Install the v3.0.0-rc2 rancher2 provider:
```bash
wget https://raw.githubusercontent.com/rancher/terraform-provider-rancher2/master/setup-provider.sh
chmod +x setup-provider.sh
./setup-provider.sh rancher2 v3.0.0-rc2
```
Use the following test config:
```
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "3.0.0-rc2"
    }
  }
}

provider "rancher2" {
  api_url    = "<>"
  access_key = "<>"
  secret_key = "<>"
  insecure   = true
}

data "rancher2_cluster_v2" "foo-harvester" {
  name = "foo-harvester"
}

resource "rancher2_cloud_credential" "foo-harvester" {
  name = "foo-harvester"
  harvester_credential_config {
    cluster_id         = data.rancher2_cluster_v2.foo-harvester.cluster_v1_id
    cluster_type       = "imported"
    kubeconfig_content = data.rancher2_cluster_v2.foo-harvester.kube_config
  }
}

resource "rancher2_machine_config_v2" "foo-harvester-v2" {
  generate_name = "foo-harvester-v2"
  harvester_config {
    vm_namespace = "default"
    cpu_count    = "2"
    memory_size  = "4"
    vm_affinity  = "ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJub2RlLXJvbGUua3ViZXJuZXRlcy5pby9jb250cm9sLXBsYW5lIiwKICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICJ2YWx1ZXMiOiBbCiAgICAgICAgICAgICAgICAidHJ1ZSIKICAgICAgICAgICAgICBdCiAgICAgICAgICAgIH0KICAgICAgICAgIF0KICAgICAgICB9CiAgICAgIF0KICAgIH0KICB9Cn0="
    disk_info    = <<EOF
{
  "disks": [{
    "imageName": "harvester-public/focal-server",
    "size": 40,
    "bootOrder": 1
  }]
}
EOF
    network_info = <<EOF
{
  "interfaces": [{
    "networkName": "harvester-public/mgmt-vlan1"
  }]
}
EOF
    ssh_user  = "ubuntu"
    user_data = "I2Nsb3VkLWNvbmZpZwpwYWNrYWdlX3VwZGF0ZTogdHJ1ZQpwYWNrYWdlczoKICAtIHFlbXUtZ3Vlc3QtYWdlbnQKICAtIGlwdGFibGVzCnJ1bmNtZDoKICAtIC0gc3lzdGVtY3RsCiAgICAtIGVuYWJsZQogICAgLSAnLS1ub3cnCiAgICAtIHFlbXUtZ3Vlc3QtYWdlbnQuc2VydmljZQo="
  }
}

resource "rancher2_cluster_v2" "foo-harvester-v2" {
  name               = "foo-harvester-v2"
  kubernetes_version = "v1.24.11+rke2r1"
  rke_config {
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.foo-harvester.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.foo-harvester-v2.kind
        name = rancher2_machine_config_v2.foo-harvester-v2.name
      }
    }
    machine_selector_config {
      config = {
        cloud-provider-name = ""
      }
    }
    machine_global_config = <<EOF
cni: "calico"
disable-kube-proxy: false
etcd-expose-metrics: false
EOF
    upgrade_strategy {
      control_plane_concurrency = "10%"
      worker_concurrency        = "10%"
    }
    etcd {
      snapshot_schedule_cron = "0 */5 * * *"
      snapshot_retention     = 5
    }
    chart_values = ""
  }
}
```
```bash
terraform init
terraform apply
```
When I apply for the first time, the error below occurs, but it is OK to apply again. Is there anything wrong in my configuration file, or is it a known problem?
```
rancher2_cloud_credential.foo-harvester: Creation complete after 2s [id=cattle-global-data:cc-rqgkh]
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for rancher2_cluster_v2.foo-harvester-v2 to include new values learned so far during apply,
│ provider "registry.terraform.io/rancher/rancher2" produced an invalid new value for
│ .rke_config[0].machine_pools[0].cloud_credential_secret_name: was cty.StringVal(""), but now
│ cty.StringVal("cattle-global-data:cc-rqgkh").
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
```
terraform destroy
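Not a confirmed fix for this specific provider bug, but a common general-purpose mitigation for "inconsistent final plan" errors caused by values that are only known after apply is a two-stage, targeted apply so the cloud credential ID exists before the cluster is planned (sketch only):

```bash
# Create the cloud credential first so its ID is known, then apply the rest.
terraform apply -target=rancher2_cloud_credential.foo-harvester
terraform apply
```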
The source string of the base64-encoded string
```
ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJub2RlLXJvbGUua3ViZXJuZXRlcy5pby9jb250cm9sLXBsYW5lIiwKICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICJ2YWx1ZXMiOiBbCiAgICAgICAgICAgICAgICAidHJ1ZSIKICAgICAgICAgICAgICBdCiAgICAgICAgICAgIH0KICAgICAgICAgIF0KICAgICAgICB9CiAgICAgIF0KICAgIH0KICB9Cn0=
```
is:
```
{
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "node-role.kubernetes.io/control-plane",
"operator": "In",
"values": [
"true"
]
}
]
}
]
}
}
}
```
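As a quick sanity check (a sketch; assumes GNU coreutils base64 and jq on the PATH), the value can be decoded and pretty-printed to confirm it matches the rule above:

```bash
# Decode the vm_affinity value and pretty-print the resulting JSON.
echo 'ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJub2RlLXJvbGUua3ViZXJuZXRlcy5pby9jb250cm9sLXBsYW5lIiwKICAgICAgICAgICAgICAib3BlcmF0b3IiOiAiSW4iLAogICAgICAgICAgICAgICJ2YWx1ZXMiOiBbCiAgICAgICAgICAgICAgICAidHJ1ZSIKICAgICAgICAgICAgICBdCiAgICAgICAgICAgIH0KICAgICAgICAgIF0KICAgICAgICB9CiAgICAgIF0KICAgIH0KICB9Cn0=' | base64 -d | jq .
```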
I found that configuring vm_affinity with a JSON string causes the UI to get stuck on Edit Config.
@lanfon72
Tested that vm_affinity could be applied correctly with terraform-provider-rancher2 v3.0.0-rc2 and Harvester version v1.1-cff1d5b5-head, which includes the fix: https://github.com/harvester/harvester/issues/3816
@futuretea thank you for providing the testing steps and highlighting the TF resources that need to be created :smile: :+1:
Re-tested with Harvester v1.1.1, Rancher v2.7.2, Rancher2 Terraform v3.0.0-rc2 and was successful.
Validated that I was able to provision an RKE2 cluster that had the rancher2_machine_config_v2's vm_affinity set either as the base64-encoded string:
```
ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJ0b3BvbG9neS5rdWJlcm5ldGVzLmlvL3pvbmUiLAogICAgICAgICAgICAgICJvcGVyYXRvciI6ICJJbiIsCiAgICAgICAgICAgICAgInZhbHVlcyI6IFsKICAgICAgICAgICAgICAgICJ1cy1mcmVtb250LTFhIgogICAgICAgICAgICAgIF0KICAgICAgICAgICAgfSwKICAgICAgICAgICAgewogICAgICAgICAgICAgICJrZXkiOiAibmV0d29yay5oYXJ2ZXN0ZXJoY2kuaW8vbWdtdCIsCiAgICAgICAgICAgICAgIm9wZXJhdG9yIjogIkluIiwKICAgICAgICAgICAgICAidmFsdWVzIjogWwogICAgICAgICAgICAgICAgInRydWUiCiAgICAgICAgICAgICAgXQogICAgICAgICAgICB9CiAgICAgICAgICBdCiAgICAgICAgfQogICAgICBdCiAgICB9CiAgfQp9
```
or as the equivalent JSON via a Terraform local:
```
locals {
vm_affinity_to_use = <<EOF
{
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "topology.kubernetes.io/zone",
"operator": "In",
"values": [
"us-fremont-1a"
]
},
{
"key": "network.harvesterhci.io/mgmt",
"operator": "In",
"values": [
"true"
]
}
]
}
]
}
}
}
EOF
}
```
used in the machine config as:
```
resource "rancher2_machine_config_v2" "foo-harvester-v2-cloud-provider" {
generate_name = "foo-harvester-v2-cloud-provider"
harvester_config {
vm_namespace = "default"
cpu_count = "4"
memory_size = "8"
disk_info = <<EOF
{
"disks": [{
"imageName": "default/image-6v2ck",
"size": 40,
"bootOrder": 1
}]
}
EOF
network_info = <<EOF
{
"interfaces": [{
"networkName": "default/mgmt-1"
}]
}
EOF
ssh_user = "opensuse"
vm_affinity = local.vm_affinity_to_use
}
}
```
And with Harvester v1.1.1, Rancher v2.7.2, & Rancher2 TFProvider v3.0.0-rc2 it looked good :+1:
@futuretea @irishgordo Thank you for verifying this on the Harvester end! vmAffinity in TF appears to be working correctly; the earlier failures were due to how the values were formatted.
If this is done, can you please close it out?
Tested with Harvester v1.1.2, Rancher v2.7.2, Rancher2 TFProvider v3.0.0-rc2.
Was able to provision an RKE2 cluster with both vm_affinity formats (base64-encoded string and JSON via a local).
But as mentioned in https://github.com/harvester/harvester/issues/3820, it seems that if the user had:
```
locals {
vm_affinity_to_use = <<EOF
{
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "topology.kubernetes.io/zone",
"operator": "In",
"values": [
"us-fremont-1a"
]
},
{
"key": "network.harvesterhci.io/mgmt",
"operator": "In",
"values": [
"true"
]
}
]
}
]
}
}
}
EOF
}
# Create a new rancher2 machine config v2 using harvester node_driver
resource "rancher2_machine_config_v2" "foo-harvester-v2-cloud-provider" {
generate_name = "foo-harvester-v2-cloud-provider"
harvester_config {
vm_namespace = "default"
cpu_count = "4"
memory_size = "8"
disk_info = <<EOF
{
"disks": [{
"imageName": "default/image-666vl",
"size": 40,
"bootOrder": 1
}]
}
EOF
network_info = <<EOF
{
"interfaces": [{
"networkName": "default/mgmt-1"
}]
}
EOF
ssh_user = "opensuse"
#vm_affinity = "ewogICJub2RlQWZmaW5pdHkiOiB7CiAgICAicmVxdWlyZWREdXJpbmdTY2hlZHVsaW5nSWdub3JlZER1cmluZ0V4ZWN1dGlvbiI6IHsKICAgICAgIm5vZGVTZWxlY3RvclRlcm1zIjogWwogICAgICAgIHsKICAgICAgICAgICJtYXRjaEV4cHJlc3Npb25zIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImtleSI6ICJ0b3BvbG9neS5rdWJlcm5ldGVzLmlvL3pvbmUiLAogICAgICAgICAgICAgICJvcGVyYXRvciI6ICJJbiIsCiAgICAgICAgICAgICAgInZhbHVlcyI6IFsKICAgICAgICAgICAgICAgICJ1cy1mcmVtb250LTFhIgogICAgICAgICAgICAgIF0KICAgICAgICAgICAgfSwKICAgICAgICAgICAgewogICAgICAgICAgICAgICJrZXkiOiAibmV0d29yay5oYXJ2ZXN0ZXJoY2kuaW8vbWdtdCIsCiAgICAgICAgICAgICAgIm9wZXJhdG9yIjogIkluIiwKICAgICAgICAgICAgICAidmFsdWVzIjogWwogICAgICAgICAgICAgICAgInRydWUiCiAgICAgICAgICAgICAgXQogICAgICAgICAgICB9CiAgICAgICAgICBdCiAgICAgICAgfQogICAgICBdCiAgICB9CiAgfQp9"
vm_affinity = local.vm_affinity_to_use
}
}
```
With the JSON-based local value for vm_affinity (instead of a base64-encoded string), it will not allow the user to "edit" the config of the RKE2 cluster in Cluster Management on Rancher v2.7.2 (see screenshot that shows "Loading" on v2.7.2).
@futuretea, is there possibly a workaround for https://github.com/harvester/harvester/issues/3820? If so, perhaps we could close this out?
@irishgordo workaround: https://github.com/rancher/terraform-provider-rancher2/pull/1110
Awesome, thanks for that @futuretea :smile: :+1: - since the docs note that only base64 is supported, I'll go ahead and close this out :smile:
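For anyone hitting the same UI issue, one option (my own sketch, not the documented workaround) is to keep the affinity readable in HCL but let Terraform produce the base64 value the docs call for, using the built-in jsonencode() and base64encode() functions:

```
locals {
  # Hypothetical local names; the rule mirrors the zone affinity examples above.
  vm_affinity_json = jsonencode({
    nodeAffinity = {
      requiredDuringSchedulingIgnoredDuringExecution = {
        nodeSelectorTerms = [{
          matchExpressions = [{
            key      = "topology.kubernetes.io/zone"
            operator = "In"
            values   = ["us-fremont-1a"]
          }]
        }]
      }
    }
  })

  # Base64-encode so the value stored on the machine config is the format the
  # docs describe and the Rancher UI expects when editing the cluster.
  vm_affinity_b64 = base64encode(local.vm_affinity_json)
}

# Inside harvester_config:
#   vm_affinity = local.vm_affinity_b64
```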
Maybe I'm missing the config somewhere, but I've looked all over and can't find a way to configure node scheduling rules for Harvester deployments. Here's a screenshot from the UI of what I'm trying to configure:
I was expecting to find a spot to configure this in rancher2_machine_config_v2, since it seems this setting gets applied to the property vmAffinity for the resulting HarvesterConfig, e.g. a base64-encoded value which decodes to a node affinity JSON rule.
This is important so I can ensure nodes deploy across availability zones. Right now, they're all going into one. Thanks for any help!
SURE-5854