@boris-stojnev I see the associated issue https://github.com/rancher/rancher/issues/38112 has been closed with the workaround of setting `kubelet_arg` under `machine_global_config` in your Terraform config, which then causes the args to be pushed down to the nodes successfully. For v2 prov clusters, `machine_global_config` is a string that must be provided in YAML format - here's an example.
```hcl
machine_global_config = <<EOF
config:
  cloud-provider-name: "external"
  kubelet-arg: "key1=value1,key2=value2"
EOF
```
Does this work for you?
@boris-stojnev Any update on this issue?
@a-blender sorry for the late reply.
My workaround was placing `kubelet-arg` in `config.yaml`, leveraging Ansible, alongside the other additional configuration I have.
As stated in the issue https://github.com/rancher/rancher/issues/38112, in that case the changes are not reflected in the Rancher UI; there is a separate section for kubelet args in the Rancher UI cluster configuration.
What you suggest will definitely not work. Maybe the following will, but I didn't test it and don't know if there are side effects.
```hcl
machine_global_config = <<EOF
cloud-provider-name: "external"
kubelet-arg:
  - key1=value1
  - key2=value2
EOF
```
As I said, it should be fixed as I described, so that the TF provider follows the Rancher UI configuration options. It's misleading when you check the UI and see that kubelet args are not defined when in fact they are.
After a proper fix, we should be able to use `kubelet-arg` like in the example below, inside `machine_selector_config`, and it would be shown in the Rancher UI. This would be the proper config:
```hcl
machine_selector_config {
  config = {
    cloud-provider-name = "external"
    kubelet-arg = {
      key1 = value1
      key2 = value2
    }
  }
}
```
@boris-stojnev Thank you for the additional details. After investigating more, [7/26 Correction] machine global config is for rendering rke2 arguments into `/etc/rancher/rke2/config.yaml.d/50-rancher.yaml` on all nodes, while machine selector config is for rendering arguments into the same file but only for machines matching a machine label selector. If an arg is set via machine selector config without specifying a label, that arg is applied to all nodes. Most customers use machine global config for that use case, though.
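As a rough sketch of the distinction (the keys and the label below are illustrative, not taken from this cluster), both blocks live under `rke_config` in `rancher2_cluster_v2`:

```hcl
# Rendered into /etc/rancher/rke2/config.yaml.d/50-rancher.yaml on every node:
machine_global_config = <<EOF
cloud-provider-name: "external"
EOF

# Rendered into the same file, but only on machines matching the label selector;
# without machine_label_selector it would apply to all nodes:
machine_selector_config {
  config = {
    protect-kernel-defaults = "true"
  }
  machine_label_selector {
    match_labels = {
      node-role = "worker" # illustrative label
    }
  }
}
```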
It makes sense that `kubelet-arg` under `machine_global_config` is not exposed in the UI, but you are right that it should be. In TF, `machine_selector_config` is a complex type and its subfield `config` is a `Type.Map` (https://github.com/rancher/terraform-provider-rancher2/blob/5a952a87c3cf479694a8ca4cefa0d510d8d2b4d2/rancher2/schema_cluster_v2_rke_config_system_config.go#L65-L69), which makes `kubelet-arg` a string, so I don't see how the example I provided would not work - it fits the schema. Did you try with `machine_selector_config` and check whether the args got passed down to the nodes and were present in the UI? Let me try it on my end.
@a-blender Regarding your example: first of all, it's in YAML format, which is not supported by `machine_selector_config`; and second, the values of `kubelet-arg` will, if I remember correctly, be interpreted like in the issue we linked earlier.
An additional note regarding what you showed in the picture: it can't be accomplished via TF. `cloud-provider-name=external` from my example will not be shown in the UI, because it's not under `kubelet-arg`, so it will be treated as level-one config. More info can be found here: https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke2-cluster-configuration#machineselectorconfig
I'm using `machine_selector_config` as suggested here https://ranchermanager.docs.rancher.com/v2.6/reference-guides/rancher-security/rancher-v2.6-hardening-guides/rke2-hardening-guide-with-cis-v1.6-benchmark for setting other configs without labels, and that works.
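Something along these lines works, because the values are plain strings that fit the current map schema (the keys shown are illustrative, not my exact config):

```hcl
machine_selector_config {
  config = {
    profile                 = "cis-1.6"
    protect-kernel-defaults = "true"
  }
}
```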
On the other hand, the Rancher UI only shows the machine selector for kubelet args, and that is what's causing the confusion here. And additional kubelet args for any machine can't be set via TF in a way that shows up in the UI.
To sum up, you can't have `kubelet-arg` in `machine_selector_config`.
@boris-stojnev Sorry, I was asking about `machine_global_config` earlier, which supports YAML - I made a spelling error in that comment. After investigating, I don't think using `machine_global_config` is an ideal workaround because it deviates from the options exposed in the Rancher UI.
I tried with the following config
```hcl
machine_selector_config {
  config = {
    kubelet-arg = "cloud-provider=external"
  }
}
```
with TF 1.25 and reproduced https://github.com/rancher/rancher/issues/38112
The `kubelet-arg = "--protect-kernel-defaults"` causes Rancher to treat every character as an array element, which makes it look like this in Rancher:

```yaml
machineSelectorConfig:
  - config:
      kubelet-arg:
        - '-'
        - '-'
        - p
        - r
        - o
        - t
        - e
        - c
        - t
        - '-'
        - k
        - e
        - r
        - "n"
        - e
        - l
        - '-'
        - d
        - e
        - f
        - a
        - u
        - l
        - t
        - s
```
causing the UI to look like this
TF needs to have parity with Rancher, so it needs to support passing multiple `kubelet-arg` values to `machine_selector_config`, but from your comment https://github.com/rancher/terraform-provider-rancher2/issues/1074#issuecomment-1648412146 and what I see, this doesn't appear possible without a fix. This is a confirmed bug. I think updating `machine_selector_config.Config` to a `Type.List` should solve this, but I will have to test it. We may also need to add state migration logic, or users with clusters already provisioned with an earlier version of TF may see them break.
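Purely to illustrate that proposal (this is not what ultimately shipped - the final fix keeps `config` but turns it into a YAML string, as described later in this thread), a list-typed `config` might have looked roughly like:

```hcl
# Hypothetical shape only; not valid against any released provider schema.
machine_selector_config {
  config {
    kubelet-arg = ["protect-kernel-defaults=true", "cloud-provider=external"]
  }
}
```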
@a-blender Now, while we're speaking, something crossed my mind. What I didn't test, and maybe it's worth confirming, is defining labels. The example would be something like below - every argument as a separate `machine_selector_config`. Maybe this way the value of `kubelet-arg` wouldn't be treated as an array of characters.
```hcl
machine_selector_config {
  config = {
    kubelet-arg = "cloud-provider-name=external"
  }
  machine_label_selector {
    match_expressions {
      key      = example-key
      operator = In
      values   = [example-value1, example-value2]
    }
    match_labels {
      key1 = value1
      key2 = value2
    }
  }
}

machine_selector_config {
  config = {
    kubelet-arg = "config-key=config-value"
  }
  machine_label_selector {
    match_expressions {
      key      = example-key
      operator = In
      values   = [example-value1, example-value2]
    }
    match_labels {
      key1 = value1
      key2 = value2
    }
  }
}
```
But wait - as I can see from the UI, you can have multiple args under the same label selector. Ah, so it probably wouldn't do the trick.
@boris-stojnev Yeah, setting one arg under `kubelet-arg` inside a single block of `machine_selector_config` doesn't translate properly, so several of the same will not either.
Machine Selector Config `kubelet-arg` values could not be passed via Terraform to downstream machines and would not show up in the Rancher UI.
My solution is to convert `machine_selector_config.config` from `Type.Map` (a string map) to a `Type.String` that supports YAML, like Machine Global Config, plus state migration logic to handle the schema update. This allows users to input both strings and lists, which lets the subfield `kubelet-arg` be passed as a list, which is what Rancher is expecting. This feature now works with the following configuration:
```hcl
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - key1=value1
  - key2=value2
EOF
}
```
Example
```hcl
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - protect-kernel-defaults=true
  - cloud-provider=external
EOF
}
```
See more details here.
Terraform RC: 3.2.0-rc4
Test plan
- `terraform plan`/`apply` or updates to Machine Selector Config.
- Run `go test -v ./rancher2` to make sure all automated tests pass.
N/A - the fix was designed to avoid a Machine Selector Config regression. That being said, users can only configure one Machine Selector Config via TF, whereas in the Rancher backend you can configure multiple of the same field. No customers are asking for this, but it's worth noting.
Update to Regression Considerations: If a user wants to configure multiple Machine Selector Configs to assign kubelet args to specific cluster nodes based on node labels, as is supported in Rancher, they can define that in a TF config using the same pattern in separate blocks:
```hcl
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - key1=value1
  - key2=value2
EOF
}

machine_selector_config {
  config = <<EOF
kubelet-arg:
  - key1=value1
  - key2=value2
EOF
}
```
This will show up in Rancher as
Add machine selector labels to each config as needed.
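A sketch of that pattern, combining the new YAML `config` string with a label selector (the kubelet arg and label key/value are placeholders):

```hcl
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - key1=value1
EOF
  machine_label_selector {
    match_labels = {
      node-group = "workers" # placeholder label
    }
  }
}
```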
Please wait until 3.2.0-rc3 to test, thank you.
todo: create TFP automation for kubelet-args
Moving back to waiting for RC based on this comment: https://github.com/rancher/terraform-provider-rancher2/issues/1074#issuecomment-1712155225, as only rc2 is available.
@slickwarren Jacob already cut rc3 https://github.com/rancher/terraform-provider-rancher2/releases/tag/v3.2.0-rc3 but assets are not finished generating. Check back shortly!
Verified on Rancher `v2.8-0ff5fe88aa87c0383b7487b975ee8929df674185-head`:
| Scenario | Test Case | Result |
| --- | --- | --- |
| 1. | Provision a downstream rke2 cluster with Machine Selector Config and 2 kubelet args set | ❌ |
| 2. | Update: Add/remove a kubelet-arg via tf | pending |
| 3. | Provision a downstream rke2 cluster with tf 3.1.0 => add machine selector config with 2 kubelet args via the rancher ui => Upgrade tf to v3.2.0-rc3 | pending |
Scenario 1 - ❌

1. Rancher `v2.8-head`
2. With `tfp-rancher2 v3.2.0-rc3`, provision a downstream RKE2 AWS Node driver cluster, using a `machine_selector_config` block and defining 2 kubelet arguments - [I used the `main.tf` shown below]
```hcl
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "3.2.0-rc3"
    }
  }
}

provider "rancher2" {
  api_url = "

resource "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = "tf-creds-rke2"
  amazonec2_credential_config {
    access_key = "

resource "rancher2_machine_config_v2" "rancher2_machine_config_v2" {
  generate_name = "tf-rke2"
  amazonec2_config {
    ami    = ""
    region = "

resource "rancher2_cluster_v2" "rancher2_cluster_v2" {
  name                                     = "jkeslarrr3"
  kubernetes_version                       = "v1.27.6+rke2r1"
  enable_network_policy                    = false
  default_cluster_role_for_project_members = "user"
  rke_config {
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = false
      etcd_role                    = true
      worker_role                  = false
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
    machine_pools {
      name                         = "pool2"
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = true
      etcd_role                    = false
      worker_role                  = false
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
    machine_pools {
      name                         = "pool3"
      cloud_credential_secret_name = rancher2_cloud_credential.rancher2_cloud_credential.id
      control_plane_role           = false
      etcd_role                    = false
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.rancher2_machine_config_v2.kind
        name = rancher2_machine_config_v2.rancher2_machine_config_v2.name
      }
    }
    machine_selector_config {
      config = <<EOF
kubelet-arg:
```
3. FAILED - cluster hangs in `Updating` state with the following message: `Configuring bootstrap node(s): waiting for probes: etcd, kubelet`. The cluster never resolves past this status and never comes up active. This message comes specifically from the etcd node, which is in a `reconcile` state.
Additional Context:
When removing the `machine_selector_config` block, which defined the 2 kubelet arguments, from the main.tf shown above, `tfp-rancher2 v3.2.0-rc3` successfully spun up the downstream cluster.
@Josh-Diamond There's some confusion about how/which arguments to pass via TF to the kubelet for a working v2 cluster. Here's a working example. I updated the Test Template.
```hcl
// Working example
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - protect-kernel-defaults=true
  - cloud-provider=external
EOF
}
```
I also missed a backport to release/v3. After I get that in and cut a new RC, please re-test this on v3.2.0-rc4.
Verified on Rancher `v2.7.8-rc1`:
| Scenario | Test Case | Result |
| --- | --- | --- |
| 1. | Provision a downstream rke2 cluster with Machine Selector Config and 2 kubelet args set | ✅ |
| 2. | Update: Add/remove a kubelet-arg via tf | ✅ |
| 3. | Provision a downstream rke2 cluster with tf 3.1.0 => Upgrade tf to v3.2.0-rc3 and add machine selector config with 2 kubelet args => update/modify kubelet args once more and verify they are successfully accepted + functional | pending/blocked |
Scenario 1 - ✅

1. Rancher `v2.7.8-rc1`
2. With `tfp-rancher2 v3.2.0-rc4`, provision a downstream RKE2 AWS Node driver cluster, using a `machine_selector_config` block and defining 2 kubelet arguments - [I used the `main.tf` shown below]
```hcl
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "3.2.0-rc3"
    }
  }
}

provider "rancher2" {
  api_url = "

resource "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = "tf-creds-rke2"
  amazonec2_credential_config {
    access_key = "

resource "rancher2_machine_config_v2" "rancher2_machine_config_v2" {
  generate_name = "tf-rke2"
  amazonec2_config {
    ami    = ""
    region = "

resource "rancher2_cluster_v2" "rancher2_cluster_v2" {
  name                                     = "jkeslar"
  kubernetes_version                       = "v1.26.8+rke2r1"
  enable_network_policy                    = false
  default_cluster_role_for_project_members = "user"
  rke_config {
    machine_selector_config {
      config = <<EOF
kubelet-arg:
```
3. Verified - cluster successfully provisions and kubelet args are successfully set; kubelet args are seen via Rancher UI; as expected
Scenario 2 - ✅

1. Update the `max-pod` limit to 255, using `tfp-rancher2 v3.2.0-rc4`, and re-run `terraform apply` (see the sketch after this list)
2. Verified - `max-pod` kubelet arg successfully updated; as expected
3. With `v3.2.0-rc4`, remove and delete kubelet args
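For reference, the update in step 1 could look roughly like this (the arg name `max-pods` and the value are assumptions based on the notes above, added alongside the args from the working example):

```hcl
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - protect-kernel-defaults=true
  - cloud-provider=external
  - max-pods=255
EOF
}
```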
Scenario 3 - ✅

1. Rancher `v2.7.8`
2. With `tfp-rancher2 v3.1.0`, provision a downstream RKE2 AWS Node driver cluster
3. Once `active`, update tfp-rancher2 to `v3.2.0-rc5`
4. With `v3.2.0-rc5`, define a `machine_selector_config` block and set multiple kubelet-args under `config`
no longer blocked by https://github.com/rancher/terraform-provider-rancher2/issues/1243
https://github.com/rancher/terraform-provider-rancher2/issues/1243 has been identified as Rancher UI specific and is not caused by or related to tfp-rancher2. Although this issue was encountered in my testing, it is purely a UI symptom, unrelated to tfp-rancher2.
Resuming testing now...
The above test results have been updated and completed. Closing out this issue now.
**Rancher Server Setup**

**Information about the Cluster**

**User Information**

**Provider Information**

**Describe the bug**
Using `kubelet-arg` via the Terraform rancher2 provider results in unexpected behavior where the cluster does not start. This bug is described here: https://github.com/rancher/rancher/issues/38112. I think this is specific to the Terraform provider, not Rancher itself.
`kubelet-arg`, which is part of `config` under `machine_selector_config`, should be a list type, not a string. In other words, `config` should be able to accept list types.
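For illustration, here is a minimal sketch of the map-typed `config` that triggers the problem (mirroring the reproduction earlier in this thread), followed by the YAML-string form accepted by the fix tested above in the v3.2.0 release candidates:

```hcl
# Fails while config is a Type.Map of strings - the value ends up split into characters:
machine_selector_config {
  config = {
    kubelet-arg = "cloud-provider=external"
  }
}

# Works once config is a YAML string (tested above with v3.2.0-rc4/rc5):
machine_selector_config {
  config = <<EOF
kubelet-arg:
  - protect-kernel-defaults=true
  - cloud-provider=external
EOF
}
```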