Closed deniseschannon closed 3 years ago
The wording in the issue is kind of confusing, especially the following in use case 1:
"Verify that the no nodes with no labels only get the global arguments"
This is confusing, but I'll take it as "Verify that the nodes with no labels only get the global arguments."
I think I understand the gist of this, though. "For all nodes" is global (obviously), meaning we'd expect any kubelet args we specify there to affect ANY node, regardless of whether labels are set. Anything set with the machine selector applies the kubelet args we specify only to nodes with the labels we declare.
Reproduction: Not required.
Validation Failed:
Rancher version: master-head b39523882
8/2/21
Rancher Cluster type: single-node docker
Docker version: 19.03
Downstream cluster type: custom rke2 cluster provisioned by Rancher
Downstream K8s version: v1.21.3-rc1+rke2r2
Validation steps:
Use Case 1 - Use of global args:
- Create a worker pool with the node label tacos=rule set
- Create a third pool for worker WITHOUT any node labels set
- Under the Cluster Configuration section, pick "Advanced" and set Additional Kubelet Args:
-- For the "For all nodes:" field (global), enter a value of --max-pods=57. This kubelet arg should apply to all nodes regardless of any labels set or not set.
-- For the "For machines with labels matching:" fields, set to match for key "tacos" and operator "is set".
-- For the "Add additional Kubelet args:" field, enter a value of --maximum-dead-containers=11. This kubelet arg should apply only to nodes with the "tacos" label set (1 of our 2 worker pools).
We can see that the payload passed by the UI (provisioning.cattle.io.clusters) includes the kubelet-arg in the machineSelectorConfig in rkeConfig. NOTE: The payload below differs slightly from my validation steps above because on this 2nd attempt I passed just one global kubelet arg, with a simpler cluster / node pool configuration, to see if this works.
{
  "type": "provisioning.cattle.io.cluster",
  "metadata": {
    "namespace": "fleet-default",
    "name": "dave-testrke2-test2-c2"
  },
  "spec": {
    "rkeConfig": {
      "chartValues": {
        "rke2-calico": {
          "calicoctl": {
            "image": "rancher/mirrored-calico-ctl",
            "tag": "v3.19.1"
          },
          "certs": {
            "node": {
              "cert": null,
              "commonName": null,
              "key": null
            },
            "typha": {
              "caBundle": null,
              "cert": null,
              "commonName": null,
              "key": null
            }
          },
          "global": {
            "systemDefaultRegistry": ""
          },
          "imagePullSecrets": {},
          "installation": {
            "calicoNetwork": {
              "bgp": "Disabled",
              "ipPools": [{
                "cidr": "10.42.0.0/16",
                "encapsulation": "VXLAN",
                "natOutgoing": "Enabled"
              }]
            },
            "controlPlaneTolerations": [{
              "effect": "NoSchedule",
              "key": "node-role.kubernetes.io/control-plane",
              "operator": "Exists"
            }, {
              "effect": "NoExecute",
              "key": "node-role.kubernetes.io/etcd",
              "operator": "Exists"
            }],
            "enabled": true,
            "imagePath": "rancher",
            "imagePrefix": "mirrored-calico-",
            "kubernetesProvider": ""
          },
          "tigeraOperator": {
            "image": "rancher/mirrored-calico-operator",
            "registry": "docker.io",
            "version": "v1.17.4"
          }
        }
      },
      "upgradeStrategy": {
        "controlPlaneConcurrency": "10%",
        "controlPlaneDrainOptions": {},
        "workerConcurrency": "10%",
        "workerDrainOptions": {}
      },
      "machineGlobalConfig": {
        "cni": "calico",
        "profile": null
      },
      "machineSelectorConfig": [{
        "config": {
          "cloud-provider-name": "none",
          "kubelet-arg": ["max-pods=77"]
        }
      }],
      "etcd": {
        "disableSnapshots": false,
        "s3": null,
        "snapshotRetention": 5,
        "snapshotScheduleCron": "0 */5 * * *"
      },
      "localClusterAuthEndpoint": {
        "enabled": false,
        "caCerts": "",
        "fqdn": ""
      },
      "machinePools": [{
        "name": "test2",
        "etcdRole": true,
        "controlPlaneRole": true,
        "workerRole": true,
        "hostnamePrefix": "dave-testrke2-test2-c2-test2-",
        "labels": {},
        "quantity": 1,
        "machineConfigRef": {
          "kind": "DigitaloceanConfig",
          "name": "nc-dave-testrke2-test2-c2-test2-pmhww"
        }
      }]
    },
    "kubernetesVersion": "v1.21.3-rc1+rke2r2",
    "defaultPodSecurityPolicyTemplateName": "",
    "cloudCredentialSecretName": "dave-rke2-do-credentials"
  }
}
However, when the cluster provisions, the kubelet args do not show up in the node annotation rke2.io/node-args:
[
"server",
"--agent-token",
"********",
"--cni",
"calico",
"--etcd-snapshot-retention",
"5",
"--etcd-snapshot-schedule-cron",
"0 */5 * * *",
"--node-label",
"rke.cattle.io/machine=1d7bcda7-903a-4d3d-8cd0-bd9ee65c8c8b",
"--token",
"********"
]
I also SSH'd into the node directly and used ps to check the arguments passed to the kubelet process. The args are missing there as well. I have tested this a few times and see the same result each time. My guess is this is a backend issue: the UI did specify the machineSelectorConfig as expected, but the kubelet args don't actually get applied during rke2 node provisioning.
Additional Info (logs, etc):
I tried in the UI to pass kubelet args both with and without the -- like:
max-pods=77
--max-pods=77
In either case, I don't see the kubelet args actually passed to the rke2 config and thus the node's kubelet process does not have these args.
I didn't spot anything interesting in rancher server logs. I can provide them upon request.
Use Case 2 - Not using any global args: Not attempted; since use case 1 failed, use case 2 is expected to fail as well. I'll check use case 2 when the issue is resolved and validate again.
@deniseschannon @Jono-SUSE-Rancher transferred this issue to rancher/rancher per Vincent. This is probably a backend issue; see my last comment for details. Basically, the UI appears to correctly specify machineSelectorConfig (kubelet-arg), but these kubelet args don't actually get passed into the rke2 config during rke2 node provisioning.
This is an issue in the planner in Rancher: a machineSelectorConfig entry with no selector was treated as matching nothing instead of matching all nodes.
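As a sketch of the intended semantics (hypothetical Python; the actual planner lives in rancher/rancher and is written in Go): a machineSelectorConfig entry with no machineLabelSelector should match every machine, while an entry with a selector should match only machines whose labels satisfy it.

```python
# Hypothetical sketch of the intended machineSelectorConfig matching
# semantics. Not the real planner code; function names are illustrative.

def selector_matches(selector, machine_labels):
    """An absent selector matches ALL machines (the bug treated it as
    matching none). Otherwise evaluate matchLabels / matchExpressions
    against the machine's labels."""
    if selector is None:
        return True  # no selector => global entry, applies everywhere
    for key, value in selector.get("matchLabels", {}).items():
        if machine_labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        if op == "Exists" and key not in machine_labels:
            return False
        if op == "DoesNotExist" and key in machine_labels:
            return False
        if op == "In" and machine_labels.get(key) not in expr.get("values", []):
            return False
    return True

# Entry with no selector (global) matches an unlabeled machine:
print(selector_matches(None, {}))  # True
# A "tacos is set" entry matches only the labeled machine:
expr = {"matchExpressions": [{"key": "tacos", "operator": "Exists"}]}
print(selector_matches(expr, {"tacos": "rule"}))  # True
print(selector_matches(expr, {}))  # False
```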
https://github.com/rancher/rancher/pull/33913 for dev-v2.6 [backport] as well
Reproduction: Not required.
Validation Failed:
Rancher version: master-head 2b156fd2b
8/4/21
Rancher Cluster type: single-node docker
Docker version: 19.03
Downstream cluster type: custom rke2 cluster provisioned by Rancher
Downstream K8s version: v1.21.3-rc2+rke2r2
Validation steps:
- Create a worker pool with the node label tacos=rule set
- Create a third pool for worker WITHOUT any node labels set
- Under the Cluster Configuration section, pick "Advanced" and set Additional Kubelet Args:
-- For the "For all nodes:" field (global), enter a value of --max-pods=57. This kubelet arg should apply to all nodes regardless of any labels set or not set.
-- For the "For machines with labels matching:" fields, set to match for key "tacos" and operator "is set".
-- For the "Add additional Kubelet args:" field, enter a value of --maximum-dead-containers=11. This kubelet arg should apply only to nodes with the "tacos" label set (1 of our 2 worker pools) -- THIS IS WHAT DOES NOT WORK, DETAILS BELOW
Note, the POST request body for provisioning.cattle.io.cluster appears correct; relevant snippet below:
"machineSelectorConfig": [{
"config": {
"kubelet-arg": ["--max-pods=57"]
}
}, {
"machineLabelSelector": {
"matchLabels": {},
"matchExpressions": [{
"key": "tacos",
"operator": "Exists"
}
]
},
"config": {
"kubelet-arg": ["--maximum-dead-containers=11"]
}
}
],
Once the nodes are provisioned, we can see that the rke2.io/node-args annotation is correct with respect to the global kubelet arg: it was passed into the rke2 config. Below is an example from the etcd+cp node; my global arg --max-pods=57 is set as expected. It's also set on the other nodes, as expected.
[
"server",
"--agent-token",
"********",
"--cni",
"calico",
"--etcd-snapshot-retention",
"5",
"--etcd-snapshot-schedule-cron",
"0 */5 * * *",
"--kubelet-arg",
"--max-pods=57",
"--node-label",
"rke.cattle.io/machine=87fe5507-498c-4346-895a-109c58376849",
"--node-taint",
"",
"--token",
"********"
]
And here is rke2.io/node-args on the labeled node. We can see that while the global kubelet arg is applied, as expected, the --maximum-dead-containers=11 I specified is NOT set, which is NOT expected. That kubelet arg should have been passed to this labeled node as per the machineLabelSelector specification.
[
"agent",
"--kubelet-arg",
"--max-pods=57",
"--node-label",
"rke.cattle.io/machine=6f38f79b-1747-4240-8502-82e789fd0640",
"--node-label",
"tacos=rule",
"--server",
"https://143.244.177.155:9345",
"--token",
"********"
]
Here is some more info: we can see in the UI that my node indeed does have the tacos=rule label.
When I comb through the Rancher server logs I don't spot anything interesting. Let me know if you would like to see them.
Rancher version: v2.6-head 878475007
8/5/21
This should include the 2.6 PR merge; the image was published 11 hours ago at 10:33pm last night, and the commit from the PR was merged at 10:00pm last night.
@ibuildthecloud this still isn't quite right. When I specify a global arg plus an arg matched to a node label, the node with the label match doesn't include the global arg. I expected to see the global arg on all nodes regardless of any label match or no match.
The node with the label match has my node-specific arg, but not the global arg (the global max-pods arg is missing):
[
"agent",
"--kubelet-arg",
"maximum-dead-containers=11",
"--node-label",
"rke.cattle.io/machine=78e8dda7-c04a-4a56-aa56-fbc2831bbdea",
"--node-label",
"tacos=rule",
"--protect-kernel-defaults",
"false",
"--server",
"https://REDACTED:9345",
"--token",
"********"
]
Other nodes have the max-pods arg and not the maximum-dead-containers arg, as expected. For the node above, I expected to see both the max-pods (global) arg and the maximum-dead-containers arg.
Here is what the UI passed for cluster creation. Maybe this is the UI's fault? Should the 2nd kubelet-arg entry also include the max-pods arg, since it's global?
{
  "machineSelectorConfig": [
    {
      "config": {
        "kubelet-arg": [
          "max-pods=57"
        ],
        "protect-kernel-defaults": false
      }
    },
    {
      "config": {
        "kubelet-arg": [
          "maximum-dead-containers=11"
        ]
      },
      "machineLabelSelector": {
        "matchExpressions": [
          {
            "key": "tacos",
            "operator": "Exists"
          }
        ],
        "matchLabels": {}
      }
    }
  ]
}
Offline convo with @ibuildthecloud and @vincent99: the backend behavior is correct, but the frontend needs to indicate that a matching selector entry will override.
Since the backend behavior was correct, the UI was updated to correctly reflect expectations when using kubelet args. @davidnuzik Please review the messaging and see if it aligns with your expectations of what you saw actually happen.
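To illustrate the behavior agreed on above (a hypothetical Python sketch, assuming the semantics described in this thread, not the actual Go planner code): when multiple machineSelectorConfig entries match a machine, the last matching entry's value for a key like kubelet-arg overrides earlier entries rather than appending to them.

```python
# Hypothetical sketch of "last matching selector wins" resolution.
# Entries are evaluated in order; a later matching entry's value for a
# key replaces (does not merge with) an earlier matching entry's value.

def resolve_config(entries, machine_labels):
    resolved = {}
    for entry in entries:
        selector = entry.get("machineLabelSelector")
        # An entry with no selector matches every machine; for brevity
        # this sketch only handles the "Exists" operator.
        matches = selector is None or all(
            expr["operator"] != "Exists" or expr["key"] in machine_labels
            for expr in selector.get("matchExpressions", [])
        )
        if matches:
            resolved.update(entry["config"])  # later entries override
    return resolved

entries = [
    {"config": {"kubelet-arg": ["max-pods=57"]}},
    {"machineLabelSelector": {"matchExpressions": [
        {"key": "tacos", "operator": "Exists"}]},
     "config": {"kubelet-arg": ["maximum-dead-containers=11"]}},
]

# Unlabeled node: only the global entry matches.
print(resolve_config(entries, {}))
# {'kubelet-arg': ['max-pods=57']}

# Labeled node: both entries match; the later entry overrides kubelet-arg,
# which is why the labeled node ended up without the max-pods arg.
print(resolve_config(entries, {"tacos": "rule"}))
# {'kubelet-arg': ['maximum-dead-containers=11']}
```

This matches what was observed in validation: the labeled node received only maximum-dead-containers=11, not the global max-pods=57.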
Note to self: Review https://github.com/rancher/dashboard/pull/3708/commits/7073395214298ddaa08fd025ad8efb0079985512 and test
My checks PASSED
Reproduction Steps:
Not required.
Validation Steps:
Rancher version: v2.6-head 4b6af95d7 8/9/21 8:46 am Pacific Rancher cluster type: single-node docker Docker version: 19.03
Downstream cluster type: rke2 Downstream K8s version: v1.21.3-rc4+rke2r2
Validation steps:
I tested again. This time the UI mentions that the last selector "wins", to make it clearer that the kubelet args above it (the first entry, in "For any machines, use the Kubelet args:") won't be set on matching machines:
Once the cluster is created as per the above screenshot (and at least one node pool has the label tacos=rule), check the nodes. For the pool without the label, ensure only the max-pods arg is set; for the pool with the label, ensure only the maximum-dead-containers arg is set.
Pool 1 looks correct (just the global arg)
Pool 2 looks correct (has my "tacos" label and therefore applies just the maximum-dead-containers arg)
I also SSH'd into the nodes to ensure these kubelet args were indeed set on each machine (only max-pods on node pool 1 and only maximum-dead-containers on pool 2, each with correct values).
Additional Info: During testing, I ensured that the YAML shown in the UI is correct when submitting the form to provision the cluster.
Under the advanced section of the cluster configuration, you can set different kubelet args for different nodes. https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
We need to validate that:
If you create a machine/node with a specific set of node labels, verify that those machines/nodes can be picked up using the machineSelector, and that the kubelet args will append on top of the global kubelet args.
The UI is available today.
It may be easiest to test this with the following kubelet args and just verify that the args were passed to the nodes: --max-pods and --maximum-dead-containers.
Use Case 1:
Machine setup: a set of nodes with no labels, and a set of nodes with specific labels.
Cluster config: add global arguments for kubelet, and add additional kubelet args for the specific set of nodes (to match the label on the nodes).
Verify that the nodes with no labels only get the global arguments. Verify that the labeled nodes get the args from the global arguments AND the additional arguments.
Use Case 2:
Machine setup: a set of nodes with no labels, and a set of nodes with specific labels.
Cluster config: add additional kubelet args for the specific set of nodes (to match the label on the nodes).
Verify that the nodes with no labels get no additional arguments. Verify that the labeled nodes get the args from the additional arguments.