Open dasarinaidu opened 1 year ago
This is a UI bug.
The root cause of the problem is the fact that we are delivering an invalid configuration to RKE2, namely specifying profile: null
in machineGlobalConfig
.
This is very easily reproducible -- create a new v2prov cluster in the UI, click "Edit as YAML", and note that profile: null
under machineGlobalConfig
.
See attached screenshot.
@nwmac @gaktive , this is likely a side-effect of recent changes for profile behavior to accommodate k8s 1.25 changes so possibly a regression we don't want to happen in the upcoming 2.7.2 release. Marking as regression
and release-blocker
.
@snasovich Backend should ignore the null profile. If the configuration is invalid, backend should reject the request.
/fordwardport v2.7.next2
@snasovich Backend should ignore the null profile. If the configuration is invalid, backend should reject the request.
@nwmac , I agree with this in principle, but it's still a change of the behavior on UI's part that we want to be reverted. I see it's already in test, thank you for a quick turn-around.
https://github.com/rancher/rancher/issues/40942 - backend issue
Setting this ticket to Reopened until a Test Template can be provided.
cc @nwmac
It was determined by management that this ticket does not need a Test Template to complete testing. Therefore I am setting this ticket back to QA Working.
During testing upgrading from v2.7.0
to v2.7-head
(Commit ID: 41b2cf3), I did not observe the issue with the spammed logs when following the Steps to Reproduce. Therefore I am closing this ticket as Done.
Test Steps:
v1.23.14+rke2r1
(depreciated)v2.7-head
v1.25.7+rke2r1
and SAVEScreenshots:
Rancher Logs: Rancher Logs
Reopen for some more validations.
With Docker on a single-node instance of Rancher v2.7.0 => v2.7.2-rc6
:
❗ Unable to reproduce ❗
Scenario | Test Case | Result |
---|---|---|
1. | Provision downstream rke2 w/ k8s 1.23 on 2.7.0 | ✅ |
2. | Upgrade Rancher from 2.7.0 => 2.7.2-rc6 , then upgrade cluster's k8s version to 1.25 ; observe profile field of machineGlobalConfig is left out of request |
✅ |
3. | Provision downstream rke2 w/ k8s 1.25 on v2.7.2-rc6 |
✅ |
Scenario 1:
v2.7.0
Scenario 2:
v2.7.2-rc6
v2.7.2-rc6
, upgrade rke2 cluster k8s version to 1.25
, opening the network tab and observe profile
field of machineGlobalConfig
is left out of request (since it is set to null
)Scenario 3:
v2.7.2-rc6
Re-opening due to the following concerns:
profile: null
is still present when navigating to Cluster Management > View YAML
The fix that went in, which did resolve the flooded logs symptom the issue reported, seems to have removed profile: null
from the request body sent from the UI. The field still persists and is seen in View YAML
from rancher UI. This is unexpected. Re-opening to request this field gets hidden behind #
To clarify here, I would expect that I can copy-paste the YAML that comes out of the UI's Edit as YAML (or View as YAML) interface and directly kubectl apply|create
it in my cluster and end up with an identical result.
If the UI is selectively ignoring fields when marshalling the object and sending it over the wire, that is not ideal, although it does I suppose provide a resolution to the originally reported bug.
The behavior I would expect to see is
Scenario A: If the user does not define a profile for their cluster and edits the cluster as YAML, there should be no profile
key at all in the machineGlobalConfig
, or if there is, it should be commented out like we do the rest of the fields that do not have explicitly set values, for example see bucket
in the screenshot here: https://github.com/rancher/dashboard/issues/8480#issuecomment-1477180460
Scenario B: If the user defines a profile for their cluster, I would expect to see the profile: cis-1.5
entry under machineGlobalConfig
There is no case in which null
is a valid value for profile
, but profile
is a valid configuration parameter for RKE2. We do not have a mechanism to perform value validation at this point for RKE2/K3s (that is a feature enhancement on the K3s/RKE2 side), and thus on the backend side would have to hard code some type of validation to ensure that null
is not being delivered.
This isn't a blocker for 2.7.2, which is now in the 11th hour of its release, and the related backend ticket is also pushed out. Moving to Q2.
The remaining work is to comment out the profile
key when editing as yaml - this is currently driven off of data that comes from the backend via the schema.
Bumping out of 2.7.next2 - tis requires for investigation and will require test automation
@Oats87 we should coordinate with you and/or your team about how UI & backend should handle this.
cc @Sahota1225
This may be something the UI is doing when trying to populate fields we expect, so we should scrub upon save.
Rancher Server Setup
Information about the Cluster
v1.23.14+rke2r1 to v1.25.7+rke2r1
User Information
Describe the bug Sometimes when performing an HA Upgrade to Rancher v2.7.0 TO v2.7.2-rc6, while having an AWS RKE2 custom cluster - the cluster will go into Updating state and the etcd nodes will go into a Reconciling state and the INFO logs are spammed and keep coming the same message.
To Reproduce
Result a. Downstream cluster initially came back active and then it went to update state with an error and it never came back to Active b. Rancher logs are spammed with INFO, these logs are keep coming for every second c. One of the node (etcd) is in Reconciling state and never came back with message on UI (Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet)
Expected Result a. Rancher logs should not be spammed with the logs every second.
LOGS
2023/03/17 18:36:58 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:36:58 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:03 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:03 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:08 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:08 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:13 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:13 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:18 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:18 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:23 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:23 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:28 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:28 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:33 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:33 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:38 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:38 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:43 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:43 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:48 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:48 [INFO] [planner] rkecluster fleet-default/dascrke2: waiting: configuring bootstrap node(s) custom-d953bccd4557: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet 2023/03/17 18:37:51 [ERROR] error syncing 'fleet-default/custom-5e4c4afa8ddd': handler unmanaged-machine: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes ip-172-31-2-93), requeuing