Open gaktive opened 11 months ago
There are two issues associated with this bug, only one of which I'm able to reproduce at this point:
Invalid templates are selected when editing a VMware vSphere RKE2 cluster (unable to reproduce; it might be associated with Terraform, or it may no longer be an issue)
Clicking "Edit as YAML" commits changes without the need to Save
Converting this to an Epic because this ticket describes multiple issues.
I have encountered this issue as well. When modifying the cron job for etcd, I used the UI to edit the config, and it automatically selected a bad template for the control plane from the top of the list. Since the previous template was not in the correct format in the datastore, Rancher tried to rebuild the VM. This in turn caused a provisioning storm in vSphere, as the template it had selected did not work. Since Rancher was unable to install on the selected template, the cluster lost stability. This led to a 3-day outage of the downstream cluster.
This is based on discussion with @rak-phillip about the original issue: if a user clicks 'Edit Config' in the Rancher UI, the node Template field is not populated with the template defined in their TF config.
Modifying a cluster via the UI when it was originally created with Terraform is not necessarily supported, so unexpected behavior resulting from that is not necessarily a bug. If a user creates a cluster via TF, it is expected they will update the infrastructure by running TF apply. However, it is also expected that the Edit Config page in the UI shows the correct template data that the user defined via Terraform, so that part is a bug.
IMO this appears to be a UI issue where the UI is pulling the first vSphere node template from the backend server without considering TF template input. Either that, or TF is somehow not setting the template resource fields on the cluster management object correctly. This needs to be tested and verified.
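The suspected fallback can be sketched as follows. This is only an illustration of the hypothesis above, not Rancher's actual UI code: the function names `resolve_template_buggy` and `resolve_template_safe`, the `Selection` type, and the template paths are all hypothetical. It shows how defaulting to the first available option silently replaces a saved value that is no longer in the list, and how keeping the saved value and flagging it as invalid would avoid that.

```python
from typing import NamedTuple, Optional, Sequence


class Selection(NamedTuple):
    value: Optional[str]  # template the form would submit
    valid: bool           # False when the value is missing from the list


def resolve_template_buggy(saved: Optional[str],
                           available: Sequence[str]) -> Optional[str]:
    """Hypothetical reproduction of the suspected behavior:
    fall back to the first available template when the saved
    one is not found in the list."""
    if saved in available:
        return saved
    return available[0] if available else None


def resolve_template_safe(saved: Optional[str],
                          available: Sequence[str]) -> Selection:
    """Hypothetical alternative: never substitute another template.
    Keep the saved value and mark it invalid so the form can warn
    the user and block saving until they choose explicitly."""
    return Selection(saved, saved is not None and saved in available)


# Assumed example data: the saved template was renamed in vCenter,
# so it no longer appears in the list returned to the form.
available = ["/dc/vm/templates/alpha", "/dc/vm/templates/ubuntu-2204"]
saved = "/dc/vm/templates/rke2-node"

print(resolve_template_buggy(saved, available))
print(resolve_template_safe(saved, available))
```

With the first function, the form ends up holding `/dc/vm/templates/alpha` even though the user never chose it. With the second, the form still holds the original value and can refuse to save until the user picks a template.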
@rak-phillip Were you unable to reproduce on 2.7.5?
I'm using 2.7.6 right now with TF provider 3.2.0. The template is showing up in the UI right now. I can reproduce this issue in 2 ways.
I can also reproduce this in my home lab by deploying a new cluster > renaming the vSphere template > editing the cluster config to view, say, etcd data.
I would argue an incorrect template name in TF config is not a supported use case. But, if a correct vSphere template is being modified / rebuilt by packer in vCenter then you see the wrong template in the Edit Config page? Screenshots would be very helpful here :)
@a-blender yes, we were able to repro and identify the issue reliably. We've made some minor enhancements to the form to prevent users from entering incorrect values and to warn them about the potential impact of changes, specifically:
We intend to follow up in a later release with more enhancements, but the changes in place will at least help increase awareness when there are potential errors in supplied data.
In vCenter I have the following templates, using the highlighted one as the example. Rancher shows the correct template being selected. Now I modify the name in vCenter. When I go back to Rancher, the UI shows the following, which is the first template from those available in vCenter. I scrolled past this when setting a cron schedule through the UI for etcd snapshots. Once I clicked Save, it triggered my control plane to rebuild.
The above happened outside of using Edit as YAML. This example cluster was provisioned outside of TF, strictly through the Rancher UI.
@rak-phillip Great, thank you!
@adventurousyeti Did you also update the name of the VM template in your TF config to Rancher_BM_PoC before trying to modify the cluster again?
@adventurousyeti Got it, that is purely a UI issue and will be fixed by the UI team.
@rak-phillip can help here as we transfer this ticket and related ones over.
Possible backend ticket that's related (or blocks us): https://github.com/rancher/rancher/issues/41307
Internal reference: SURE-6778 Reported in 2.7.5
A user deploys RKE2 clusters (1.24.x) in Rancher 2.7.5 as vSphere clusters using Terraform. Creation works fine: the cluster appears in Rancher and gets created normally.
They noticed that once the cluster is created, if they click 'Edit Config' in the Rancher UI, the Template field is not populated with the value defined in their TF file, but with another VM template from vCenter.
It seems that Rancher takes the first template in the drop-down list. Most of the time this is not a problem, because every action they take on an existing cluster is done using Terraform, so the correct template is always used. But if, for any reason, they modify any cluster parameter using the Rancher UI without correcting the template value on each pool, it will save the wrong template and proceed to recreate all the VMs of the cluster.
Even worse, they don't need to click the Save button to save the wrong template! In the 'Edit Config' interface, clicking 'Edit as YAML' also saves the change and starts recreating the VMs.
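One way to reason about the impact described above is a guard that compares the stored machine pools with what the form is about to submit, before any write happens (whether through Save or a switch to YAML). The sketch below is hypothetical: the `changed_templates` helper and the pool shape (a dict with `name` and `template` keys) are assumptions made for illustration and do not reflect Rancher's real schema or code.

```python
from typing import Dict, List, Tuple

Pool = Dict[str, str]  # assumed shape: {"name": ..., "template": ...}


def changed_templates(stored: List[Pool],
                      pending: List[Pool]) -> Dict[str, Tuple[str, str]]:
    """Return {pool name: (stored template, pending template)} for every
    existing pool whose template would change if `pending` were written."""
    stored_by_name = {pool["name"]: pool["template"] for pool in stored}
    changes: Dict[str, Tuple[str, str]] = {}
    for pool in pending:
        old = stored_by_name.get(pool["name"])
        new = pool["template"]
        # Pools that do not exist yet are skipped: nothing to recreate.
        if old is not None and old != new:
            changes[pool["name"]] = (old, new)
    return changes


# Assumed example data: the form silently swapped the control plane template.
stored = [
    {"name": "control-plane", "template": "/dc/vm/templates/rke2-node"},
    {"name": "workers", "template": "/dc/vm/templates/rke2-node"},
]
pending = [
    {"name": "control-plane", "template": "/dc/vm/templates/alpha"},
    {"name": "workers", "template": "/dc/vm/templates/rke2-node"},
]

for name, (old, new) in changed_templates(stored, pending).items():
    print(f"pool {name}: template would change from {old} to {new}")
```

A non-empty result means at least one pool's VMs would be recreated, so a caller could require explicit confirmation before persisting anything.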
Business impact: High. The nodes/machines will be deleted and recreated.
vSphere config & Terraform setup available.
Proposed solution