rancher / dashboard

The Rancher UI
https://rancher.com
Apache License 2.0

[BUG] vsphere cluster creation fails when using vapp network protocol profiles #12271

Open philipp1992 opened 2 years ago

philipp1992 commented 2 years ago

Using the latest rancher version, when creating a new vsphere cluster, it fails upon cloning the vm template to a virtual machine with the following error:

image

settings: image

Network protocol profile is configured as following:

image

I have also tried with an empty domain / search, but I get the same issue.

philipp1992 commented 2 years ago

Error creating machine: Error in driver during machine creation: Invalid network in property guestinfo.dns.domains.

log from the fleet job

willsond01 commented 2 years ago

I am having this issue too. When I specify the same property in a vApp myself, for example guestinfo.dns.domains ${searchPath:[my-net-name]}, it is populated correctly when I power on the VM. I also tried a few variations, such as specifying a custom vApp config. Unfortunately, I now also cannot delete the cluster: it is stuck waiting for a viable init node, and nothing is defined on the VM when I look at its properties.

tfoks commented 1 year ago

I just updated to Rancher 2.6.9 and this behavior is still there. That is, using the option "Use vApp to configure networks with network protocol profiles" leads to the above error ("Invalid network in property guestinfo.dns.domains"). Moreover, using "Provide a custom vApp config" doesn't work either. All settings are applied to the created machine, but the newly created VM seems to boot in a way that ignores the vApp configuration. This seems to affect RKE2 clusters only; doing the same for an RKE cluster works without problems.

willsond01 commented 1 year ago

I'll have to upgrade to 2.6.9 and see if that breaks it for me. I can't quite remember how I got past this, but on 2.6.8 I am now able to provision RKE2 clusters. One thing I found: when you switch to the custom vApp config, the first entry it suggests is "com.vmware.guestInfo", and the font makes it hard to tell that the "I" in "Info" is a capital in that one entry, while everywhere else guestinfo is written in lower case (or is perhaps case insensitive). Double-check that, and if I am remembering right, you should be able to work around the first option.

tfoks commented 1 year ago

I dug a little deeper and I think I found the problem. The issue doesn't seem to be a bug but rather a difference in how nodes are provisioned. The main difference is that when RKE clusters are provisioned, the cloud-init user-data handed over to the VM only contains "groups" and "users". For RKE2 clusters, on the other hand, the file also contains "write_files" and "runcmd". These two can only appear once during the cloud-init process, so they break the VM template being used, which also uses these two. As a result, the network does not get configured using the vApp options.
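To illustrate the comparison described above, here is a minimal Python sketch that lists the top-level cloud-config keys two user-data documents have in common. The dictionaries are hypothetical stand-ins (only the key names "groups", "users", "write_files" and "runcmd" come from this thread, with `write_files` in cloud-init's own spelling), and the helper name `overlapping_keys` is made up for this example. It only reports the overlap; it does not model how cloud-init actually handles clashing keys.

```python
def overlapping_keys(template_cfg: dict, injected_cfg: dict) -> list:
    """Return the top-level keys present in both cloud-config documents, sorted."""
    return sorted(set(template_cfg) & set(injected_cfg))


# Hypothetical user-data baked into the VM template (assumed, not taken from the issue).
template_cfg = {
    "write_files": [{"path": "/opt/net-setup.sh", "content": "#!/bin/sh\n"}],
    "runcmd": ["/opt/net-setup.sh"],
}

# Hypothetical shapes of the user-data handed over for RKE and RKE2 nodes.
rke_cfg = {"groups": ["staff"], "users": [{"name": "docker"}]}
rke2_cfg = {
    "groups": ["staff"],
    "users": [{"name": "docker"}],
    "write_files": [{"path": "/usr/local/bin/install.sh", "content": ""}],
    "runcmd": ["sh /usr/local/bin/install.sh"],
}

print("RKE overlap: ", overlapping_keys(template_cfg, rke_cfg))
print("RKE2 overlap:", overlapping_keys(template_cfg, rke2_cfg))
```

With these assumed inputs, only the RKE2 document shares keys with the template, which matches the difference between the two provisioning paths described in the comment.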

willsond01 commented 1 year ago

I seem to recall something like that from a pre-2.5.x version I had running, which I think might be why you have to specify the custom vApp config. I haven't done anything too fancy with cloud-init, so I wonder whether regular cloud-init, like its Windows counterpart Cloudbase-init, has a directory where any scripts you put in it get run. If so, you could put the network config shell script I assume you are using there, so you don't need the additional write_files/runcmd.

djpbessems commented 1 year ago

This issue is actually not related to cloud-init.

I ran into the same error and noticed that the vApp properties are populated with the fully qualified object path of the network portgroup (i.e. ${dns:/datacenter/folder/portgroupname}) instead of just the portgroup name (${dns:portgroupname}). Changing the vApp options radio button to "Provide a custom vApp config" allows you to remove the superfluous parts of the fully qualified object paths.
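As a rough illustration of that manual cleanup, the following Python sketch rewrites vApp property values so that each macro keeps only the last component of the network path. The function name `shorten_network_refs` and the sample values are hypothetical; this is not part of Rancher or the vSphere driver, only a way to preview what the shortened values would look like before entering them in the custom vApp config.

```python
import re

# Matches vApp property macros such as ${dns:/datacenter/folder/portgroupname}.
# "name" is the macro (dns, searchPath, ...), "ref" is the network reference.
_MACRO = re.compile(r"\$\{(?P<name>[A-Za-z]+):(?P<ref>[^}]*)\}")


def shorten_network_refs(value: str) -> str:
    """Replace fully qualified network paths inside macros with the bare portgroup name."""

    def _shorten(match):
        ref = match.group("ref")
        # Keep only the last path component; a bare name passes through unchanged.
        short = ref.rstrip("/").rsplit("/", 1)[-1]
        return "${" + match.group("name") + ":" + short + "}"

    return _MACRO.sub(_shorten, value)


# Sample values modeled on the ones quoted in this thread.
print(shorten_network_refs("${dns:/datacenter/folder/portgroupname}"))
print(shorten_network_refs("${searchPath:/datacenter/folder/portgroupname}"))
```

Values that already use the short form, and text outside a `${...}` macro, are left untouched.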