vmware / cluster-api-provider-cloud-director

Cluster API Provider for VMware Cloud Director. The project is an open source implementation of the K8s ClusterAPI project and allows customers to provision resources directly from VMware Cloud Director. It enables Cloud Director powered Clouds to be treated as yet-another-cloud in the multi-cloud journey for VMware Cloud Providers.
Apache License 2.0

Use cloud-init metadata to configure the VM network #506

Open dlipovetsky opened 1 year ago

dlipovetsky commented 1 year ago

Is your feature request related to a problem? Please describe.

When Guest Customization is used together with cloud-init to bootstrap a Linux machine, a customized network configuration either fails to take effect on the first boot, or is removed on subsequent boots.

CAPVCD customizes the network configuration, namely it configures a Static IP. VCD makes this information available to the VM using Guest Customization on the first boot.

When the machine first boots, cloud-init runs. To customize the network configuration, it must delegate to the VMware Guest Tools, because cloud-init network configuration is found in metadata, and CAPVCD does not define any cloud-init metadata. VMware Guest Tools correctly configures the network.

However, if the machine ever reboots, the Guest Customization data is not available (by design, it is available only on first boot, or whenever the information changes). As a result, cloud-init believes it is running on a first boot and tries to configure the network again. It delegates to VMware Guest Tools, which no longer has the information it needs and therefore fails to configure the network[1].

When VMware Guest Tools fails to configure the network, cloud-init uses its "fallback" network configuration, which is DHCP on both RHEL and Ubuntu. On RHEL, it overwrites the correct network configuration; the machine no longer has network connectivity. On Ubuntu, cloud-init writes a network configuration that, by total coincidence, has a lower priority than the one written on first boot; the machine continues to have network connectivity.

[1] This is a long-standing issue, reported in 2019 for both RHEL (https://bugzilla.redhat.com/show_bug.cgi?id=1750862) and Ubuntu (https://bugs.launchpad.net/cloud-init/+bug/1835205).

Describe the solution you'd like

CAPVCD should define cloud-init metadata with the network configuration, namely:

  1. Create a VM, with Guest Customization disabled, in a powered-off state
  2. After VCD creates the VM, read the Static IP information from the VM properties
  3. Generate the cloud-init metadata from this information
  4. Encode the metadata using base64 and create two VM properties:
    • guestinfo.metadata:
    • guestinfo.metadata.encoding: base64
  5. Power on the VM

On first boot, cloud-init will use the metadata to configure the network. On subsequent boots, it will do the same.
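
The metadata generation and encoding in Steps 3 and 4 can be sketched as follows. This is a minimal Python sketch for illustration only, not CAPVCD code. The function name `build_guestinfo_properties`, its parameters, the interface label `nic0`, and the example addresses are all hypothetical. The metadata layout assumes the cloud-init network config version 2 format with MAC-address matching, serialized as JSON. A real implementation would take the static IP details from the VM properties read in Step 2.

```python
import base64
import json


def build_guestinfo_properties(instance_id, hostname, mac, address_cidr,
                               gateway, nameservers):
    """Return the two VM properties from Step 4 as a dict.

    All parameter names are hypothetical; the values would come from the
    static IP information that VCD assigned to the VM (Step 2).
    """
    # Assumed metadata layout: instance identity plus a version 2
    # network config with a single statically addressed interface.
    metadata = {
        "instance-id": instance_id,
        "local-hostname": hostname,
        "network": {
            "version": 2,
            "ethernets": {
                "nic0": {  # arbitrary label; the NIC is matched by MAC
                    "match": {"macaddress": mac},
                    "dhcp4": False,
                    "addresses": [address_cidr],
                    "gateway4": gateway,
                    "nameservers": {"addresses": list(nameservers)},
                }
            },
        },
    }
    # Step 4: serialize the metadata, then base64-encode it.
    raw = json.dumps(metadata).encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {
        "guestinfo.metadata": encoded,
        "guestinfo.metadata.encoding": "base64",
    }


if __name__ == "__main__":
    # Example values use documentation address ranges, not real data.
    props = build_guestinfo_properties(
        instance_id="example-vm",
        hostname="example-vm",
        mac="00:50:56:aa:bb:cc",
        address_cidr="192.0.2.10/24",
        gateway="192.0.2.1",
        nameservers=["192.0.2.53"],
    )
    for key, value in props.items():
        print(f"{key}: {value}")
```

Because the metadata stays attached to the VM as properties, cloud-init can read the same network configuration on every boot, which is the behavior this proposal relies on.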

This proposal is similar to what CAPVCD already does, namely, it creates the VM in a powered-off state, writes cloud-init userdata, and then powers on the VM. However, I need help to identify the VCD API to use in Step 2 to retrieve the VM properties.

Describe alternatives you've considered

I've tried the workarounds described in the RHEL and Ubuntu bug reports. They require preventing cloud-init from configuring the network after first boot. The workarounds are complex, stateful, and, frankly, hard to validate because of the interaction between cloud-init and VMware Guest Tools.

Additional context

This issue is the result of a thread in the CAPVCD Slack.

PengpengSun commented 1 month ago

Hi @dlipovetsky,

The long-standing issue you mentioned in [1] has been resolved in the cloud-init 24.2 release; the commit is https://github.com/canonical/cloud-init/commit/9929a00580d50afc60bf4e0fb9f2f39d4f797b4b. After a reboot, cloud-init no longer uses the fallback DHCP network configuration, and the network configuration set from the Guest Customization data is preserved.