nutanix / terraform-provider-nutanix

Terraform Nutanix Provider
https://www.terraform.io/docs/providers/nutanix/
Mozilla Public License 2.0

Guest Customization (Metadata) is broken - util.py[WARNING]: Broken config drive: /dev/sr0 #303

Open brunobenchimol opened 3 years ago

brunobenchimol commented 3 years ago

Nutanix Cluster Information

AOS: 5.20 LTS
Prism Central: pc.2021.9

Terraform Version

Terraform v1.0.9 on linux_amd64

Affected Resource(s)

nutanix_virtual_machine

Terraform Configuration Files

resource "nutanix_virtual_machine" "vm" {
  name                 = "${local.vm_prefix_name}-${count.index}"
  count                = local.vm_instance_count

  cluster_uuid         = data.nutanix_cluster.cluster.id
  num_sockets          = var.vm_cpu
  memory_size_mib      = var.vm_memory

  boot_type = "UEFI"

  disk_list {
    data_source_reference = {
      kind = "image"
      uuid = data.nutanix_image.image.id
    }
  }

  guest_customization_cloud_init_meta_data = base64encode(jsonencode(data.template_file.metadata.rendered))
  guest_customization_cloud_init_user_data = base64encode(data.template_file.userdata.rendered)

  nic_list {
    subnet_uuid = data.nutanix_subnet.subnet.id
  }
}

Metadata template (data.template_file.metadata):

instance_id: testvm2
hostname: testvm2
local-hostname: testvm2
fqdn: testvm2.example.local
preserve_hostname: false

network:
  version: 2
  ethernets:
    ens3:
      dhcp4: false
      addresses:
        - 10.1.1.11/24
      gateway4: 10.1.1.254
      nameservers:
        search: [ "example.local" ]
        addresses: [ "8.8.8.8", "1.1.1.1" ]

User-data template (data.template_file.userdata):

#cloud-config

user: root
password: password
chpasswd: {expire: False}
ssh_pwauth: True
ssh_authorized_keys:
 - ssh-rsa AAAAB3NzaC1yc2EAAAA31098390128390128309128dPByi0AZUDOyk/tGoMOPcGgj5IBq+eXv5u+PEsJuEGG0tb+hy0oPBeiIrE7LPlMbQL6lj6K4l+78VR79VyVS2g9U2VtUaI45sn6NO+Grh0WvlLUDDQBtxlfPcwDTXt10tC7izkI+4kGCitxSNG6+xt11xJgFZ03vQFP2U1Hfu9NEoyLEIPzNn3nDnKFQIDLbvGlPW9sY6jxH2XT1bD5AmQv8ZT8QOg6x1T8Gdyt7oNZ30c+2TbJd4HrEK8Q6ZzJeS37DA6KiJpwq9Q8z3ucuzd3+/phbISz5s11VS8/7UPzd1CbU1kxSwhEDoyput2F0teL/+h1DCkde3QMXmDtStuzujmEdCBC0VWuNECdohHaPHjNLZF0CiLYTL/nUQn7X1QWSH/rsABUqMBw2hn0s6zDPbVSTqHTxKqcTBj8nejSvsQnF0gEGimVGFYlRWlx3WJScbU5YgIYee8Gir9XW3tAEdzJymfJgzKirM=

disable_root: false #Enable root access
#ssh_pwauth: yes #Use pwd to access (otherwise follow official doc to use ssh-keys)

runcmd:
  - mkdir -p /mnt/kickstart2
  - mkdir -p /mnt/cloud-init-ok

final_message: "The system is prepped, after $UPTIME seconds"

power_state:
  timeout: 30
  mode: reboot

Expected Behavior

Instance metadata works as expected (https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html) and configures the network.

Actual Behavior

cloud-init breaks and configures nothing, logging the error: util.py[WARNING]: Broken config drive: /dev/sr0

Steps to Reproduce

  1. terraform apply

Important Factors

  1. Running against Prism Central
  2. Using only user data (guest_customization_cloud_init_user_data) works as expected, but adding guest_customization_cloud_init_meta_data breaks everything and nothing runs.
  3. The Prism UI has no field for metadata, only for a custom script (user-data).

References

Similar issue. It should not be necessary to jsonencode() the metadata; base64encode() alone should suffice, as in most providers' implementations.
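A minimal Python sketch of what the extra jsonencode() does, assuming Python's json.dumps behaves like Terraform's jsonencode() for a plain string (both emit a quoted JSON string literal). The metadata values are illustrative. Wrapping the already-rendered template in jsonencode() means the decoded meta_data is a JSON string rather than a JSON object:

```python
import base64
import json

# The rendered template is already text (YAML here); values are illustrative.
rendered = "instance_id: testvm2\nhostname: testvm2\n"

def b64(text: str) -> str:
    """Base64-encode UTF-8 text, as Terraform's base64encode() does."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode(payload: str):
    """Decode base64, then parse the result as JSON."""
    return json.loads(base64.b64decode(payload))

# Like base64encode(jsonencode(rendered)): the string itself gets JSON-quoted.
wrapped = b64(json.dumps(rendered))
# Like base64encode(jsonencode({...})): a mapping becomes a JSON object.
as_object = b64(json.dumps({"hostname": "testvm2"}))

print(type(decode(wrapped)).__name__)    # str
print(type(decode(as_object)).__name__)  # dict
```

A reader that expects meta_data.json to contain an object would therefore not receive one. Whether this is what triggers the "Broken config drive" warning here is not confirmed.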

pipoe2h commented 2 years ago

Hi @brunobenchimol,

A few things here:

brunobenchimol commented 2 years ago

Hello @pipoe2h.

Thanks for helping. I would like to add more info to help.

I did some tweaks using user-data with write_files:

#cloud-config
...

write_files:
- content: |
    NAME=${interface}
    DEVICE=${interface}
    ONBOOT=yes
    TYPE=Ethernet
    %{ if dhcp == true }BOOTPROTO=dhcp
    %{ else }BOOTPROTO=none
    IPADDR=${ip_address}
    PREFIX=${netmask}
    GATEWAY=${gateway}
    %{ if dns1 != "" ~}DNS1=${dns1}${"\n"}%{ endif ~}
    %{ if dns2 != "" ~}DNS2=${dns2}${"\n"}%{ endif ~}
    %{ if dns3 != "" ~}DNS3=${dns3}${"\n"}%{ endif ~}
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=no
    IPV6INIT=no %{ endif }
  path: /etc/sysconfig/network-scripts/ifcfg-${interface}
  owner: root:root
  permissions: '0644'
...
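The branching in that template can be mirrored in plain Python to check the generated ifcfg file before it goes into write_files. This is a hypothetical helper: render_ifcfg is not part of Terraform or cloud-init. As in the template, the netmask value is written as PREFIX, so the sketch assumes it is a prefix length such as 24.

```python
def render_ifcfg(interface, dhcp, ip_address="", netmask="", gateway="", dns=("", "", "")):
    """Hypothetical mirror of the ifcfg template: DHCP or static, DNS1-3 only if set."""
    lines = [
        f"NAME={interface}",
        f"DEVICE={interface}",
        "ONBOOT=yes",
        "TYPE=Ethernet",
    ]
    if dhcp:
        lines.append("BOOTPROTO=dhcp")
    else:
        lines += [
            "BOOTPROTO=none",
            f"IPADDR={ip_address}",
            f"PREFIX={netmask}",  # prefix length (e.g. 24), as in the template
            f"GATEWAY={gateway}",
        ]
        # Like the %{ if dnsN != "" } guards: skip empty entries, keep numbering.
        for index, server in enumerate(dns[:3], start=1):
            if server:
                lines.append(f"DNS{index}={server}")
        lines += ["DEFROUTE=yes", "IPV4_FAILURE_FATAL=no", "IPV6INIT=no"]
    return "\n".join(lines) + "\n"


print(render_ifcfg("eth0", dhcp=False, ip_address="10.1.1.11", netmask="24",
                   gateway="10.1.1.254", dns=("8.8.8.8", "", "1.1.1.1")))
```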

It's essentially hard-coded for CentOS/RHEL, since they do not ship or support netplan (netplan works on Debian/Ubuntu). My main point, though, is to support network-config through cloud-init (https://cloudinit.readthedocs.io/en/latest/topics/network-config-format-v2.html).

Cloud-init also supports legacy formats (ifup/ifdown). In my testing, cloud-init reads network-config from the metadata, not from the user data; it seems to be designed to read network configuration from an input separate from user_data. I also could not make it work with user_data (testing on a VM, not Terraform with Prism Element).
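For reference, the version 2 structure from the metadata template at the top of this issue can be generated as a plain mapping. This sketch only builds the data. How it reaches cloud-init (through a separate network-config source or through user data) is the open question in this thread. network_config_v2 is a hypothetical helper.

```python
import json

def network_config_v2(interface, address_cidr, gateway, dns, search=()):
    """Build a cloud-init network config (version 2) mapping for one static NIC."""
    return {
        "version": 2,
        "ethernets": {
            interface: {
                "dhcp4": False,
                "addresses": [address_cidr],
                "gateway4": gateway,
                "nameservers": {"search": list(search), "addresses": list(dns)},
            }
        },
    }


config = network_config_v2("ens3", "10.1.1.11/24", "10.1.1.254",
                           dns=["8.8.8.8", "1.1.1.1"], search=["example.local"])
# Emitted as JSON to stay in the standard library; JSON should also parse as YAML.
print(json.dumps(config, indent=2))
```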

Regarding Nutanix, I tried testing this through the Prism Element web UI, but I could only enter user_data. I am not using the API directly yet. I did look at the API reference (https://www.nutanix.dev/reference/prism_element/v2/api/vms/post-vms-createvm), but I could not find meta_data in the vm_customization_config object.

I don't know whether you chose not to support this because AOS already has built-in IPAM. Supporting it would certainly make the provider more flexible and more compatible with cloud-init. Currently, the only issue holding me back is that I cannot create virtual machines with static IP addresses this way.

If Nutanix does not support network-config at the moment, what is meta_data in VM customization used for? It is not available in the Prism UI.

Where does guest_customization_cloud_init_meta_data (https://registry.terraform.io/providers/nutanix/nutanix/latest/docs/data-sources/virtual_machine#guest_customization_cloud_init_meta_data) map into the API, and which files are generated? I got somewhat lost between the documentation, the API, and cloud-init.

guest_customization_cloud_init_meta_data - The contents of the meta_data configuration for cloud-init. This can be formatted as YAML or JSON. The value must be base64 encoded.

Thanks for helping out!

Best regards,

pipoe2h commented 2 years ago

@brunobenchimol my understanding of network_config is that it requires a separate file and does not use meta_data.json itself. meta_data.json only holds a reference to that file, in the format "network_config": { "content_path": "/content/0000" }

We use the OpenStack ConfigDrive implementation. To use a network version 2 configuration, you would need an additional file called network_data.json alongside meta_data.json and user-data. As mentioned, we don't have that implementation: through the PC v3 API you can only pass user-data and meta_data.json.
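To make the two network styles above concrete, here is a sketch of an OpenStack-style config drive tree. It follows the OpenStack convention: openstack/latest/meta_data.json, user_data and network_data.json, plus openstack/content/0000 for the legacy network_config reference. It is an assumption that the drive Nutanix attaches looks exactly like this. build_config_drive is hypothetical, and the network_data.json contents are empty placeholders.

```python
import json
import tempfile
import uuid
from pathlib import Path

def build_config_drive(root, meta_data, user_data, network_data=None, legacy_interfaces=None):
    """Write an OpenStack-style config drive tree under root; return relative file paths."""
    base = Path(root) / "openstack"
    latest = base / "latest"
    latest.mkdir(parents=True, exist_ok=True)
    meta = dict(meta_data)
    if legacy_interfaces is not None:
        # Legacy style: meta_data.json only points at a separate content file.
        (base / "content").mkdir(exist_ok=True)
        (base / "content" / "0000").write_text(legacy_interfaces)
        meta["network_config"] = {"content_path": "/content/0000"}
    (latest / "meta_data.json").write_text(json.dumps(meta))
    (latest / "user_data").write_text(user_data)
    if network_data is not None:
        # The extra file that, per the comment above, is not generated today.
        (latest / "network_data.json").write_text(json.dumps(network_data))
    return sorted(p.relative_to(root).as_posix() for p in Path(root).rglob("*") if p.is_file())


vm_uuid = str(uuid.uuid4())
with tempfile.TemporaryDirectory() as tmp:
    files = build_config_drive(
        tmp,
        meta_data={"uuid": vm_uuid, "instance-id": vm_uuid},
        user_data="#cloud-config\n",
        network_data={"links": [], "networks": [], "services": []},
    )
print(files)
```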

meta_data.json is not meant for end users but for integrations with other software. That is why it is not exposed in the UI and is only available through the PC v3 API.

Here is a screenshot of a VM that I created via the PC v3 API, passing user-data and meta_data (a base64-encoded JSON payload that must include instance-id and uuid).

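The meta_data payload described above can be produced with a few lines of Python. The key names uuid and instance-id come from the comment. Note the hyphen in instance-id, whereas the metadata template at the top of the issue uses instance_id. Reusing one UUID for both keys and the optional hostname entry are assumptions for illustration. The resulting string is what would be passed as guest_customization_cloud_init_meta_data; that step is not exercised here.

```python
import base64
import json
import uuid

def encode_meta_data(vm_uuid=None, extra=None):
    """Return a base64-encoded meta_data JSON object containing uuid and instance-id."""
    vm_uuid = vm_uuid or str(uuid.uuid4())
    # Assumption: the same UUID is reused for both required keys.
    payload = {"uuid": vm_uuid, "instance-id": vm_uuid, **(extra or {})}
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


encoded = encode_meta_data(extra={"hostname": "testvm2"})
print(encoded)
```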

We have customers using static IP configuration just with user-data. If you need an example of this, please let me know.

brunobenchimol commented 2 years ago

I would like that example so I can test it in my environment and get back to you. I am not sure my workaround using user-data and write_files was the best approach to this issue.

I also did the following to set the hostname, since that was also part of network_config:

runcmd:
  - echo -n > /etc/machine-id
  %{ if dhcp != true }- hostnamectl set-hostname ${hostname}.${domain} 
  %{ endif }

I will also try to spin up the test environment in my next free time and add more information here after you send an example.

Best regards,

pipoe2h commented 2 years ago

Here is an example that we use with our Calm automation software.

#cloud-config
package_upgrade: false
hostname: @@{name}@@
fqdn: @@{name}@@.@@{domain}@@
manage_etc_hosts: true
ssh_pwauth: true

write_files:
  - path: /etc/sysconfig/network-scripts/ifcfg-ens3
    content: |
      DEVICE="ens3"
      IPADDR="@@{ipaddr}@@"
      NETMASK="@@{netmask}@@"
      GATEWAY="@@{gateway}@@"
      BOOTPROTO="none"
      ONBOOT="yes"
      TYPE="Ethernet"

runcmd:
  - [ifdown, ens3]
  - [ifup, ens3]

manage_resolv_conf: true
resolv_conf:
  nameservers: ['@@{dns1}@@']
  searchdomains:
    - @@{domain}@@
  domain: @@{domain}@@
  options:
    rotate: true
    timeout: 1
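Calm fills in the @@{...}@@ macros itself. To preview the rendered cloud-config outside Calm, or before porting it to Terraform variables as described later in the thread, a hypothetical helper like the following can do the substitution locally. It is not Calm's own implementation, and allowing dots in macro names is an assumption.

```python
import re

# Calm-style macros look like @@{name}@@; dotted names are assumed to be allowed.
MACRO = re.compile(r"@@\{([^}]+)\}@@")

def render_calm_macros(template, values):
    """Substitute @@{name}@@ macros from a dict; fail loudly on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for macro {name!r}")
        return str(values[name])
    return MACRO.sub(substitute, template)


snippet = "hostname: @@{name}@@\nfqdn: @@{name}@@.@@{domain}@@\n"
print(render_calm_macros(snippet, {"name": "vm1", "domain": "example.local"}))
```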
pipoe2h commented 2 years ago

Another one. It depends on the Linux distribution and version.


#cloud-config
password: changeme
disable_root: False
ssh_pwauth: True
users:
  - name: "admin"
    groups: sudo
    shell: /bin/bash
    lock_passwd: false
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
chpasswd:
 list: |
   admin:changeme
 expire: False
write_files:
  - path: /etc/sysconfig/network-scripts/ifcfg-eth0
    content: |
      DEVICE=eth0
      TYPE=Ethernet
      ONBOOT=yes
      BOOTPROTO=none
      IPADDR=192.168.0.100
      NETMASK=255.255.255.0
      GATEWAY=192.168.0.1
      DOMAIN=example.com
      DNS1=192.168.0.1
runcmd:
  - systemctl disable NetworkManager kdump
  - systemctl stop NetworkManager kdump
  - systemctl enable network.service
  - systemctl restart network
brunobenchimol commented 2 years ago

Hello again. I got it working with your scripts, using a slightly modified version that picks up variables from Terraform instead of Calm syntax.

I was also experimenting with RHEL 8 (which does not support netplan) in another scenario (a different hypervisor, ESXi) and updated cloud-init to use the latest plugins. I know this won't apply to the Terraform Nutanix provider, but it is still useful testing of cloud-init itself.

[root@vm-app-84m0z-0 ~]# cloud-init query ds
{
 "_doc": "EXPERIMENTAL: The structure and format of content scoped under the 'ds' key may change in subsequent releases of cloud-init.",
 "meta_data": {
  "hostname": "vm-app-84m0z-0",
  "instance_id": "vm-app-84m0z-0",
  "local_hostname": "vm-app-84m0z-0",
  "local_ipv4": "10.1.1.224",
  "local_ipv6": "fe80::250:56ff:feb6:496b%ens192",
  "network": {
   "config": {
    "ethernets": {
     "ens192": {
      "addresses": [
       "10.1.1.224/24"
      ],
      "dhcp4": false,
      "gateway4": "10.1.1.254",
      "nameservers": {
       "addresses": [
        "1.1.1.1",
        "8.8.8.8"
       ],
       "search": [
        "example.local"
       ]
      }
     }
    },
    "version": 2
   },
   "interfaces": {
    "by_ipv4": {
     "10.1.1.224": {
      "broadcast": "10.1.1.255",
      "mac": "00:50:56:b6:49:6b",
      "netmask": "255.255.255.0"
     }
    },
    "by_ipv6": {},
    "by_mac": {
     "00:50:56:b6:49:6b": {
      "ipv4": [
       {
        "addr": "10.1.1.224",
        "broadcast": "10.1.1.255",
        "netmask": "255.255.255.0"
       }
      ],
      "ipv6": []
     }
    }
   }
  },
  "preserve_hostname": false
 }
}
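To read the network section back out of this output in a script, the JSON can be walked with the standard library. The _doc field warns that the ds structure is experimental, so this sketch tolerates missing keys. interface_addresses is a hypothetical name, and the sample is a trimmed copy of the output above. On a VM, the input would be the text printed by cloud-init query ds.

```python
import json

def interface_addresses(ds_output):
    """Map interface name -> addresses from meta_data.network.config.ethernets."""
    ds = json.loads(ds_output)
    config = ds.get("meta_data", {}).get("network", {}).get("config", {})
    return {name: nic.get("addresses", []) for name, nic in config.get("ethernets", {}).items()}


sample = """{"meta_data": {"network": {"config": {"version": 2,
  "ethernets": {"ens192": {"addresses": ["10.1.1.224/24"], "dhcp4": false}}}}}}"""
print(interface_addresses(sample))  # {'ens192': ['10.1.1.224/24']}
```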

I do not have an OpenStack environment to spin up tests and check further details.

Merry Xmas!

brunobenchimol commented 2 years ago

Sorry, I forgot to mention the plugin (https://cloudinit.readthedocs.io/en/21.4/topics/datasources/vmware.html), or, for older cloud-init releases, https://github.com/vmware-archive/cloud-init-vmware-guestinfo.
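For comparison with the Nutanix case, the VMware datasource reads metadata from guestinfo keys, optionally accompanied by an encoding key. This sketch assumes the guestinfo.metadata and guestinfo.metadata.encoding keys and the base64 and gzip+base64 values described in the cloud-init VMware datasource documentation. How the pairs are applied to the VM (for example with govc or the vSphere API) is out of scope, and the helper name is hypothetical.

```python
import base64
import gzip
import json

def guestinfo_metadata(metadata, encoding="gzip+base64"):
    """Return guestinfo key/value pairs carrying metadata for the VMware datasource."""
    raw = json.dumps(metadata).encode("utf-8")
    if encoding == "gzip+base64":
        raw = gzip.compress(raw)
    elif encoding != "base64":
        raise ValueError(f"unsupported encoding: {encoding}")
    return {
        "guestinfo.metadata": base64.b64encode(raw).decode("ascii"),
        "guestinfo.metadata.encoding": encoding,
    }


pairs = guestinfo_metadata({"instance-id": "vm-app-84m0z-0", "local-hostname": "vm-app-84m0z-0"})
print(pairs["guestinfo.metadata.encoding"])
```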

I also tested on an Ubuntu 20.04 release (based on Debian bullseye/sid), which ships with:

root@vm-app-3wtri-0:~# cloud-init --version
/usr/bin/cloud-init 21.3-1-g6803368d-0ubuntu1~20.04.4

The hypervisors are quite different, but it is the same cloud-init meta_data {} field.

premkarat commented 2 years ago

This could be an API issue, so we need to wait for a fix on the API side.