ionos-cloud / cluster-api-provider-proxmox

Cluster API Provider for Proxmox VE (CAPMOX)
Apache License 2.0
181 stars 24 forks source link

cloud-init iso inject failed #225

Open abrahamhwj opened 3 months ago

abrahamhwj commented 3 months ago

What steps did you take and what happened: [A clear and concise description of what the bug is.] The error message when using the Proxmox provider to create a cluster suggests: failed to inject cloud-init ISO.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.] The operator error log: "Reconciler error" err="failed to reconcile VM: cloud-init iso inject failed: unable to inject CloudInit ISO: bad request: 400 Parameter verification failed. - {\"boot\":\"invalid format - format error\nboot.legacy: value does not match the regex pattern\n\"}" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/proxmox-quickstart-control-plane-rqxgq" namespace="default" name="proxmox-quickstart-control-plane-rqxgq" reconcileID="8ca71cce-ba7c-4921-a7db-171a37a5a62b" image

Environment: VM Template script: qm create 9000 --memory 4096 --cores 4 --net0 virtio,bridge=vmbr0 qm disk import 9000 CentOS-7-x86_64-GenericCloud.qcow2c local-lvm qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0 qm set 9000 --ide2 local-lvm:cloudinit qm set 9000 --boot c --bootdisk scsi0 qm set 9000 --serial0 socket --vga serial0 qm template 9000

clusterctl.yaml providers:

PROXMOX_URL: "https://xxxx:8006" # The Proxmox VE host PROXMOX_TOKEN: "capmox@pve!capi" # The Proxmox VE TokenID for authentication PROXMOX_SECRET: "xxxxxx" # The secret associated with the TokenID

PROXMOX_SOURCENODE: "hwj" # The node that hosts the VM template to be used to provision VMs TEMPLATE_VMID: "9000" # The template VM ID used for cloning VMs ALLOWED_NODES: "[hwj]" # The Proxmox VE nodes used for VM deployments VM_SSH_KEYS: "ssh-ed25519 xxxxx" # The ssh authorized keys used to ssh to the machines.

CONTROL_PLANE_ENDPOINT_IP: "192.168.3.210" # The IP that kube-vip is going to use as a control plane endpoint NODE_IP_RANGES: "[192.168.3.211-192.168.3.220]" # The IP ranges for Cluster nodes GATEWAY: "192.168.3.1" # The gateway for the machines network-config. IP_PREFIX: "24" # Subnet Mask in CIDR notation for your node IP ranges DNS_SERVERS: "[192.168.3.1]" # The dns nameservers for the machines network-config. BRIDGE: "vmbr0" # The network bridge device for Proxmox VE VMs cat

BOOT_VOLUME_DEVICE: "scsi0" # The device used for the boot disk. BOOT_VOLUME_SIZE: "80" # The size of the boot disk in GB. NUM_SOCKETS: "1" # The number of sockets for the VMs. NUM_CORES: "4" # The number of cores for the VMs. MEMORY_MIB: "4096" # The memory size for the VMs.

EXP_CLUSTER_RESOURCE_SET: "true" # This enables the ClusterResourceSet feature that we are using to deploy CNI CLUSTER_TOPOLOGY: "true" # This enables experimental ClusterClass templating

pborn-ionos commented 3 months ago

This is most likely because your args to the boot parameter qm set ... --boot c fail Proxmox' validation (boot.legacy: value does not match the regex pattern).

According to https://pve.proxmox.com/pve-docs/qm.1.html, simply c is not valid:

--boot [[legacy=]<[acdn]{1,4}>] [,order=<device[;device...]>]
Specify guest boot order. Use the order= sub-property as usage with no key or legacy= is deprecated.

Based on that, I would guess --boot legacy=c,order=scsi0 is what you want to achieve. But it's not a CAPMOX issue.

abrahamhwj commented 3 months ago

This is most likely because your args to the boot parameter qm set ... --boot c fail Proxmox' validation (boot.legacy: value does not match the regex pattern).

According to https://pve.proxmox.com/pve-docs/qm.1.html, simply c is not valid:

--boot [[legacy=]<[acdn]{1,4}>] [,order=<device[;device...]>]
Specify guest boot order. Use the order= sub-property as usage with no key or legacy= is deprecated.

Based on that, I would guess --boot legacy=c,order=scsi0 is what you want to achieve. But it's not a CAPMOX issue.

Thank you for your response. The issue was indeed resolved after modifying the command to "qm set 9000 --boot --bootdisk scsi0". Additionally, under CentOS 7.9, executing the cloud-init status causes a crash (this seems to be a bug in the go-proxmox /virtual_machine.go file, which has been fixed in the main branch but not yet released). For the time being, I switched to an Ubuntu image, which allows for normal execution. However, it seems to be continuously stuck in the current status. I've checked the logs of several relevant containers and haven't noticed any anomaly alerts.

image
mcbenjemaa commented 3 months ago

@abrahamhwj Please check the created machine cloud-init status

abrahamhwj commented 3 months ago

@abrahamhwj Please check the created machine cloud-init status Thank you for your response. Initial inspection shows that cloud-init is working correctly: In the Web interface, cloud-init works fine, and on the PVE host, inputting qm guest exec 102 -- /usr/bin/cloud-init status responds as follows: { "exitcode" : 0, "exited" : 1, "out-data" : "status: done\n" } PS: The current integrated version introduces a crash in luthermonson/go-proxmox/virtual_machine.go/AgentExec when cloud-init is not ready. This issue is fixed in the main branch but has not been released yet.

mcbenjemaa commented 3 months ago

We will try to get it fixed.

mcbenjemaa commented 3 weeks ago

@abrahamhwj would you be able to help me reproduce this?