ionos-cloud / cluster-api-provider-proxmox

Cluster API Provider for Proxmox VE (CAPMOX)
Apache License 2.0
190 stars 24 forks source link

Machine gets stuck in provisioning state #291

Open khushalmer03 opened 1 month ago

khushalmer03 commented 1 month ago

What steps did you take and what happened:


- I'm trying to deploy single controlPlane cluster using proxmox vm template with talos initialized in it. Below is my cluster manifest that I'm using to provision talos cluster in proxmox vm

apiVersion: cluster.x-k8s.io/v1beta1 kind: Cluster metadata: name: talos-test spec: clusterNetwork: pods: cidrBlocks:


apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1 kind: ProxmoxCluster metadata: name: pride spec: controlPlaneEndpoint: host: "10.0.1.164" port: 6443 ipv4Config: addresses: [10.0.1.174-10.0.1.175] prefix: 20 gateway: 10.0.1.1 dnsServers: [10.0.1.1] allowedNodes: [px1] credentialsRef: name: "pride-proxmox-credentials"


apiVersion: controlplane.cluster.x-k8s.io/v1alpha3 kind: TalosControlPlane metadata: name: talos-test spec: version: v1.30.1 replicas: 1 infrastructureTemplate: kind: ProxmoxMachineTemplate apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1 name: talos-cp controlPlaneConfig: init: generateType: init controlplane: generateType: controlplane talosVersion: v1.7.4


apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1 kind: ProxmoxMachineTemplate metadata: name: "talos-cp" spec: template: spec: sourceNode: "px1" templateID: 110 format: "qcow2" full: true numSockets: 1 numCores: 2 memoryMiB: 2048 disks: bootVolume: disk: scsi0 sizeGb: 8 network: default: bridge: vmbr0 model: virtio


Now the machine is created successfully in proxmox with IP assigned to it as well but machine phase in the management cluster is stuck in provisioning state as a result no further action of bootstrapping by talos takes place as it keeps waiting for infrastructure to be ready. Upon cheking the logs of capmox-controller-manager, this is what I found: 

E0923 08:06:57.505071 1 controller.go:329] "Reconciler error" err="failed to reconcile VM: error waiting for agent: the operation has timed out" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="talos-cp2-n9h6b" name="talos-cp2-n9h6b" reconcileID="f55e43c1-ba7d-43f5-bbee-8c1fd90b3202"


Note that I have already enables qemu-agent in the VM template as well. 

Status of machine in management cluster: 

![image](https://github.com/user-attachments/assets/c1a81206-b838-4115-8dad-abbdf2c1589f)

**What did you expect to happen:**
Machine is provisioned successfully and control plane is initialized.

**Environment:**

- Cluster-api-provider-proxmox version:v0.5.1 
- Kubernetes version: (use `kubectl version`):v1.30.1
- OS (e.g. from `/etc/os-release`): talos:v1.7.4
mcbenjemaa commented 3 days ago

Please share the cluster manifest, you're using and let us debug