Open khushalmer03 opened 2 months ago
Please share the cluster manifest you're using so we can debug this.
I think I have the same issue. Relevant capmox logs:
E1122 21:34:46.708828 1 proxmoxmachine_controller.go:209] "error reconciling VM" err="unable to get cloud-init status: no pid returned from agent exec command" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="embla-cluster/embla-proxmox-control-plane-machine-template-8tt4r" namespace="embla-cluster" name="embla-proxmox-control-plane-machine-template-8tt4r" reconcileID="b447c5f3-11b6-41bc-a800-b7a75182246e" machine="embla-cluster/embla-talos-control-plane-qfhts" cluster="embla-cluster/embla-cluster"
E1122 21:34:46.709446 1 controller.go:329] "Reconciler error" err="failed to reconcile VM: unable to get cloud-init status: no pid returned from agent exec command" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="embla-cluster/embla-proxmox-control-plane-machine-template-8tt4r" namespace="embla-cluster" name="embla-proxmox-control-plane-machine-template-8tt4r" reconcileID="b447c5f3-11b6-41bc-a800-b7a75182246e"
Possibly related to #290? That one specifically mentions Talos, but it seems the CRD version containing it has not been released yet. Any chance we could get a release with these fixes soon, so we can deploy this with Talos?
Edit:
I switched to a cloud-init-compatible Talos image, but the cloud-init config seems to be crashing Talos.
Seems to be an issue in talos: https://github.com/siderolabs/talos/pull/9352
Can confirm that the issue I mentioned has been solved in Talos 1.9 alpha.3. The only remaining issue I see is that capmox is not updating the node IPs in the Machine CR, which causes Talos to wait before bootstrapping. This can be solved with skipQemuCheck in the ProxmoxMachine CR, but that has not been released yet; it's on main only.
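For anyone wondering what that workaround might look like, here is a rough sketch. It assumes the skip settings on main sit under a `checks` block in the machine spec, with fields named `skipCloudInitStatus` and `skipQemuGuestAgent`. Those field names and the template name `example-control-plane` are assumptions, not something confirmed in this thread, so verify them against the ProxmoxMachine CRD of the build you deploy.

```yaml
# Sketch only: field names under `checks` are assumed, check the CRD on main.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: example-control-plane   # hypothetical name
spec:
  template:
    spec:
      checks:
        # Assumed: do not wait for a cloud-init status from inside the VM
        skipCloudInitStatus: true
        # Assumed: do not wait for the QEMU guest agent
        skipQemuGuestAgent: true
```

If the fields exist under these names, they would be added alongside the existing settings in the template spec (sourceNode, templateID, disks, network, and so on).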
@rouke-broersma Can you share working manifests ?
https://github.com/broersma-forslund/homelab/tree/main/apps%2Finfrastructure
What steps did you take and what happened:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: talos-test
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: pride
spec:
  controlPlaneEndpoint:
    host: "10.0.1.164"
    port: 6443
  ipv4Config:
    addresses: [10.0.1.174-10.0.1.175]
    prefix: 20
    gateway: 10.0.1.1
  dnsServers: [10.0.1.1]
  allowedNodes: [px1]
  credentialsRef:
    name: "pride-proxmox-credentials"
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: talos-test
spec:
  version: v1.30.1
  replicas: 1
  infrastructureTemplate:
    kind: ProxmoxMachineTemplate
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    name: talos-cp
  controlPlaneConfig:
    init:
      generateType: init
    controlplane:
      generateType: controlplane
      talosVersion: v1.7.4
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: "talos-cp"
spec:
  template:
    spec:
      sourceNode: "px1"
      templateID: 110
      format: "qcow2"
      full: true
      numSockets: 1
      numCores: 2
      memoryMiB: 2048
      disks:
        bootVolume:
          disk: scsi0
          sizeGb: 8
      network:
        default:
          bridge: vmbr0
          model: virtio
E0923 08:06:57.505071 1 controller.go:329] "Reconciler error" err="failed to reconcile VM: error waiting for agent: the operation has timed out" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="talos-cp2-n9h6b" name="talos-cp2-n9h6b" reconcileID="f55e43c1-ba7d-43f5-bbee-8c1fd90b3202"