ionos-cloud / cluster-api-provider-proxmox

Cluster API Provider for Proxmox VE (CAPMOX)
Apache License 2.0

Issue with CloudInit ISO creation #112

Closed mkamsikad2 closed 7 months ago

mkamsikad2 commented 8 months ago

What steps did you take and what happened:

I am building a k8s cluster with 3 control plane nodes and 3 worker nodes on a 3-node Proxmox cluster. There are 3 possible Proxmox API URLs that can be used, one on each node. I can build a cluster successfully if all k8s nodes are placed on the Proxmox node that is used for the API URL. If I build a k8s cluster with k8s nodes spread across the Proxmox cluster (using the ALLOWED_NODES env var), then k8s nodes will only build successfully on the Proxmox node that hosts the API URL. The other nodes fail to build. For example, if https://node1.domain.cloud:8006/api2/json/ is used for the API URL, then a control plane node and a worker node will build successfully on node1. node2 and node3 will each have a worker node that has been cloned and has the IP tag against it. In the capmox controller logs, the following error is logged repeatedly:

E0214 13:26:35.944055       1 controller.go:329] "Reconciler error" err="failed to reconcile VM: cloud-init iso inject failed: unable to inject CloudInit ISO: Post \"https://node1.domain.cloud:8006/api2/json/nodes/node2/storage/local/upload\": EOF" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/capi-management-control-plane-xfv5m" namespace="default" name="capi-management-control-plane-xfv5m" reconcileID="136fe780-6b15-49b4-b6c1-bda537915f48"
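To make the failing request easier to read, here is a minimal Python sketch that splits the upload URL from the error above into the API host and the target node. The function name `describe_upload` is hypothetical, and the check assumes that the first DNS label of the API host is the Proxmox node name, which holds for the hostnames in this report but may not hold elsewhere.

```python
import re
from urllib.parse import urlsplit

# Path shape taken from the error line above:
# /api2/json/nodes/<node>/storage/<storage>/upload
UPLOAD_PATH = re.compile(
    r"^/api2/json/nodes/(?P<node>[^/]+)/storage/(?P<storage>[^/]+)/upload$"
)


def describe_upload(url: str) -> dict:
    """Split a storage upload URL into API host, target node and storage."""
    parts = urlsplit(url)
    match = UPLOAD_PATH.match(parts.path)
    if match is None:
        raise ValueError(f"not a storage upload URL: {url}")
    api_host = parts.hostname
    # Assumption: the first DNS label of the API host is the node name.
    api_node = api_host.split(".")[0]
    target_node = match.group("node")
    return {
        "api_host": api_host,
        "target_node": target_node,
        "storage": match.group("storage"),
        # True when the request goes to one node but targets another.
        "cross_node": api_node != target_node,
    }


info = describe_upload(
    "https://node1.domain.cloud:8006/api2/json/nodes/node2/storage/local/upload"
)
print(info)
```

For the URL in the log, the request is sent to node1 while the upload targets node2, so the ISO has to be handed from one Proxmox node to the other.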

In Proxmox there is 1 successful resize task and 2 failed "Copy data" tasks, and these then keep repeating. The error given by the failed tasks is:

starting file import from: /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7
target node: node2
target file: /var/lib/vz/template/iso/user-data-107.iso
file size is: 65536
command: /usr/bin/scp -o BatchMode=yes -p -- /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7 [10.20.1.22]:/var/lib/vz/template/iso/user-data-107.iso
TASK ERROR: import failed: /usr/bin/scp: stat local "/var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7": No such file or directory
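The task output above can be checked mechanically. The following is a rough Python sketch, not part of CAPMOX or Proxmox, that parses the log into key/value fields and tests whether the file that scp failed to stat is the same temporary file the import started from. The helper names `parse_task_log` and `missing_source` are made up for illustration.

```python
# Task output copied from the failed "Copy data" task above.
TASK_LOG = """\
starting file import from: /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7
target node: node2
target file: /var/lib/vz/template/iso/user-data-107.iso
file size is: 65536
command: /usr/bin/scp -o BatchMode=yes -p -- /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7 [10.20.1.22]:/var/lib/vz/template/iso/user-data-107.iso
TASK ERROR: import failed: /usr/bin/scp: stat local "/var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7": No such file or directory
"""


def parse_task_log(text: str) -> dict:
    """Split each 'key: value' line at the first ': ' separator."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip lines that are not key/value pairs
        fields.setdefault(key.strip(), value.strip())
    return fields


def missing_source(fields: dict) -> bool:
    """True if the error says the import's own source file is gone."""
    source = fields.get("starting file import from")
    error = fields.get("TASK ERROR", "")
    return (
        source is not None
        and source in error
        and "No such file or directory" in error
    )


fields = parse_task_log(TASK_LOG)
print(fields["target node"], fields["file size is"], missing_source(fields))
```

Here the check is true: the path named in the scp error is the same pveupload file the task reported as its import source.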

If you log on to the node, you can see the pveupload-* file being created and then removed. Something is removing this file before the copy can take place. The contents of the pveupload file look correct.

What did you expect to happen: The nodes should be built successfully across all 3 proxmox nodes.

Environment:

- 3 Proxmox nodes on version 8.1.3, which also host the Ceph cluster used for storage
- clusterctl version 1.6.1
- IPAM provider v0.1.0-alpha.3

mkamsikad2 commented 8 months ago

Please withdraw this. It has been traced to a latency issue.