What steps did you take and what happened:
I am building a k8s cluster with 3 control plane nodes and 3 worker nodes on a 3-node Proxmox cluster. There are 3 possible Proxmox API URLs that can be used, one on each node.
I can build a cluster successfully if all k8s nodes are placed on the Proxmox node that is used for the API URL.
If I build a k8s cluster with the k8s nodes spread across the Proxmox cluster (using the ALLOWED_NODES env var), then k8s nodes only build successfully on the Proxmox node that hosts the API URL. The k8s nodes placed on the other Proxmox nodes fail to build.
For example, if https://node1.domain.cloud:8006/api2/json/ is used for the API URL, then a control plane node and a worker node build successfully on node1. node2 and node3 will have a worker node VM that has been cloned and has the IP tag set against it.
In the capmox controller logs the following error is logged repeatedly:
E0214 13:26:35.944055 1 controller.go:329] "Reconciler error" err="failed to reconcile VM: cloud-init iso inject failed: unable to inject CloudInit ISO: Post \"https://node1.domain.cloud:8006/api2/json/nodes/node2/storage/local/upload\": EOF" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/capi-management-control-plane-xfv5m" namespace="default" name="capi-management-control-plane-xfv5m" reconcileID="136fe780-6b15-49b4-b6c1-bda537915f48"
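To make the routing in that error explicit, here is a minimal Python sketch that splits the failing URL into the API host and the target node named in the path. The helper name `upload_route` is hypothetical and is not part of capmox or Proxmox; it only illustrates that the request is sent to node1 while the upload is meant for node2.

```python
from urllib.parse import urlparse


def upload_route(url: str) -> tuple[str, str]:
    """Return (api_host, target_node) for a Proxmox .../nodes/<node>/... URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # The node name is the path segment right after "nodes".
    node = parts[parts.index("nodes") + 1]
    return parsed.hostname, node


# URL copied from the controller log line above.
url = "https://node1.domain.cloud:8006/api2/json/nodes/node2/storage/local/upload"
api_host, target_node = upload_route(url)

# If the short name of the API host differs from the target node, the API
# node receives the upload and has to copy it to another cluster member.
is_cross_node = api_host.split(".")[0] != target_node
print(api_host, target_node, is_cross_node)
```

For the URL in the log this reports node1.domain.cloud as the API host and node2 as the target, which matches the failing case: only uploads that have to leave the API node break.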
In Proxmox there is 1 successful resize task and 2 failed "Copy data" tasks, which then keep repeating. The error given is:
starting file import from: /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7
target node: node2
target file: /var/lib/vz/template/iso/user-data-107.iso
file size is: 65536
command: /usr/bin/scp -o BatchMode=yes -p -- /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7 [10.20.1.22]:/var/lib/vz/template/iso/user-data-107.iso
TASK ERROR: import failed: /usr/bin/scp: stat local "/var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7": No such file or directory
If you log onto the node you can see the pveupload-* file being created and then removed. Something is removing this file before the copy can take place.
The contents of the pveupload file look to be correct.
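To pin down when the file disappears relative to the scp task, a polling sketch along these lines could be run on the node that serves the API URL. The directory /var/tmp and the pveupload-* pattern are taken from the task log above; the function names, polling interval and duration are assumptions made for illustration only.

```python
import glob
import os
import time


def snapshot(directory: str, pattern: str = "pveupload-*") -> dict[str, int]:
    """Map each matching file path to its current size in bytes."""
    sizes = {}
    for path in glob.glob(os.path.join(directory, pattern)):
        try:
            sizes[path] = os.path.getsize(path)
        except FileNotFoundError:
            # The file vanished between listing and stat; treat it as absent.
            pass
    return sizes


def diff(before: dict[str, int], after: dict[str, int]) -> list[tuple[str, str]]:
    """Return (event, path) pairs for files created or removed between snapshots."""
    events = [("created", p) for p in sorted(after.keys() - before.keys())]
    events += [("removed", p) for p in sorted(before.keys() - after.keys())]
    return events


def watch(directory: str = "/var/tmp", interval: float = 0.05,
          duration: float = 60.0) -> None:
    """Poll the directory and print timestamped create/remove events."""
    seen = snapshot(directory)
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = snapshot(directory)
        for event, path in diff(seen, current):
            print(f"{time.time():.3f} {event} {path}")
        seen = current
```

Calling `watch()` while the controller retries prints one line per creation or removal, which can then be compared against the task timestamps in Proxmox. Because it polls, a file that is created and removed within a single interval will be missed, so this is only a rough diagnostic.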
What did you expect to happen:
The nodes should be built successfully across all 3 proxmox nodes.
Environment:
3 Proxmox nodes on version 8.1.3, which also host the Ceph cluster used for storage
clusterctl version 1.6.1
IPAM provider v0.1.0-alpha.3
Kubernetes version (kubectl version): 1.28.3
OS (/etc/os-release): Ubuntu 22.04.3 LTS