Open abrahamhwj opened 1 month ago
Thanks for addressing this. If you want, you can work on this.
I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?
@isZumpo would you mind raising a PR to add this to our documentation? I suppose that's what the OP is wondering about.
If possible, I would like to manage cluster creation through the Cluster API Operator instead of using clusterctl, and I would appreciate some assistance with that. However, I've only just started using PVE, so I may first need to work through usage.md to familiarize myself with the technical principles.
Sure, let us see if we can put something together for that. I suppose it might be best to start here in the thread and then, based on how it goes for @abrahamhwj, write some documentation about it :)
Sure, I highly recommend using the Cluster API Operator. It is very nice having everything as YAML files in your GitOps repository rather than having to execute clusterctl commands. I am using the Cluster API Operator Helm chart to deploy the operator with Argo CD. Here is the whole thing:
Chart.yaml

....
dependencies:
  - name: cluster-api-operator
    version: 0.10.1
    repository: https://kubernetes-sigs.github.io/cluster-api-operator

values.yaml

cluster-api-operator:
  core: "cluster-api:v1.7.1"
  controlPlane: "kubeadm:v1.4.2"
  bootstrap: "kubeadm:v1.4.2"
  manager:
    featureGates:
      kubeadm:
        EXP_CLUSTER_RESOURCE_SET: true
        ClusterTopology: true
      core:
        ClusterTopology: true

templates/proxmox-infrastructure

apiVersion: v1
kind: Namespace
metadata:
  name: proxmox-infrastructure-system
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: proxmox-variables
  namespace: proxmox-infrastructure-system
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: akeyless-secret-store
  target:
    name: proxmox-variables
    creationPolicy: Owner
  dataFrom:
    - extract:
        key: proxmox-variables
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: proxmox
  namespace: proxmox-infrastructure-system
spec:
  version: v0.4.0
  configSecret:
    name: proxmox-variables
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: IPAMProvider
metadata:
  name: in-cluster
  namespace: proxmox-infrastructure-system
spec:
  version: v0.1.0

In my setup, I am using the External Secrets Operator to generate the secret named proxmox-variables, which contains the variables required to set up the Proxmox provider. If you don't use External Secrets, you can create the secret manually instead. In the end, it should contain the following:
PROXMOX_URL: "https://pve.example:8006" # The Proxmox VE host
PROXMOX_TOKEN: "root@pam!capi" # The Proxmox VE TokenID for authentication
PROXMOX_SECRET: "REDACTED" # The secret associated with the TokenID
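If you create the secret by hand, a minimal sketch could look like the manifest below. It assumes the same namespace and secret name that the InfrastructureProvider above refers to; the URL, token ID, and secret value are the placeholders from this thread, not real credentials.

```yaml
# Sketch only: a manually created equivalent of the ExternalSecret target.
apiVersion: v1
kind: Secret
metadata:
  name: proxmox-variables                    # must match spec.configSecret.name
  namespace: proxmox-infrastructure-system   # same namespace as the provider
type: Opaque
stringData:                                  # plain-text values, encoded by the API server
  PROXMOX_URL: "https://pve.example:8006"
  PROXMOX_TOKEN: "root@pam!capi"
  PROXMOX_SECRET: "REDACTED"
```

Using stringData avoids having to base64-encode the values yourself.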
My setup also includes the IPAMProvider; I had issues running without it.
Now, with this setup, you should be able to deploy your cluster objects.
@isZumpo Thank you very much for your guidance. However, I am currently encountering an issue. After creating a cluster, I can create virtual machines, but it seems to be stuck in the initialization phase. The cluster status is as follows:
None of capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, or ipam-in-cluster-controller-manager showed any error logs.
Do you have any suggestions?
Try taking a look at the logs of the managers you mentioned. I have found the logs of capmox-controller-manager especially valuable.
@isZumpo Logs capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager:
"Failed to watch *v1beta1.MachinePool": I did not create a MachinePool resource, so I ignored this error.
"Could not connect to workload cluster to fetch status": this appears before cluster initialization, so I think this error is normal?
Logs capmox-system/capmox-controller-manager:
Logs capi-ipam-in-cluster-system/capi-ipam-in-cluster-controller-manager
Logs capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager
cloud-init appears to be functioning normally, but the IP address and DNS configuration of the VM are not taking effect.
Since the control plane is waiting for KubeAdmInit, it's likely that your virtual machines have no networking (at least towards cluster-api). capi-kubeadm-control-plane-controller-manager tells you: Get \"https://192.168.3.220:6443/api/v1?timeout=10s\": dial tcp 192.168.3.220:6443: connect: no route to host"
Please add a route from your cluster-api host to the subnet containing 192.168.3.220, otherwise KubeAdmInit can't finish.
In general, cluster-api cannot deploy a cluster without having a route to that cluster.
@65278 If the IP 192.168.3.220 is configured, it should be able to communicate with the VM where the Cluster API is located, since they are all under the same router and in the same subnet, as follows:
PVE host: 192.168.3.200
Cluster API host: 192.168.3.201
VIP: 192.168.3.220
VM: 192.168.3.221~230
Gateway: 192.168.3.1
Prefix: 24
From the status of the VMs, it seems that the network configuration of the VMs was not correctly initialized by Cloud-Init. The VMs were not configured with IP addresses, but I don't know what caused this issue and didn't see any related error logs.
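For reference, the addressing plan in this comment would map onto a ProxmoxCluster object roughly as sketched below. This is a sketch under assumptions, not a verified manifest: the field names follow the provider's v1alpha1 cluster template and should be checked against the CRD installed in your cluster, and the cluster name, DNS server, and Proxmox node name are hypothetical.

```yaml
# Sketch only: verify field names against the installed ProxmoxCluster CRD.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: example-cluster            # hypothetical name
spec:
  controlPlaneEndpoint:
    host: 192.168.3.220            # the VIP from this thread
    port: 6443
  ipv4Config:
    addresses:
      - 192.168.3.221-192.168.3.230  # VM address range from this thread
    prefix: 24
    gateway: 192.168.3.1
  dnsServers:
    - 192.168.3.1                  # assumption: the gateway also serves DNS
  allowedNodes:
    - pve                          # hypothetical Proxmox node name
```

The in-cluster IPAM provider hands out node addresses from the range given here, which is why the IPAMProvider needs to be installed alongside the infrastructure provider.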
That's always the most difficult part to debug. cloud-init does write error messages to the console, but they won't be very specific.
Apart from that, you could preload your template rootfs with a passwd entry for root and log in from the console, then try netplan apply and see what error messages pop up. In general, we only support netplan api v2 with passthrough. Simple configurations for cloud-init may work, but we haven't tried them at all.
One further thing to check is whether your Proxmox network bridge is actually up and connected to the right interface.
@65278 Thank you for your reply. I attempted to manually configure the IP and account password via CLI commands on the PVE Host, and it successfully allowed me to log in. After configuring the address, I was able to ping it from the host where the cluster API resides, which suggests that the network configuration is likely correct. As for the netplan API v2, I haven't had experience with it before, so I may need to familiarize myself with it first to be certain.
Make a template that has netplan installed, and cloud-init should do the right thing: https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v2.html#networking-config-version-2
We've got an open ticket about more cloud-init network rendering (talos is incompatible for example). We have no opportunity to test this at the moment, but we have an issue for it: https://github.com/ionos-cloud/cluster-api-provider-proxmox/issues/94
You can contribute a working cloud-init without the netplan renderer if you like.
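As an illustration of the network config version 2 format linked above, a static configuration for one of the VMs in this thread might look like the sketch below. The interface name eth0 and the DNS server are assumptions; the address, prefix, and gateway are taken from the values posted earlier. It is not the exact output CAPMOX renders.

```yaml
# Sketch only: netplan / cloud-init network config version 2.
network:
  version: 2
  ethernets:
    eth0:                        # assumption: actual interface name may differ
      dhcp4: false
      addresses:
        - 192.168.3.221/24       # first VM address, prefix 24
      routes:
        - to: 0.0.0.0/0          # default route
          via: 192.168.3.1       # gateway from this thread
      nameservers:
        addresses:
          - 192.168.3.1          # assumption: the gateway also serves DNS
```

Placing a file like this under /etc/netplan/ in a test VM and running netplan apply from the console is one way to see whether the template's netplan installation accepts the format.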
I reviewed some of CAPMOX's code and documentation on how Cloud-init works. Based on troubleshooting my test environment, the reason could be as follows:
I am very grateful for the CAPMOX project and everyone's enthusiastic responses. I have learned a lot about Cluster API, PVE, and Cloud-init. Although I would love to contribute, I am just an ordinary user. I can do some testing or walk through some simple code, but I don't have much experience in code development.
If you have any test suggestions, you can let me know and I will be happy to try them.
You will need to make sure that your VM template doesn't have the Cloud-init drive provided by Proxmox; otherwise, it will overwrite the config from CAPMOX.
There is no need to pre-set up the Cloud-init drive.
Just use an empty CD-ROM at ide0, and CAPMOX will do the job.
Thank you for your response.
Should the virtual machine template be preconfigured with the Kubernetes deployment environment, such as containerd, kubeadm, kubectl, and kubelet? I couldn't find the related scripts.
If these are not prepared, cloud-init initialization will fail and reconciliation will stop.
@abrahamhwj Yes, you will need to build a VM template first, as stated in our docs: https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/main/docs/Usage.md#dependencies
Describe the solution you'd like: Support the Cluster API Operator, so that the PVE provider can be installed with the InfrastructureProvider CRD, without the clusterctl tool. If this is already supported, I hope the documentation can be updated to explain how to do it. Currently, the Cluster API Operator docs link to the PVE provider docs, but those docs only cover clusterctl.