ionos-cloud / cluster-api-provider-proxmox

Cluster API Provider for Proxmox VE (CAPMOX)
Apache License 2.0

Support cluster api operator #221

Open abrahamhwj opened 1 month ago

abrahamhwj commented 1 month ago

Describe the solution you'd like: Support the Cluster API Operator, so that the PVE provider can be installed with the InfrastructureProvider CRD, without the clusterctl tool. If this is already supported, please update the documentation to explain how to do it. Currently the Cluster API Operator docs link to the PVE provider docs, but those docs only cover clusterctl.

mcbenjemaa commented 1 month ago

Thanks for raising this. If you want, you can work on it.

isZumpo commented 1 month ago

I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?

pborn-ionos commented 3 weeks ago

@isZumpo would you mind raising a PR to add this to our documentation? I suppose that's what the OP is wondering about.

abrahamhwj commented 3 weeks ago

> I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?

If possible, I would like to manage the creation of the cluster through the Cluster API Operator instead of using clusterctl. I would appreciate it if some assistance could be provided. However, I've just started using PVE, so I think maybe I need to operate according to the usage.md to familiarize myself with the technical principles.

isZumpo commented 3 weeks ago

> @isZumpo would you mind raising a PR to add this to our documentation? I suppose that's what the OP is wondering about.

Sure, let us see if we can put something together for that. Suppose it might be best to start here in the chat and then based on how it goes for @abrahamhwj write some documentation about it :)

> > I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?
>
> If possible, I would like to manage the creation of the cluster through the Cluster API Operator instead of using clusterctl. I would appreciate it if some assistance could be provided. However, I've just started using PVE, so I think maybe I need to operate according to the usage.md to familiarize myself with the technical principles.

Sure, highly recommend using the cluster API operator, it is very nice having everything as YAML files in your gitops repository rather than having to execute clusterctl commands. I am using the cluster API operator helm chart to deploy the cluster API operator using argocd. Will give you the whole thing:

Chart.yaml

....
dependencies:
- name: cluster-api-operator
  version: 0.10.1
  repository: https://kubernetes-sigs.github.io/cluster-api-operator

values.yaml

cluster-api-operator:
  core: "cluster-api:v1.7.1"
  controlPlane: "kubeadm:v1.4.2"
  bootstrap: "kubeadm:v1.4.2"
  manager:
    featureGates:
      kubeadm:
        EXP_CLUSTER_RESOURCE_SET: true
        ClusterTopology: true
      core:
        ClusterTopology: true

templates/proxmox-infrastructure

apiVersion: v1
kind: Namespace
metadata:
  name: proxmox-infrastructure-system
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: proxmox-variables
  namespace: proxmox-infrastructure-system
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: akeyless-secret-store
  target:
    name: proxmox-variables
    creationPolicy: Owner
  dataFrom:
  - extract:
      key: proxmox-variables
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: proxmox
  namespace: proxmox-infrastructure-system
spec:
  version: v0.4.0
  configSecret:
    name: proxmox-variables
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: IPAMProvider
metadata:
  name: in-cluster
  namespace: proxmox-infrastructure-system
spec:
  version: v0.1.0

In my setup, I am using the external secrets operator to generate the secret named proxmox-variables, which contains the variables required to set up the Proxmox provider. If you don't use external secrets, you can just create the secret manually instead. In the end it should contain:

PROXMOX_URL: "https://pve.example:8006"                       # The Proxmox VE host
PROXMOX_TOKEN: "root@pam!capi"                                # The Proxmox VE TokenID for authentication
PROXMOX_SECRET: "REDACTED"                                    # The secret associated with the TokenID
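
For anyone creating the secret by hand, a plain Kubernetes Secret along the following lines should be equivalent. This is a minimal sketch: it assumes the operator reads the variables as keys of the secret referenced by configSecret, and the values are the placeholders shown above, not working credentials.

```yaml
# Sketch only: manually created counterpart of the ExternalSecret target above.
apiVersion: v1
kind: Secret
metadata:
  name: proxmox-variables                   # must match spec.configSecret.name
  namespace: proxmox-infrastructure-system  # same namespace as the InfrastructureProvider
type: Opaque
stringData:                                 # plain-text values; the API server stores them base64-encoded
  PROXMOX_URL: "https://pve.example:8006"   # placeholder Proxmox VE host
  PROXMOX_TOKEN: "root@pam!capi"            # placeholder TokenID
  PROXMOX_SECRET: "REDACTED"                # secret associated with the TokenID
```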

My setup also contains the IPAMProvider; I had issues running without it.

Now with this setup you should be able to deploy your cluster objects.

abrahamhwj commented 3 weeks ago

@isZumpo Thank you very much for your guidance. However, I am currently encountering an issue. After creating a cluster, I can create virtual machines, but it seems to be stuck in the initialization phase. The cluster status is as follows:

image

capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, ipam-in-cluster-controller-manager all did not show any error logs.

Do you have any suggestions?

isZumpo commented 3 weeks ago

> @isZumpo Thank you very much for your guidance. However, I am currently encountering an issue. After creating a cluster, I can create virtual machines, but it seems to be stuck in the initialization phase. The cluster status is as follows:
>
> image
>
> capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, ipam-in-cluster-controller-manager all did not show any error logs.
>
> Do you have any suggestions?

Try taking a look at the logs of the different mentioned managers. I have found especially the logs of capmox-controller-manager to be very valuable.

abrahamhwj commented 3 weeks ago

@isZumpo Logs capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager:

image

"Failed to watch *v1beta1.MachinePool": I did not create a MachinePool resource, so I ignored this error.

"Could not connect to workload cluster to fetch status": this appears before cluster initialization, so I think this error is normal?

Logs capmox-system/capmox-controller-manager:

image

Logs capi-ipam-in-cluster-system/capi-ipam-in-cluster-controller-manager

image

Logs capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager

image

cloud-init appears to be functioning normally, but the IP address and DNS configuration of the VM are not taking effect.

image

65278 commented 2 weeks ago

> @isZumpo Thank you very much for your guidance. However, I am currently encountering an issue. After creating a cluster, I can create virtual machines, but it seems to be stuck in the initialization phase. The cluster status is as follows: image
>
> capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, ipam-in-cluster-controller-manager all did not show any error logs.
>
> Do you have any suggestions?

Since the control plane is waiting for KubeAdmInit, it's likely that your virtual machines have no networking (at least towards cluster api). capi-kubeadm-control-plane-controller-manager tells you: Get "https://192.168.3.220:6443/api/v1?timeout=10s": dial tcp 192.168.3.220:6443: connect: no route to host. Please add a route from your cluster-api host to the subnet containing 192.168.3.220; otherwise KubeAdmInit can't finish. In general, cluster-api cannot deploy a cluster without having a route to that cluster.

abrahamhwj commented 2 weeks ago

@65278 If the IP 192.168.3.220 is configured, it should be able to communicate with the VM where the cluster API is located, since they are all under the same router and in the same subnet, as follows:

PVE host: 192.168.3.200
Cluster API host: 192.168.3.201
VIP: 192.168.3.220
VM: 192.168.3.221~230
Gateway: 192.168.3.1
Prefix: 24

From the status of the VMs, it seems that the network configuration of the VMs was not correctly initialized by Cloud-Init. The VMs were not configured with IP addresses, but I don't know what caused this issue and didn't see any related error logs.
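
For reference, the address layout in this comment would map onto a ProxmoxCluster object roughly as sketched below. The field names are assumed from the CAPMOX cluster template and should be checked against the CRD of the installed provider version. The cluster name, namespace, DNS server, and Proxmox node name are hypothetical, since the thread does not state them.

```yaml
# Sketch only: verify field names against the installed ProxmoxCluster CRD.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: example-cluster            # hypothetical name
  namespace: default               # hypothetical namespace
spec:
  controlPlaneEndpoint:
    host: 192.168.3.220            # the VIP from the layout above
    port: 6443
  ipv4Config:
    addresses:
      - 192.168.3.221-192.168.3.230  # VM range from the layout above
    prefix: 24
    gateway: 192.168.3.1
  dnsServers:
    - 192.168.3.1                  # assumption: the gateway also serves DNS
  allowedNodes:
    - pve                          # hypothetical Proxmox node name
```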

65278 commented 2 weeks ago

That's always the most difficult part to debug. cloud-init does write error messages to the console, but they won't be very specific. Apart from that, you could preload your template rootfs with a passwd entry for root and log in from the console, then try netplan apply and see what error messages pop up. In general, we only support netplan api v2 with passthrough. Simple configurations for cloud-init may work, but we haven't tried them at all. One further thing to check is whether your proxmox network bridge is actually up and connected to the right interface.

abrahamhwj commented 2 weeks ago

@65278 Thank you for your reply. I attempted to manually configure the IP and account password via CLI commands on the PVE Host, and it successfully allowed me to log in. After configuring the address, I was able to ping it from the host where the cluster API resides, which suggests that the network configuration is likely correct. As for the netplan API v2, I haven't had experience with it before, so I may need to familiarize myself with it first to be certain.

65278 commented 2 weeks ago

Make a template that has netplan installed, and cloud-init should do the right thing: https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v2.html#networking-config-version-2 We have an open issue about supporting more cloud-init network renderers (talos is incompatible, for example), but no opportunity to test this at the moment: https://github.com/ionos-cloud/cluster-api-provider-proxmox/issues/94 You can contribute a working cloud-init configuration without the netplan renderer if you like.
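
For orientation, a version 2 network config of the kind cloud-init passes through to netplan looks roughly like the sketch below. It illustrates the format only and is not the exact file CAPMOX renders. The interface name eth0 and the DNS server are assumptions; the address, prefix, and gateway are taken from the layout given earlier in this thread.

```yaml
# Sketch only: cloud-init network config, version 2 (netplan-compatible).
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:                        # assumption: actual interface name depends on the template image
      dhcp4: false
      addresses:
        - 192.168.3.221/24       # first VM address from the thread
      routes:
        - to: 0.0.0.0/0          # default route
          via: 192.168.3.1       # gateway from the thread
      nameservers:
        addresses:
          - 192.168.3.1          # assumption: gateway also serves DNS
```

If a template lacks netplan, cloud-init has to render such a config with another backend, which is the untested case mentioned in this comment.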

abrahamhwj commented 1 week ago

> That's always the most difficult part to debug. cloud-init does write error messages to the console, but they won't be very specific. Apart from that, you could preload your template rootfs with a passwd entry for root and log in from the console, then try netplan apply and see what error messages pop up. In general, we only support netplan api v2 with passthrough. Simple configurations for cloud-init may work, but we haven't tried them at all. One further thing to check is whether your proxmox network bridge is actually up and connected to the right interface.

I reviewed some of CAPMOX's code and documentation on how Cloud-init works. Based on troubleshooting my test environment, the reason could be as follows:

  1. The CD-ROM injected by CAPMOX is at '/dev/sr0'. When Cloud-init is enabled in PVE, the CD-ROM that PVE injects is at '/dev/sr1'.
  2. During system startup, Cloud-init always reads from '/dev/sr1' first, so the configuration injected by CAPMOX is never applied. As a result, CAPMOX reports the node and cluster status as READY, but in reality there is no effective configuration on the virtual machine.

Environment: PVE 8.2.2, Ubuntu Server 20.04 LTS.

I am very grateful for the CAPMOX project and everyone's enthusiastic responses. I have learned a lot about Cluster API, PVE, and Cloud-init. Although I would love to contribute, I am just an ordinary user. I can do some testing or walk through some simple code, but I don't have much experience in code development.

If you have any test suggestions, you can let me know and I will be happy to try them.

mcbenjemaa commented 1 week ago

You will need to make sure that your VM template doesn't have the Cloud-init drive provided by Proxmox; otherwise, it will override the config from CAPMOX. There is no need to pre-set up the Cloud-init drive. Just use an empty CD-ROM at ide0, and CAPMOX will do the job.

abrahamhwj commented 1 week ago

> You will need to make sure that your VM template doesn't have the Cloud-init drive provided by Proxmox; otherwise, it will override the config from CAPMOX. There is no need to pre-set up the Cloud-init drive. Just use an empty CD-ROM at ide0, and CAPMOX will do the job.

Thank you for your response.

Should the virtual machine template be preconfigured with the K8S deployment environment, for example by installing containerd, kubeadm, kubectl, kubelet, etc.? I couldn't find the related scripts.

If these are not prepared, cloud-init initialization will fail and reconciliation will stop.

mcbenjemaa commented 1 week ago

@abrahamhwj Yes, you will need to build a VM template first, as stated in our docs: https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/main/docs/Usage.md#dependencies