How to use hcloud ccm with CAPH bare metal?

chess-knight commented 4 months ago

I heard that someone from Hetzner has used it successfully in the past, but in my tests at least one manual step is needed to keep Cluster API happy. The problem is the providerID: the two projects use different values for Robot servers, so CAPI cannot pair nodes with Machine objects. For more, see https://github.com/SovereignCloudStack/cluster-stacks/pull/125#issuecomment-2217241866

guettli commented 4 months ago

Just for the record, this issue tracked bare-metal support. AFAIK it is implemented:

https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/330

We at Syself have a fork (which supported bare-metal before hcloud-ccm did):

https://github.com/syself/hetzner-cloud-controller-manager

I am not happy with the current situation to have two CCMs. Sooner or later we want to solve that.

chess-knight commented 4 months ago

I know it is implemented, and almost everything works nicely for me. But when creating clusters with CAPH, the providerID fields end up mismatched. As I wrote in the linked comment, CAPH sets the providerID to hcloud://bm-$SERVER_NUMBER on the bare-metal infra machine objects, while hcloud ccm sets hrobot://$SERVER_NUMBER on the workload cluster's Kubernetes nodes. See also https://github.com/SovereignCloudStack/cluster-stacks/pull/125#issuecomment-2236589953

apricote commented 3 months ago

hcloud-cloud-controller-manager supports Robot Servers. We merged the necessary code from syself/hetzner-cloud-controller-manager at the end of last year. You can check out #523 for details on this merge, and the full design doc we have written for it. At that time, the design doc was shared with @batistein.

The design doc has the following considerations for the Provider ID. The implementation matches this plan:

> We always need to know which nodes belong to which "source". We can save this info to the ProviderID field. Our existing Cloud servers use the pattern hcloud://. For Robot, we will use hrobot://. This differs from the Syself Fork, they use hcloud://bm-. We will also allow reading the Syself format, to enable users to migrate from the fork to our HCCM.
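
To make the three formats concrete, here is a minimal shell sketch that classifies a providerID by the prefixes named in the design doc. The function name `classify_provider_id`, its output labels, and the server numbers are made up for illustration; this is not how HCCM itself parses the field.

```shell
#!/bin/sh
# Hypothetical helper (not part of HCCM or CAPH): classify a providerID
# by the prefixes named in the design doc.
# Order matters: hcloud://bm- must be tested before the more general hcloud://.
classify_provider_id() {
  case "$1" in
    hrobot://*)    echo "robot" ;;         # HCCM format for Robot servers
    hcloud://bm-*) echo "robot-legacy" ;;  # Syself fork format, still readable by HCCM
    hcloud://*)    echo "cloud" ;;         # Cloud servers
    *)             echo "unknown" ;;
  esac
}

# The server numbers below are invented examples.
classify_provider_id "hrobot://1234567"     # prints: robot
classify_provider_id "hcloud://bm-1234567"  # prints: robot-legacy
classify_provider_id "hcloud://7654321"     # prints: cloud
```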

IMO this new format should be added to CAPH if it wants to work with hcloud-cloud-controller-manager.

> I am not happy with the current situation to have two CCMs. Sooner or later we want to solve that.

I was hoping that with the merge, there was no longer a reason for the syself fork and you would migrate your users to hcloud-cloud-controller-manager.

chess-knight commented 3 months ago

Yes, migration is possible for the existing clusters. However, for the new clusters, the provider ID simply differs.

> IMO this new format should be added to CAPH if it wants to work with hcloud-cloud-controller-manager.

I agree, for the reasons I wrote above. CAPH also recently updated its docs for hcloud clusters in https://github.com/syself/cluster-api-provider-hetzner/issues/1401. But for bare-metal servers, the docs still point to the Syself fork of the CCM.

chess-knight commented 3 months ago

I found that the manual workaround I mentioned, KUBE_EDITOR="sed -i 's#hcloud://bm-#hrobot://#'" kubectl edit hetznerbaremetalmachine, works for now. However, the CAPH CSR controller also uses the "wrong" ProviderID when constant hostnames are used for bare-metal servers. With that feature enabled, CAPH cannot pair nodes, so kubelet-serving CSRs stay in a pending state (e.g. kubectl logs/exec/... then doesn't work in the workload cluster). This also needs to be fixed.
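
For anyone who wants to see what that substitution does before running it against a cluster, here is a small dry-run sketch. It applies the same sed expression to sample text instead of a live object; the function name `rewrite_provider_id` and the server numbers are made up for illustration.

```shell
#!/bin/sh
# Dry run of the workaround's sed expression on sample text.
# Nothing here talks to a cluster; rewrite_provider_id is a hypothetical helper.
rewrite_provider_id() {
  printf '%s\n' "$1" | sed 's#hcloud://bm-#hrobot://#'
}

# Bare-metal ID in the Syself format is rewritten to the hrobot format.
rewrite_provider_id "providerID: hcloud://bm-1234567"  # prints: providerID: hrobot://1234567

# A Cloud server ID has no bm- prefix, so it is left untouched.
rewrite_provider_id "providerID: hcloud://7654321"     # prints: providerID: hcloud://7654321
```

Because the pattern includes the bm- part, only bare-metal IDs are affected.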

apricote commented 2 months ago

As this is a missing feature in cluster-api-provider-hetzner, I have opened an issue on that repository: https://github.com/syself/cluster-api-provider-hetzner/issues/1470

I will close this issue.

Please open a new issue if there are any features missing in hcloud-cloud-controller-manager that would be required for cluster-api-provider-hetzner.

hetznercloud / hcloud-cloud-controller-manager

How to use hcloud ccm with CAPH bare metal? #702