Just for the record, this issue tracked the support for bare metal. AFAIK it is implemented:
https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/330
We at Syself have a fork (which supported bare-metal before hcloud-ccm did):
https://github.com/syself/hetzner-cloud-controller-manager
I am not happy with the current situation of having two CCMs. Sooner or later we want to solve that.
I know it is implemented, and almost everything works nicely for me. But when creating clusters with CAPH, the providerID field somehow gets messed up. As I wrote in the linked comment, CAPH sets the providerID field to hcloud://bm-$SERVER_NUMBER for the bare-metal inframachine objects, while hcloud-ccm sets hrobot://$SERVER_NUMBER on the workload cluster's Kubernetes nodes. See also https://github.com/SovereignCloudStack/cluster-stacks/pull/125#issuecomment-2236589953
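To make the mismatch concrete, here is a rough sketch of how it shows up (resource names, namespace, kubeconfig path and server numbers are invented; the custom-columns paths assume the usual Cluster API convention of a spec.providerID field on the infra machine objects):

```sh
# Management cluster: what CAPH writes on its HetznerBareMetalMachine objects.
kubectl get hetznerbaremetalmachines -n my-cluster-ns \
  -o custom-columns='NAME:.metadata.name,PROVIDERID:.spec.providerID'
# NAME                    PROVIDERID
# my-cluster-md-0-abcde   hcloud://bm-1234567

# Workload cluster: what hcloud-ccm writes on the corresponding Node.
kubectl --kubeconfig my-cluster-kubeconfig get nodes \
  -o custom-columns='NAME:.metadata.name,PROVIDERID:.spec.providerID'
# NAME                    PROVIDERID
# my-cluster-md-0-abcde   hrobot://1234567
```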
hcloud-cloud-controller-manager supports Robot Servers. We merged the necessary code from syself/hetzner-cloud-controller-manager at the end of last year. You can check out #523 for details on this merge, and the full design doc we have written for it. At that time, the design doc was shared with @batistein.
The design doc has the following considerations for the Provider ID. The implementation matches this plan:
> We always need to know which nodes belong to which "source". We can save this info to the ProviderID field. Our existing Cloud servers use the pattern hcloud://. For Robot, we will use hrobot://. This differs from the Syself Fork, they use hcloud://bm-. We will also allow reading the Syself format, to enable users to migrate from the fork to our HCCM.
IMO this new format should be added to CAPH if it wants to work with hcloud-cloud-controller-manager.
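For anyone unsure which scheme their cluster currently uses, a quick way to check (the example output and server numbers are made up):

```sh
# Print every node together with its providerID.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'
# cloud-worker-1    hcloud://1234567      <- Cloud server
# robot-worker-1    hrobot://2345678      <- Robot server, hcloud-ccm format
# robot-worker-2    hcloud://bm-2345679   <- Robot server, Syself fork format (HCCM can still read it)
```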
> I am not happy with the current situation of having two CCMs. Sooner or later we want to solve that.
I was hoping that after the merge there would no longer be a reason for the Syself fork and that you would migrate your users to hcloud-cloud-controller-manager.
Yes, migration is possible for the existing clusters. However, for the new clusters, the provider ID simply differs.
> IMO this new format should be added to CAPH if it wants to work with hcloud-cloud-controller-manager.
I think the same, for the reasons I wrote above. CAPH also recently updated its docs in https://github.com/syself/cluster-api-provider-hetzner/issues/1401 for hcloud clusters, but for the bare-metal servers the docs still point to the Syself fork of HCCM.
I found that the mentioned manual workaround KUBE_EDITOR="sed -i 's#hcloud://bm-#hrobot://#'" kubectl edit hetznerbaremetalmachine works for now, but the CAPH CSR controller also uses the "wrong" ProviderID when constant hostnames are used for bare-metal servers. With that feature enabled, CAPH cannot pair the nodes, so the kubelet-serving CSRs stay in a Pending state (e.g. kubectl logs/exec/... then doesn't work in the workload cluster). This also needs to be fixed.
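A quick way to see that symptom in the workload cluster (the CSR name and node name in the output are placeholders):

```sh
# kubelet-serving CSRs that never get approved stay Pending, which is why
# kubectl logs/exec against those nodes fails.
kubectl get csr --field-selector spec.signerName=kubernetes.io/kubelet-serving
# NAME        AGE   SIGNERNAME                      REQUESTOR                    CONDITION
# csr-xxxxx   10m   kubernetes.io/kubelet-serving   system:node:robot-worker-1   Pending

# Temporary escape hatch until the pairing is fixed: approve them manually.
kubectl certificate approve csr-xxxxx
```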
As this is a missing feature in cluster-api-provider-hetzner, I have opened an issue on that repository: https://github.com/syself/cluster-api-provider-hetzner/issues/1470
I will close this issue.
Please open a new issue if there are any features missing in hcloud-cloud-controller-manager that would be required for cluster-api-provider-hetzner.
I heard that someone from Hetzner successfully used it in the past, but according to my test, at least one manual step is needed to keep Cluster API happy. The problem is the providerID, which differs between the two projects for Robot servers, so CAPI cannot pair nodes with Machine objects. For more, see https://github.com/SovereignCloudStack/cluster-stacks/pull/125#issuecomment-2217241866
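For context: Cluster API pairs a Machine with a Node by matching the Machine's spec.providerID against the Node's spec.providerID. A rough sketch of how the mismatch shows up (phase and field names follow upstream CAPI; the resource name and server number are made up):

```sh
# While the providerID formats differ, the Machines never get a nodeRef and
# stay in the "Provisioned" phase instead of moving to "Running".
kubectl get machines -A \
  -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,NODE:.status.nodeRef.name,PROVIDERID:.spec.providerID'
# NAME                    PHASE         NODE     PROVIDERID
# my-cluster-md-0-abcde   Provisioned   <none>   hcloud://bm-1234567
```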