Open derfabianpeter opened 7 months ago
Hi @derfabianpeter, have you found a workaround in the meantime? I am having the same problem right now (using Tailscale for the private network).
Hi @vitobotta - not yet, sorry
Thanks for letting me know :) If you do find a solution please update this thread. I will do the same :)
Hey you two, sorry for the late response.
> HCCM is able to deduce the machine ID etc. from a node. If not from `InternalIP`, then at least from `ExternalIP`, which we correctly set to the public IP of a node.
HCCM actually uses the name of the Kubernetes `Node` object to find a Server in the Hetzner Cloud API that has the same name. This is not well explained for the "Cloud" part of HCCM, but we added a section on this for Robot.

While HCCM tries to "initialize" the node (and remove the `uninitialized` taint), it also compares the Node Addresses it gets from the Hetzner Cloud API to the IPs that are already specified on the `Node` object, and fails the initialization if there are conflicts. This is the error you are seeing.
Running supported nodes (Hetzner Cloud & Robot) together with "unsupported" nodes goes against the design of the kubernetes/cloud-provider library we use to interact with Kubernetes. I have spent some more time explaining this in https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/530#issuecomment-2060425409 if you are interested. To properly support mixed/hybrid clusters, there would need to be large changes to k/cloud-provider to allow for this.
The Node IP is another one of these bits where k/cloud-provider assumes that it knows everything it should know and that the cloud provider can dictate the IP addresses. After all, on AWS you probably use the VPC & ENI to connect your nodes and don't need any external tools for this.
That said, maybe we can figure out how to make this usable for you. What parts of the functionality of hcloud-cloud-controller-manager do you want to use? There is a list of the general features in the README.
Just from the log @derfabianpeter posted, it might be the Load Balancer part.
The Load Balancer (service) controller depends on the Node controller (which currently fails) to set the `Node.spec.providerID` field.

Instead of using the Node controller to fill this field, you can also set it yourself when you start the kubelet, via the flag `--provider-id=hcloud://$SERVER_ID`. You can get the Server ID from the Metadata Service, or through cloud-init directly on the server:

`cloud-init query instance-id`
`curl http://169.254.169.254/hetzner/v1/metadata/instance-id`
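If it helps, here is a small untested sketch of turning that server ID into the kubelet flag — the `provider_id_flag` helper is made up for illustration:

```shell
# Hypothetical helper: build the kubelet flag from a Hetzner server ID.
# On a real server, the ID would come from one of the commands above, e.g.
#   SERVER_ID="$(curl -s http://169.254.169.254/hetzner/v1/metadata/instance-id)"
provider_id_flag() {
  printf -- '--provider-id=hcloud://%s\n' "$1"
}

# With a made-up server ID:
provider_id_flag 12345678
```

The kubelet would then be started with that flag, e.g. `kubelet $(provider_id_flag "$SERVER_ID") …`.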
This would make it possible to disable the Node controller in your installs, but you would also miss out on the labels it sets. The Node controller is also responsible for removing the `uninitialized` taint, so you would need to remove the `--cloud-provider=external` flag from the kubelet, which is what adds this taint.
If you depend on any other feature, I am happy to discuss your requirements.
Hope this gave you an initial insight into the way things are and what the options are to work around this.
Hi @apricote, thanks for the update :) In my case I only use cloud instances (no dedicated servers etc.), and I am mostly interested in the ability to provision load balancers. The configurations I am testing currently are with larger clusters of over 100 nodes, so I cannot use the Hetzner private networks. Instead I am using Cilium with WireGuard encryption to use the public internet for the communication between the nodes. So yeah, since the Hetzner networks are excluded from this kind of configuration, all I need the CCM for is basically just the load balancer provisioning. Did I understand it correctly that I can just set the node provider ID directly without even installing the CCM? Or do I still need to install it? Thanks!
> Instead I am using Cilium with WireGuard encryption to use the public internet for the communication between the nodes
I would like to check out how this interacts with the Node Addresses & HCCM. Is this your current Cilium configuration? If not, could you paste the values you use? https://github.com/vitobotta/hetzner-k3s/blob/d824c126f45071f72ff2686b59fd8ccc5825c5a2/src/kubernetes/software/cilium.cr
> Did I understand it correctly that I can just set the node provider ID directly without even installing the CCM? Or do I still need to install it?
You can set the node provider id directly, but the Load Balancers are still created and managed by hcloud-cloud-controller-manager.
If I understand k/cloud-provider correctly, you can disable the Node controller by passing `--controllers=-cloud-node-controller` (note the `-` in front of the controller name) to HCCM. But I have never tested this configuration and we do not officially support it.
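For example (untested and, as noted above, unsupported — the Deployment name, namespace, and container index are assumptions about a typical install; adjust them to yours):

```shell
# Hypothetical sketch: build the JSON patch that appends
# --controllers=-cloud-node-controller to the HCCM container args.
# Applying it requires a live cluster, e.g.:
#   kubectl -n kube-system patch deployment hcloud-cloud-controller-manager \
#     --type=json -p "$(hccm_patch)"
hccm_patch() {
  printf '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--controllers=-cloud-node-controller"}]\n'
}

hccm_patch
```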
@apricote thanks for dealing with this so quickly and for the detailed explanations.
With regards to the features I'm interested in: only the provisioning of Load Balancers backed by Cloud nodes. I wanted to hook up a few bare-metal machines that we operate in a dedicated datacenter to the cluster, to make their compute available to the services we want to run there. But I found a way to make that happen without mixing Hetzner Cloud and external nodes, while keeping the cluster backed by HCCM.
Thanks again for your thoughtful explanations. With that said, I guess this is more a feature request than a bug.
Hi @apricote,
> > Instead I am using Cilium with WireGuard encryption to use the public internet for the communication between the nodes
>
> I would like to check out how this interacts with the Node Addresses & HCCM. Is this your current Cilium configuration? If not, could you paste the values you use? https://github.com/vitobotta/hetzner-k3s/blob/d824c126f45071f72ff2686b59fd8ccc5825c5a2/src/kubernetes/software/cilium.cr
Yep, that one. The chart version is currently v1.15.4 and the encryption is enabled. No other settings apart from what you see in that code :)
> > Did I understand it correctly that I can just set the node provider ID directly without even installing the CCM? Or do I still need to install it?
>
> You can set the node provider ID directly, but the Load Balancers are still created and managed by hcloud-cloud-controller-manager.
>
> If I understand k/cloud-provider correctly, you can disable the Node controller by passing `--controllers=-cloud-node-controller` (note the `-` in front of the controller name) to HCCM. But I have never tested this configuration and we do not officially support it.
I see, thanks
I had the same problem using Hetzner Cloud VMs and Hetzner Robot servers connected via WireGuard. Since I'm using Consul service discovery, I can use DNS to look up the node VPN IPs. This should also work fine with e.g. Tailscale MagicDNS.
See this commit for my solution. Maybe this approach makes sense for others as well and could be converted into a more generic solution, e.g. some kind of flags `--use-dns-for-internal-ip` and `--internal-ip-dns-suffix`.
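As a rough illustration of what such flags could do (the function name and the Consul `.node.consul` suffix are just examples; a Tailscale MagicDNS name would work the same way):

```shell
# Hypothetical sketch of a DNS-based InternalIP lookup, as a
# --use-dns-for-internal-ip / --internal-ip-dns-suffix pair might do:
# resolve "<node-name><suffix>" and take the first address.
internal_ip_from_dns() {
  node="$1"; suffix="$2"
  getent hosts "${node}${suffix}" | awk '{ print $1; exit }'
}

# With Consul this might be: internal_ip_from_dns node1 .node.consul
# Here we resolve "localhost" with an empty suffix so no VPN DNS is needed:
internal_ip_from_dns localhost ""
```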
Does this mean that HCCM cannot work for nodes that only have a private IPv4 and no public IPv4? I have all my nodes on a private network behind a NAT gateway, and they do not have public addresses.
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
TL;DR

We're building a hybrid cluster from Cloud Servers and external bare-metal machines. For this to work properly we're using a WireGuard network between all nodes and use the WireGuard IPs as the nodes' `InternalIP` and for connecting between them. We're not using HCLOUD internal Networks at all.

Expected behavior

HCCM is able to deduce the machine ID etc. from a node. If not from `InternalIP`, then at least from `ExternalIP`, which we correctly set to the public IP of a node.

Observed behavior

HCCM fails to get the machine ID from the HCLOUD API, since it only uses `InternalIP` as the source for identification, which in our case is a WireGuard IP from the `172.16.187.0/24` range. This results in nodes not being properly initialized, and Load Balancers failing to be provisioned due to missing backend node info.

Minimal working example

This is the Deployment.yml we use to install HCCM into our k3s cluster with the Cilium CNI:

Log output

Additional information