Closed ByteAlex closed 2 years ago
Bare metal support would be highly appreciated. A label that causes the CCM to ignore bare metal nodes would be fine as an intermediate step; that would keep the CCM functional and useful in the meantime.
Additional (already closed) issues: https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/9
There are a few problems with adding dedicated servers as real "nodes" to the k8s cluster.
We will look into how we can improve this, but I cannot promise anything.
Any update on this?
@ctodea you can have a look at this: https://github.com/cluster-api-provider-hcloud/hcloud-cloud-controller-manager
@ctodea we managed to get a cluster working where most nodes, including the master, are cloud servers, and some nodes are root servers, e.g. for databases. Basically, the root servers should be mostly ignored by the CCM and the CSI plugin. Maybe this helps:
You need to connect the root servers via vSwitch, though.
Maybe #172 results in a mainline solution ...
Many thanks for the update @malikkirchner @batistein Will give it a try, but unfortunately, I guess won't be any time soon.
Hi @malikkirchner, I can see from the code that you skip creating routes for root servers because the API doesn't allow it (https://github.com/xelonic/hcloud-cloud-controller-manager/blob/root-server-support/hcloud/routes.go#L104). But I don't understand how pod-to-pod communication between cloud and dedicated nodes works for you. For example: 10.240.0.2 is a cloud node with the pod network 10.244.0.0/24, and 10.240.1.2 is a dedicated node with the pod network 10.244.1.0/24.
But you can't create the route 10.244.1.0/24 via 10.240.1.2 in the API. So how does communication between pods in the 10.244.0.0/24 and 10.244.1.0/24 networks work?
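For reference, with native (non-tunnel) routing the cloud network would need one route per node pod CIDR. A sketch of the entries implied by the addresses above (the second entry is the one the API rejects, since a vSwitch IP is not a valid gateway for a cloud network route):

```
# Cloud network route table (sketch, addresses from the example above):
10.244.0.0/24 via 10.240.0.2   # pods on the cloud node -> works, CCM creates this
10.244.1.0/24 via 10.240.1.2   # pods on the dedicated node -> rejected by the Hetzner API
```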
Hi @identw,
That is an excellent point; I do not know and was wondering myself. According to https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/133#issuecomment-739257865 that should never have worked. We are using kubeadm to set up the cluster and Cilium as the CNI plugin. I am happy to share the exact config if you are interested.
I have two guesses how this can 'work': either the vSwitch does some routing that I do not understand, or Cilium somehow manages to route to the root server. Leakage over the public interface is ruled out by the root server's Hetzner firewall.
It is possible, though, that this is a bug that will be fixed and stop working, like #133. If so, I was wondering whether it would make sense to use a layer of WireGuard peer-to-peer tunnels between all nodes, as a kind of unified substrate for Cilium.
Any clarification on this topic is highly appreciated.
@malikkirchner
that is an excellent point, I do not know and was wondering myself
Cilium uses an overlay network between nodes (VXLAN or Geneve) by default; maybe you haven't disabled it? Check your Cilium ConfigMap. For example:
$ kubectl -n kube-system get cm cilium-config -o yaml | grep "tunnel"
tunnel: vxlan
This configuration works either way, even without Hetzner Cloud Networks and the vSwitch.
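For reference, the relevant keys in the ConfigMap look roughly like this. This is only a sketch for a Cilium 1.9-era config; key names vary between Cilium versions, and the CIDR is an assumption:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Overlay mode (the default): pod traffic is encapsulated between nodes,
  # so no cloud network routes are needed at all.
  tunnel: vxlan
  # For native routing instead, you would disable the tunnel and tell
  # Cilium the pod CIDR (untested sketch, assumed values):
  # tunnel: disabled
  # native-routing-cidr: "10.244.0.0/16"
  # auto-direct-node-routes: "true"
```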
I was wondering if it would make sense, to use a layer of wireguard peer-to-peer between all nodes, kinda as a unified substrate for cilium
For Cilium this is not necessary, since it already knows how to build tunnels between nodes and does so by default. If encryption is required, Cilium supports IPsec (https://docs.cilium.io/en/v1.9/gettingstarted/encryption/).
Also, I recommend paying attention to latency when connecting a vSwitch to the cloud:
Ping from a cloud node to a dedicated node via public IP:
$ ping 135.181.96.131
PING 135.181.96.131 (135.181.96.131) 56(84) bytes of data.
64 bytes from 135.181.96.131: icmp_seq=1 ttl=59 time=0.442 ms
64 bytes from 135.181.96.131: icmp_seq=2 ttl=59 time=0.372 ms
64 bytes from 135.181.96.131: icmp_seq=3 ttl=59 time=0.460 ms
64 bytes from 135.181.96.131: icmp_seq=4 ttl=59 time=0.539 ms
Ping from the cloud node to the same dedicated node via vSwitch:
$ ping 10.240.1.2
PING 10.240.1.2 (10.240.1.2) 56(84) bytes of data.
64 bytes from 10.240.1.2: icmp_seq=1 ttl=63 time=47.4 ms
64 bytes from 10.240.1.2: icmp_seq=2 ttl=63 time=47.0 ms
64 bytes from 10.240.1.2: icmp_seq=3 ttl=63 time=46.9 ms
64 bytes from 10.240.1.2: icmp_seq=4 ttl=63 time=46.9 ms
~0.5 ms via the public network vs ~47 ms via the private network =(.
@identw thank you for the hint, you are right: our Cilium uses VXLAN tunnels. That explains why it works. We deploy Istio on top of Cilium, so I guess there is no real need for Cilium's encryption for us at the moment. As I understand it, enabling Cilium's encryption also conflicts with some Istio features.
The ping from a cloud server to the dedicated server via vSwitch is not that bad for us:
# ping starfleet-janeway
PING starfleet-janeway (10.0.1.2) 56(84) bytes of data.
64 bytes from starfleet-janeway (10.0.1.2): icmp_seq=1 ttl=63 time=3.70 ms
64 bytes from starfleet-janeway (10.0.1.2): icmp_seq=2 ttl=63 time=3.57 ms
Our cloud nodes are hosted in nbg1-dc3 and the dedicated server lives in fsn1-dc15. I guess it would be even better if we moved the cloud nodes to Falkenstein.
FYI, we encountered a problem with Cilium and systemd on Debian bullseye (buster is fine): https://github.com/cilium/cilium/issues/14658.
@malikkirchner
As I understand enabling the Cilium encryption also conflicts with some features of Istio.
I mentioned encryption because you wrote about WireGuard. Encryption is optional.
The ping from a cloud server to the dedicated server via vSwitch is not that bad for us:
Not so bad. I tested in the hel1 location (dedicated node in hel1-dc4, cloud node in hel1-dc2).
FYI we encountered a problem with Cilium and systemd in Debian bullseye, buster is fine: cilium/cilium#14658.
Thank you, interesting. I also use Cilium without kube-proxy, but I have not seen this bug.
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
I saw that someone made a repo (https://github.com/identw/hetzner-cloud-controller-manager) to solve this, has anyone tried it?
Any updates here? @LKaemmerling are you going to implement support for root servers soon?
@Donatas-L I tried it. It works great, with some caveats. It would need a bit of attention from the community to keep pace with the development by the Hetzner team. @LKaemmerling you may also want to have a look here. Maybe you can take this idea ;)
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
I also am interested in using bare-metal workers via vSwitch and have it working with calico CNI. Any chance this could become mainlined in the hcloud-cloud-controller-manager?
If we want to push the European cloud, we need to push awesome Hetzner to grow beyond itself. That way, many open source cloud projects and startups with GDPR- and DSGVO-compliant ISMSes could get founded in Europe. tl;dr: yes, I'm interested too.
I went ahead and rebased the work that @malikkirchner did against master from this repo and built a new image with a few fixes that seemed to be required to use Hetzner Robot servers via vSwitch/Cloud Networks.
src:
https://github.com/acjohnson/hcloud-cloud-controller-manager/tree/root-server-support
image:
https://hub.docker.com/r/acjohnson/hcloud-cloud-controller-manager
This seems to work nearly perfectly, with only a couple of transient messages in the cloud-controller's logs, such as:
I1117 01:31:27.718391 1 util.go:39] hcloud/getServerByName: server with name kube02 not found, are the name in the Hetzner Cloud and the node name identical?
E1117 01:31:27.718445 1 node_controller.go:245] Error getting node addresses for node "kube02": error fetching node by provider ID: hcloud/instances.NodeAddressesByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: , and error by node name: hcloud/instances.NodeAddresses: instance not found
...but otherwise load balancer creation works and ignores all nodes that have the instance.hetzner.cloud/is-root-server=true label set.
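For illustration, the label can be declared directly on the Node object. A sketch (the node name is hypothetical; the label key is the one the patched CCM checks, per the comment above):

```yaml
# Excerpt of a Node object carrying the ignore label for the patched CCM
apiVersion: v1
kind: Node
metadata:
  name: kube02
  labels:
    instance.hetzner.cloud/is-root-server: "true"
```

Alternatively, `kubectl label node <name> instance.hetzner.cloud/is-root-server=true` sets it on an existing node.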
I'd file a PR, but this really isn't my work, just a few fixes on top of what y'all have already done.
Hoping something more legit will make its way into this repo but for now this will have to do.
@LKaemmerling would you consider reopening this issue, as there is a fair bit of support for this feature and quite a bit of hacking has already gone into it?
@acjohnson thank you for improving on Boris' change.
Uhm, why is this closed? Currently it does not work. What can I do please? Any step by step instructions how I can provision a LB connected to my 3 root servers?
It's already fully integrated with: https://github.com/syself/cluster-api-provider-hetzner
Ah, yes. I've read about that CAPI a few days ago already. Thanks mate!
I'm getting Cloud provider could not be initialized: unknown cloud provider "hetzner" in the logs.
Any idea how to fix this?
Sounds like you have the wrong provider argument in the deployment... Did you only replace the image? See: https://github.com/syself/hetzner-cloud-controller-manager/blob/master/deploy/ccm.yaml#L63
Well, after removing the "old" ccm, I installed the suggested one with:
kubectl apply -f https://github.com/syself/hetzner-cloud-controller-manager/releases/latest/download/ccm.yaml
Which contains:
containers:
  - image: quay.io/syself/hetzner-cloud-controller-manager:v1.13.0-0.0.1
    name: hcloud-cloud-controller-manager
    command:
      - "/bin/hetzner-cloud-controller-manager"
      - "--cloud-provider=hetzner"
      - "--leader-elect=false"
      - "--allow-untagged-cloud"
Any slack/discord channels available? Don't want to spam this issue here further.
The Kubernetes Slack workspace, channel #hetzner.
Hello,
is it possible to add servers from Hetzner Robot to a cluster created with the CCM?
I've been using a K3s cluster which I bootstrapped manually, and when I tried to install the hcloud CCM, the hcloud:// provider was not working for any servers, whether Cloud or Robot.
Now I've bootstrapped a cluster using kubeadm and followed the instructions, and the hcloud:// provider seems to be working. But I still have my bare-metal servers, and before I let them join the cluster and possibly break the CCM, I'd rather ask for clarification first.
My expectations would be:
Thank you!