kube-vip support (load balancing + VIP)?

ThomasLohmann commented 3 years ago

Hi everyone,

I just tried to install kube-vip (load balancing + VIP) on k0s control planes, but it wasn't possible since the control planes are isolated. I also tried to install kube-vip on k0s worker nodes, but it didn't work as well. Is there a way to install kube-vip on k0s control planes? I think it should be possible somehow, since k0s also supports MetalLB for load balancing (MetalLB is kind of similar to kube-vip): https://docs.k0sproject.io/v1.22.2+k0s.0/examples/metallb-loadbalancer/

Here is the link to kube-vip: https://github.com/kube-vip/kube-vip

Just as a side note: I detected 3 possible solutions for load balancing on bare-metal so far:

Envoy/Nginx/HAproxy + Keepalived (outside the cluster)
kube-vip (inside the cluster)
MetalLB/OpenELB/PureLB (inside the cluster)

Currently, my favorite is kube-vip, because of its simplicity.

It would be great, if k0s can provide a small guideline in the docs, which load balancing solution is the best for k0s. It will simplify the decision process for k0s users (bare-metal) by a lot.

Regards, Thomas

bephinix commented 3 years ago

@ThomasLohmann I am also interested into using KubeVIP for HA Control Planes. Metallb works for LoadBalancer Services as it is running its speaker-pods on the worker nodes itself which is possible with k0s. Deploying KubeVIP on k0s Control Planes (for HA CP) requires either a kubelet daemon on the control planes, a direct integration of the project into k0s-controller or running the kube-vip container directly on the container runtime (if available on the controllers).

I am currently trying to get Kube-VIP running as a direct container deployment as containerd is available on my controllers. As soon as I have additional information, I will post my updates.

ThomasLohmann commented 3 years ago

@bephinix great, I'm looking forward to your feedback!

Regards, Thomas

jnummelin commented 3 years ago

I'm not too familiar with kube-vip but I'm not sure if all the pieces play along nicely on this unfortunately. (I hope I'm wrong of course 😄 )

So one of the drivers to require LB of sorts for controllers is the fact we use konnectivity as "reverse tunnels" to make api communicate with kubelets (and other stuff running on workers). The konnectivity-agent requires couple things for HA setups:

single address to connect to
round robin balancing so it can actually connect to ALL controllers (konnectivity-server process on controllers)

If kube-vip provides a VIP for the controlplane, I don't think the agent can actually connect with ALL konnectivity-servers. 😞 The VIP would get routed to single controller always, right? That would mean the other controllers cannot communicate properly with kubelets (e.g. logs for pods) and other bits on the workers (e.g. metrics service).

bephinix commented 3 years ago

@jnummelin But if the konnectivity-agent always targets the VIP (which is ARP-spoofed by the control plane which holds the lock), this might not be a problem. 🤔

ThomasLohmann commented 3 years ago

@jnummelin @bephinix any updates if it works?

bephinix commented 3 years ago

@ThomasLohmann Unfortunately not, but it is still on my list.

0megam commented 3 years ago

I'm trying to implement something simmilar with keepalived + haproxy. I want to deploy keepalived + haproxy on each control plane node. Haproxy should round robin any request it receives between all 3 control plane nodes. But the problem is that all control plane services (6443 Kubernetes API, 8132 for Konnectivity, 9443 for controller join API) is listening on all interfaces and all ips on a node, so I can't bind haproxy to a VIP with same ports. I saw PR https://github.com/k0sproject/k0s/pull/1038, but I think it's not enought to make this work and k0s should allow to specify bind ip address for all components on control node. Am I wrong ?

ThomasLohmann commented 3 years ago

@omegarus we did it in another way: We deployed envoy + keepalived just on separate nodes (small VMs) instead of on the master nodes. It should solve your problem.

0megam commented 3 years ago

@ThomasLohmann We thought about this, but in "bare metal" world we do not want to run any virtualisation in our DCs. We want to use minimum of necessary hardware for k8s cluster to satisfy our workload needs. Using external servers as load balancers introduce complexity and management overhead.

jnummelin commented 2 years ago

The issue is marked as stale since no activity has been recorded in 30 days

xinity commented 2 years ago

Not sure we can reopen this issue but I think I found a hacky way to use kube-vip to LB the API-server

ThomasLohmann commented 2 years ago

@xinity nice to hear! Please, can you describe your solution? It would help a lot.

@jnummelin I suggest to reopen this issue, since it's still not resolved and load balancing is a really important feature. Probably you have any new ideas in mind how to resolve this issue?

Best regards, Thomas

farconada commented 2 years ago

Please, consider reopen this issue. It's a very important feature

farconada commented 2 years ago

For example in our case It's a breaking point. If we could use this approach https://github.com/k0sproject/k0s/issues/1150#issuecomment-954309569 k0s will be our solution, if not, we have to discard it. We cant affort more machines just for load balancing @jnummelin @twz123 I think that it's a quite common case for example in IoT clusters

jnummelin commented 2 years ago

So if I understood correctly, the issue is that there's no config options for all of the components to bind to only some address on the host for the controlplane components? If so, which components are missing it?

We've also got one feature for this in the "pipeline", a client-side load balancer for all the worker nodes which would mitigate the need for an external LB. There's still quite a few quirks to sort out on that so not really sure what's the timeline for us to be able to ship that. Essentially k0s manages a slim LB on each of the workers and all the worker components (kubelet, kube-proxy,konnectivity etc.) are configured to talk to the API via that.

farconada commented 2 years ago

IMHO are the ones which conflicts with HAProxy (KubeAPI, konnectivity, JoinAPI)

frontend kubeAPI
    bind lb-ip:6443
    default_backend kubeAPI_backend
frontend konnectivity
    bind lb-ip:8132
    default_backend konnectivity_backend
frontend controllerJoinAPI
    bind lb-ip:9443
    default_backend controllerJoinAPI_backend

jnummelin commented 2 years ago

6443: k8s api port, can be configured with spec.api.port 9443: k0s controller join discovery port, can be configured with spec.api.k0sApiPort 8132: konnectivity server, can be configured with spec.konnectivity.agentPort

So if we'd bind all these three to non-default ports and then make the HAProxy to listen to the default ports, would that work?

farconada commented 2 years ago

I think that it wont work. How can I tell konnectivity, and other that they have to connect to the balancer ports instead of binded ports. In a kubeadm installation you have an option --control-plane-endpoint "lb:8443" that is different than "controlnode1:6443" If ports in load balancer and control nodes have to be the same then you have to specify the address If the ports could be different you have to inform to your nodes the ports in the load balancer (is it possible on k0s config?, IMHO no at this momment)

jnummelin commented 2 years ago

@farconada you're right, I absolutely forgot that we propagate the custom port to the "clients" too automatically. Had to read our HA custom port test case for that. 😄

So basically the only good option I see (apart from the client-side balancer thingy we're already working on) is to make the bind address configurable so we can bind all controlplane components to some non VIP address on the host. I'm thinking of something like:

spec:
  bindAddress: 1.2.3.4

k0s would then use this for all components that are listening to traffic on the host.

WDYT?

farconada commented 2 years ago

Not perfect, but it's an easy solution. let me explain: when using keepalived + haproxy the nodes are configured with net.ipv4.ip_nonlocal_bind = 1 so haproxy is always running (int he 3 master nodes) and all the magic it's done with keepalived moving the vIP. Very easy too configure and stable. With your "bindAddress" alternative, haproxy could be running in just one node, the node that has the vIP. So the keepalived has to stop/start haproxy regarding where the vIP is placed (It could be done with notifications). Or may be better, instead of keepalived could be replacing keepalived with pacemaker so IMHO you could apply the bindAddress as an easy/fast solution, it's a customization that lot of people could be thankful of it and it allows to solve the problem in a first step. with regards of your client-side balancer. IMHO k0s needs an alternative that could achieve the same funcionalitity like with kubeadm --control-plane-endpoint so you could use keepalived+haproxy in a well known an reliable architecture trusted by all of the industry that dont supposes a waste of resources. For example a multimaster of 3 nodes where 2 of them are allowed to schedule pods.

With all my respect of your client-side balancer it's easier to trust in a "standard" architecture like keepalived+haproxy.

I really think that k0s simplifies a lot the deployment of kubernetes in on-premises solutions. It's a great piece of software that I look forward to seeing advance and grow. For me the biggest weakness it has in production deployments is precisely this balancing issue.

jnummelin commented 2 years ago

With your "bindAddress" alternative, haproxy could be running in just one node

I don't get why that would be true? So say we have three controllers. You configure all of these to use their primary address as spec.bindAddress. (all three nodes woul naturally configre different bind-address) On top of that you'd configure spec.api.externalAddress: 1.2.3.4 where 1.2.3.4 is your VIP. This will result into k0s configuring all "clients" (kubelet, kube-proxy, konnectivity-agent etc.) to connect to 1.2.3.4. What's stopping you on running keepalived + HAProxy on all of the controllers? I assume HAProxy is configured to bind only to the VIP address, right?

With all my respect of your client-side balancer it's easier to trust in a "standard" architecture like keepalived+haproxy.

Sure, that's valid point. I think the main goal for this is to reduce friction setting up the clusters by providing a way to do things out-of-box, not to replace any other well-known solutions like keepalived + HAProxy for example. So these things should not be mutually exclusive. And in cloud environments people usually rely on external LBs like ELBs etc. anyways.

farconada commented 2 years ago

external address 1.2.3.4 only exists in one node: the primary, not in the backup nodes so you cant bind to 1.2.3.4 in the backup nodes.

100% agree that your client-side balancer solution doesnt have to compete the keepalived+haproxy solution

jnummelin commented 2 years ago

external address 1.2.3.4 only exists in one node: the primary, not in the backup nodes so you cant bind to 1.2.3.4 in the backup nodes.

yes, but let's assume you got 3 controller nodes: 1.1.1.1, 2.2.2.2 and 3.3.3.3 and 100.100.100.100 is the VIP you configure for keepalived.

So in 1.1.1.1 you'd configure k0s with following:

spec:
  bindAddress: 1.1.1.1
  api:
    externalAddress: 100.100.100.100

So in 2.2.2.2 you'd configure k0s with following:

spec:
  bindAddress: 2.2.2.2
  api:
    externalAddress: 100.100.100.100

So in 3.3.3.3 you'd configure k0s with following:

spec:
  bindAddress: 3.3.3.3
  api:
    externalAddress: 100.100.100.100

So each controller bind to it's own address of course, but all configure the VIP as the externalAddress which in turn makes all the client bits configured to use that as API address. Makes sense?

The extenalAddress does not need to be a real address on any of the nodes. E.g. if I'd setup the HA cluster in AWS, I could use here the ELB address.

sudas-px commented 1 year ago

Guys, we are exploring stable ways to manage production kubernetes in our bare metal server lab. I am in love with this project but currently the inability to provision HA control plane via Kube-VIP or any client side tool is stopping us from moving forward.

Just wanted to check if there is a manual way to do this right now and if there are any on going efforts to support this in future.

Thanks in advance

jnummelin commented 11 months ago

Like mentioned before, VIPs in general do not fit the HA connectivity model for k0s control plane <--> kubelet tunneling. Konnectivity agents (on nodes) need to be able to connect to ALL controllers. VIPs in general only direct the traffic to one target at a time and thus do not do any round-robin like targeting. Thus k-agents would not be able to connect to all controllers.

To mitigate this shortcoming, k0s has a bundled in feature called NLLB, https://docs.k0sproject.io/stable/nllb/

NLLB works on the workers nodes and enables seamless failover for all the worker components when they connect to the controllers. This means one can use k0s NLLB for cluster internal (kubelet, kube-proxy, etc. in-cluster components) HA and VIPs (or any other LB like technology) to achieve HA for external clients such as kubectl and Lens for example.

onedr0p commented 10 months ago

Hi!

I would like to use multiple controllers and install Cilium with eBPF, this requires setting k8sServiceHost. In other distros I set this to the kube-vip IP addr but it's not clear with k0s what this should be set to. I do understand this should not be set to a single controller IP addr.

Do I still need a external lb in this use-case?
Does k0s supply a proxy to the API server on every host like k3s?

If anyone has any insight into this, that would be great.

Edit: I was able to enable the NLLB and set these helm values for Cilium:

k8sServiceHost: localhost
k8sServicePort: 7443

I am not sure if this is advised but it worked in my small tests.

juanluisvaladas commented 10 months ago

Hi @onedr0p, NLLB will get the job done, it's an experimental feature in 1.28 (it won't be experimental in 1.29, which will be released soon).

Other than NLLB you can use kube-vip or an external LB. I personally would use an external load balancer for large clusters and for small ones probably stick with NLLB. Kube-vip would work to, but I don't have experience with it so I can't recommend it...

onedr0p commented 10 months ago

kube-vip has served me very well with k3s over the last few years. Like you said I bet it can still work, maybe as a static pod if k0s supports that?

juanluisvaladas commented 10 months ago

K0s does not enable static pods by default, you can either enable it explicitly using the flag --pod-manifest-path [1] [2]

You can also consider using the manifest deployer which is enabled by default. The way it works isn't exactly the same, so I'd say it makes more sense to use it to create deployments or statefulsets than pods.

onedr0p commented 10 months ago

@juanluisvaladas thanks for the info. I was able to get kube-vip working on my controller+worker and worker based clusters as a daemonset using the manifest deployer 🎉 It also worked as a static pod too but I figured the daemonset method was just as good.

juanluisvaladas commented 10 months ago

Hi everyone,

The k0s team discussed this issue internally yesterday and we think nowadays think this use case is covered by our current implementation: 1- For internal load balancing, we already support NLLB 2- For external load balancing, we already support MetalLB 3- You can create a service of type: LoadBalancer pointing to the kube apiservers using MetalLB. To create this service you can either:

Create a service without selector pointing manually to the api-servers. This requires the controllers not to change IP addresses.
Create a service with selector pointing to pods with static IPs running on the controllers. This requires the controllers to also be workers.

We believe this case is covered and there is no need to implement kube-vip for this and therefore we're closing the issue. Nonetheless, kube-vip should work in k0s. If a user discovered that kube-vip can't work on k0s for whatever reason we'll look into the issue.

If we missed something and the solution proposed in this comment does not fulfill your use case please file a new issue explaining why it's not covered. We are not saying no to kube-vip, if there is a use case that isn't covered by what we already support we'll reevaluate it.

k0sproject / k0s

kube-vip support (load balancing + VIP)? #1150