kubernetes-sigs / cluster-api-provider-vsphere


Internal Load Balancer for ControlPlane HA (other types) #819

Closed moonek closed 3 years ago

moonek commented 4 years ago

/kind feature

Describe the solution you'd like I fount it difficult to controlplane HA because the built-in load balancer is not in vsphere. And I also found some attempts to solve this. https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/705 (a separate LB) https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/722 (nsx-t LB)

However, a separate LB does not guarantee HA for the LB itself (the HAProxy VM can fail) and requires additional resources. The NSX-T LB can only be configured in certain supported environments.

I want a load balancer with no SPOF due to LB failure, no additional resources and no environmental dependencies.

Anything else you would like to add:
For reference, this is actually the way I'm using it on-prem: run HAProxy and keepalived on the control plane nodes to form an internal LB. It only requires allocating a VIP that is routable from all cluster nodes. The concept architecture is shown below and has been tested against node failure.

(attached diagrams: architecture-1, architecture-2)

In my on-prem setup, I wrote static IPs manually into haproxy.cfg and keepalived.conf and then deployed an haproxy+keepalived container via Docker with those files mounted (with --restart=always). The container internally monitors the process for each module.

I install the control planes after setting up the architecture described above.
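To make this more concrete, the configs and Docker invocation look roughly like the sketch below. All IPs, the interface name, the bind port, and the image name are placeholders rather than the actual values from my environment; HAProxy binds on 8443 here because the local apiserver already occupies 6443 on the same node.

```
# --- keepalived.conf (sketch; VIP, interface, and priority are placeholders) ---
vrrp_instance kube_apiserver_vip {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    # VIP routable from all cluster nodes
    virtual_ipaddress {
        10.0.0.100
    }
}

# --- haproxy.cfg (sketch; bind port and control plane IPs are placeholders) ---
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend kube-apiserver-frontend
    bind *:8443              # 6443 is already taken by the local apiserver
    default_backend kube-apiserver-backend

backend kube-apiserver-backend
    balance roundrobin
    server cp-0 10.0.0.10:6443 check
    server cp-1 10.0.0.11:6443 check
    server cp-2 10.0.0.12:6443 check
```

```sh
# Run on each control plane node; the image name is a placeholder for
# whatever haproxy+keepalived image you build or pull.
docker run -d --restart=always --net=host --cap-add=NET_ADMIN \
  -v /etc/keepalived/keepalived.conf:/etc/keepalived/keepalived.conf:ro \
  -v /etc/haproxy/haproxy.cfg:/etc/haproxy/haproxy.cfg:ro \
  my-haproxy-keepalived:latest
```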

I don't know much about CAPV, but I think it should be possible to automate this with CAPV.

ncdc commented 4 years ago

#965 shows an example of how to use kube-vip with CAPV to get a VIP for all the control plane machines. It removes the HAProxy VM. All traffic for the control plane goes to the VIP, which is bound to a single VM at a time. There is no load balancing in this approach - all requests go to whatever VM has ownership of the VIP. Would that be sufficient for you @moonek?
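Roughly speaking, that approach just drops a kube-vip static pod manifest onto each control plane node, something like the sketch below. The image, tag, interface, VIP, and environment variable names vary by kube-vip version, so treat this as an illustration and use the actual template from #965 / the kube-vip docs.

```yaml
# /etc/kubernetes/manifests/kube-vip.yaml (sketch; values are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  hostNetwork: true          # the VIP is bound directly to the node's NIC
  containers:
  - name: kube-vip
    image: ghcr.io/kube-vip/kube-vip:<version>   # placeholder image/tag
    args: ["start"]          # newer releases use "manager" instead
    env:
    - name: vip_interface
      value: eth0            # NIC that should carry the VIP
    - name: vip_address
      value: 10.0.0.100      # the control plane endpoint VIP
    - name: vip_arp
      value: "true"          # announce ownership via gratuitous ARP
    - name: vip_leaderelection
      value: "true"          # only the elected leader holds the VIP
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]   # needed to manage the VIP on the host interface
```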

moonek commented 4 years ago

I am very happy with the removal of the LB VM resource and the SPOF. Requests are sent to the VM that owns the VIP, but after that they are not load balanced across each apiserver?

ncdc commented 4 years ago

@moonek no, it won't be load balancing because the control plane endpoint -- the URL that clients use to talk to the apiserver -- is the VIP, and the VIP is just another IP that is assigned to one and only one VM at a time, and there is nothing acting as a load balancer. If you want a load balancer, for now you would need to come up with a recipe for deploying one (and make it HA, presumably), and then you'd manually set cluster.spec.controlPlaneEndpoint to the load balancer's IP/DNS name.
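As a rough sketch (the name, host, and port below are placeholders, and the other required Cluster fields are omitted), that would look something like:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3   # the Cluster API version in use at the time
kind: Cluster
metadata:
  name: my-cluster
spec:
  controlPlaneEndpoint:
    host: lb.example.com   # IP or DNS name of your own (HA) load balancer
    port: 6443
  # clusterNetwork, infrastructureRef, etc. omitted
```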

dhawal55 commented 4 years ago

Once traffic flows to the VIP, can kube-vip redirect it to the kubernetes Service IP? If so, kube-proxy would spread it across all apiserver instances.

ncdc commented 4 years ago

@dhawal55 I imagine you could configure iptables to do that. I don't think kube-vip does that by default. You could file an issue there if you wanted?
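As a rough, untested sketch (the VIP and the kubernetes Service ClusterIP below are placeholders), the iptables rule might look something like this:

```sh
# On each control plane node: DNAT traffic arriving for the VIP's apiserver
# port to the in-cluster "kubernetes" Service ClusterIP, so that kube-proxy's
# own rules then spread it across all apiserver endpoints.
# 10.0.0.100 = VIP (placeholder), 10.96.0.1:443 = kubernetes Service (placeholder)
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.100 --dport 6443 \
  -j DNAT --to-destination 10.96.0.1:443
```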

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/819#issuecomment-750919477):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.