kubernetes / cloud-provider-openstack

[occm] Allow restricting backend pools with a node label selector #1770

Closed: squ94wk closed this issue 5 months ago

squ94wk commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

Feature: Allow restricting backend pools with a node label selector.

I propose adding an annotation such as loadbalancer.openstack.org/node-selector that takes a label selector string (e.g. labelname=value). When it is set, only the nodes matching that label selector are added as members of the LB's backend pools.
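To make this concrete, here is a minimal, purely hypothetical Go sketch of how the controller could apply such a selector when choosing pool members. The annotation name is the one proposed above (it does not exist in OCCM today), filterNodes is an illustrative helper, and the parsing relies on the label-selector syntax from k8s.io/apimachinery:

```go
// Hypothetical sketch only: filtering candidate nodes with a label selector
// taken from the proposed (not yet existing) Service annotation.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// Proposed annotation name from this issue; not implemented in OCCM.
const annotationNodeSelector = "loadbalancer.openstack.org/node-selector"

// filterNodes returns only the nodes matching the Service's node-selector
// annotation; with no annotation it keeps today's behaviour (all nodes).
func filterNodes(svc *v1.Service, nodes []*v1.Node) ([]*v1.Node, error) {
	raw := svc.Annotations[annotationNodeSelector]
	if raw == "" {
		return nodes, nil
	}
	sel, err := labels.Parse(raw) // e.g. "role=loadbalancer"
	if err != nil {
		return nil, fmt.Errorf("invalid node selector %q: %w", raw, err)
	}
	var out []*v1.Node
	for _, n := range nodes {
		if sel.Matches(labels.Set(n.Labels)) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	svc := &v1.Service{ObjectMeta: metav1.ObjectMeta{
		Annotations: map[string]string{annotationNodeSelector: "role=loadbalancer"},
	}}
	nodes := []*v1.Node{
		{ObjectMeta: metav1.ObjectMeta{Name: "node-a", Labels: map[string]string{"role": "loadbalancer"}}},
		{ObjectMeta: metav1.ObjectMeta{Name: "node-b", Labels: map[string]string{"role": "worker"}}},
	}
	selected, _ := filterNodes(svc, nodes)
	for _, n := range selected {
		fmt.Println(n.Name) // only node-a matches the selector
	}
}
```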

Anything else we need to know?: Motivation: Our LBaaS restricts an Octavia LB to 300 members, because the Octavia status-update packets (UDP) are otherwise fragmented and dropped, causing a failover of the amphora. A Service with 3 ports in a cluster with more than 100 nodes already exceeds that limit (3 pools × 100+ nodes > 300 members) and breaks the LB, and we happen to have exactly that scenario. With this feature we could label a subset of nodes and work around the issue.

Additionally, this could potentially be interesting for customers using externalTrafficPolicy: Local targeting a subset of their nodes.

I found this MR that implemented a similar thing for AWS ELB: https://github.com/kubernetes/kubernetes/pull/90943/files

Environment: n.a.

lingxiankong commented 2 years ago

Additionally, this could potentially be interesting for customers using externalTrafficPolicy: Local targeting a subset of their nodes.

This is what I am going to recommend: it seems externalTrafficPolicy: Local could solve your issue perfectly.

Adding node label selectors doesn't sound like a good solution for this, as it may be confusing for end users which nodes they should choose.

zetaab commented 2 years ago

externalTrafficPolicy: Local does not solve the issue. The problem with the Local policy is that all nodes are still added as LBaaS members; the ones without local endpoints are just reported as unhealthy (the amphora sends health-check packets to all nodes 24/7).

We have seen similar issues with Octavia once a single cluster grows beyond 100 nodes.

I think we should have some kind of way to limit the load balancer to a subset of nodes. The OpenStack APIs cannot handle situations where we have lots of load balancers and each of them has hundreds of members. This is a special case for large-scale users; normal users should not use the annotation if they do not know what it does.

squ94wk commented 2 years ago

Just to clarify, the externalTrafficPolicy thing is not the main focus for us. We just need to limit the number of members in the pool. How this happens doesn't actually matter to us.

lingxiankong commented 2 years ago

externalTrafficPolicy: Local does not solve the issue. The problem with the Local policy is that all nodes are still added as LBaaS members; the ones without local endpoints are just reported as unhealthy (the amphora sends health-check packets to all nodes 24/7).

That's a good point @zetaab

lingxiankong commented 2 years ago

Just to clarify, the externalTrafficPolicy thing is not the main focus for us. We just need to limit the number of members in the pool. How this happens doesn't actually matter to us.

Yeah, I just realized using externalTrafficPolicy: Local doesn't solve the issue, as @zetaab said.

We need an approach that is friendly and easy to use for the users. If we want to limit the number of backend members in the pool, we need to expose some information to the end users, something that could split the nodes into different groups. Not sure if the node selector is the way to go, but let's get more feedback here.

jacksgt commented 2 years ago

When externalTrafficPolicy: Local is set, why are all nodes in the cluster (still) added as pool members? I would expect that only "active" members (i.e. nodes running pods for the service) are added to the pool.

lingxiankong commented 2 years ago

I would expect that only "active" members (i.e. nodes running pods for the service) are added to the pool.

No, all nodes are added as members but the health check would rule "bad" members out.

jacksgt commented 2 years ago

No, all nodes are added as members but the health check would rule "bad" members out

The question is why though.

p7kdev commented 2 years ago

I would expect that only "active" members (i.e. nodes running pods for the service) are added to the pool.

No, all nodes are added as members but the health check would rule "bad" members out.

That might be something to rethink. If all nodes are always added (including the master nodes), the created load balancer always ends up in a bad state (DEGRADED), which makes the corresponding monitoring more difficult. Having members in a load balancer that never become active is not really a nice solution. If you could at least tweak that with annotations/labels, that would be a big win.

lingxiankong commented 2 years ago

That might be something to rethink. If all nodes are always added (including the master nodes), the created load balancer always ends up in a bad state (DEGRADED), which makes the corresponding monitoring more difficult. Having members in a load balancer that never become active is not really a nice solution. If you could at least tweak that with annotations/labels, that would be a big win.

You are correct, adding all the nodes as members is not the best solution, especially for a cloud provider that doesn't support container-native networking. However, this is the default implementation in Kubernetes: Kubernetes provides the fundamental mechanism to report the health status of each node and to route the traffic to the local Pod.

Using annotations/labels could decrease the number of members, but that also means we need to restrict the pods to being scheduled on those nodes, and to expose those annotations/labels to the end users properly.

I'd like to see a design proposal that could solve the above concerns.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jacksgt commented 2 years ago

/lifecycle frozen

philimonoff commented 1 year ago

I'd like to +1 the relevance of this issue. Almost every OpenStack-based cloud provider I have worked with limits the pool size to 100 members. The only way to get a bigger cluster is to somehow avoid adding all nodes as pool members, and no better solution than a node selector has been suggested so far.

zetaab commented 1 year ago

I am thinking: could we use a node selector or externalTrafficPolicy: Local and add ONLY the nodes that actually have pods (not all nodes as currently, which sends the LB into a DEGRADED state)? If pods move, the load balancer members should be updated. I have no idea whether this is easily possible or not. Anyway, this is an issue that should be solved somehow; currently it is almost impossible to have clusters larger than 100 nodes in OpenStack (when using Octavia).

So basically, when using externalTrafficPolicy: Local we should somehow listen for changes in endpoints, check each endpoint's nodeName, find the IP (or OpenStack instance ID?) for that node and act on it (we also need the NodePort from the service).
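As a rough illustration of that idea (not existing OCCM code), the sketch below collects the nodes that currently host ready endpoints of a Service by listing its EndpointSlices; under externalTrafficPolicy: Local these would be the only candidates for pool membership, with the Service's NodePort as the member port. It assumes client-go and the discovery.k8s.io/v1 EndpointSlice API; the function name and how it would be wired into the controller (e.g. via an informer) are hypothetical:

```go
// Rough sketch, not OCCM code: find the nodes that currently host ready
// endpoints of a Service, i.e. the only nodes that would need to be Octavia
// pool members under externalTrafficPolicy: Local.
package sketch

import (
	"context"
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func nodesWithReadyEndpoints(ctx context.Context, cs kubernetes.Interface, namespace, svcName string) (map[string]bool, error) {
	// EndpointSlices belonging to a Service carry the kubernetes.io/service-name label.
	slices, err := cs.DiscoveryV1().EndpointSlices(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: fmt.Sprintf("%s=%s", discoveryv1.LabelServiceName, svcName),
	})
	if err != nil {
		return nil, err
	}
	nodes := map[string]bool{}
	for _, slice := range slices.Items {
		for _, ep := range slice.Endpoints {
			// Only count endpoints that are ready and have a node assigned.
			if ep.NodeName == nil || ep.Conditions.Ready == nil || !*ep.Conditions.Ready {
				continue
			}
			nodes[*ep.NodeName] = true
		}
	}
	// In a real controller this would be driven by an EndpointSlice informer,
	// reconciling LB members (node IP + Service NodePort) whenever this set changes.
	return nodes, nil
}
```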

zetaab commented 1 year ago

For reference:

at least AWS has a node selector annotation: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/pkg/annotations/constants.go#L83. That is all, so perhaps we should start with a node selector.

m-kratochvil commented 5 months ago

Joining the crowd that would find this extremely helpful. While we don't have any member limits in our cloud (OpenStack/Octavia), we suffer from load balancers distributing traffic to newly added pool members (after a node is added to the k8s cluster while externalTrafficPolicy: Local is set). Having only the "correct" nodes added as pool members in Octavia would completely solve our issue.

databus23 commented 5 months ago

No, all nodes are added as members but the health check would rule "bad" members out.

The question is why though.

Giving my two cents: I always understood it that Kubernetes in general tries to avoid frequent changes to the cloud infrastructure. There might be rate limits on how many changes you can make, or the cloud infrastructure might simply be too slow in applying the update. In Kubernetes especially, endpoints becoming available/unavailable can happen fairly often (e.g. flapping readiness checks).

For externalTrafficPolicy: Local it is critical that the load balancer reacts fairly quickly and reliably when endpoints change. For example, if you do a rolling update of a service with 20 endpoints and the new pods all land on new nodes, the cloud infrastructure API might not be fast enough to implement the 20 member changes, causing an outage.

The current approach of adding all nodes as members provides a more static configuration of the load balancer, which only needs updates when node availability changes.

I would be very careful about changing this behavior of externalTrafficPolicy: Local in that regard.

Update: Or at least changing the behavior should be an optional feature.

kleini commented 5 months ago

For us, the problem is that we scale nodes up and down heavily, as the Kubernetes cluster is used to drive CI/CD load. Today the cloud API is already too slow to correctly add and remove nodes from the load balancer. Furthermore, we have dedicated ingress nodes running Istio. It does not make any sense to route incoming traffic from the load balancer to one of the dynamic nodes, which then has to forward it to one of the nodes running Istio. Therefore, it would be very nice to have only dedicated nodes, perhaps specially labeled ones, provisioned as members in the load balancer.

kayrus commented 5 months ago

I submitted the PR