kubernetes / cloud-provider-openstack

Apache License 2.0

[occm] not able to use loadbalancers with big cluster #1304

Closed zetaab closed 3 years ago

zetaab commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?: BUG /kind bug

What happened: I have a cluster with 160 nodes and I am trying to use a loadbalancer. However, it's not working:

I1103 12:46:27.868601       1 openstack_loadbalancer.go:1378] Creating loadbalancer kube_service_kubernetes_echoserver_echoservernew
I1103 12:47:30.943210       1 openstack_loadbalancer.go:1142] Creating listener for port 80 using protocol TCP
I1103 12:47:35.251379       1 openstack_loadbalancer.go:1064] Creating pool for listener 0e280654-70d9-43f9-8594-df62835b8c33 using protocol TCP
I1103 12:47:38.106640       1 openstack_loadbalancer.go:1077] Pool 0debc153-1d8d-4a14-978d-73302af1f14d created for listener 0e280654-70d9-43f9-8594-df62835b8c33
I1103 12:47:38.341320       1 openstack_loadbalancer.go:1089] Updating 160 members for pool 0debc153-1d8d-4a14-978d-73302af1f14d
E1103 12:48:38.343624       1 controller.go:275] error processing service echoserver/echoservernew (will retry): failed to ensure load balancer: Put "https://foo:13876/v2.0/lbaas/pools/0debc153-1d8d-4a14-978d-73302af1f14d/members": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I1103 12:48:38.343698       1 event.go:291] "Event occurred" object="echoserver/echoservernew" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: Put \"https://foo:13876/v2.0/lbaas/pools/0debc153-1d8d-4a14-978d-73302af1f14d/members\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

Does anyone have an idea how big clusters can even be used in OpenStack? Is this tested regularly?

What you expected to happen: I expect to be able to use loadbalancers in big clusters as well.

How to reproduce it: Create a big cluster and try to use loadbalancers.
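
To see how long Octavia itself takes for the same call the controller makes (a batch member update, PUT .../pools/{id}/members), a minimal gophercloud sketch along these lines can be run outside the controller. This is only a sketch: credentials come from the usual OS_* environment variables, the pool ID is the one from the log above, and the member addresses and NodePort are placeholders:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
)

func main() {
	// Authenticate from the OS_* environment variables.
	authOpts, err := openstack.AuthOptionsFromEnv()
	if err != nil {
		log.Fatal(err)
	}
	provider, err := openstack.AuthenticatedClient(authOpts)
	if err != nil {
		log.Fatal(err)
	}
	lb, err := openstack.NewLoadBalancerV2(provider, gophercloud.EndpointOpts{}) // set Region if needed
	if err != nil {
		log.Fatal(err)
	}

	// Build 160 members, one per node (fake addresses, placeholder NodePort).
	members := make([]pools.BatchUpdateMemberOpts, 0, 160)
	for i := 0; i < 160; i++ {
		members = append(members, pools.BatchUpdateMemberOpts{
			Address:      fmt.Sprintf("10.0.%d.%d", i/200, i%200+1),
			ProtocolPort: 31390,
		})
	}

	poolID := "0debc153-1d8d-4a14-978d-73302af1f14d" // pool ID from the log above

	// Time the same batch member update the controller issues.
	start := time.Now()
	err = pools.BatchUpdateMembers(lb, poolID, members).ExtractErr()
	fmt.Printf("batch member update took %s, err=%v\n", time.Since(start), err)
}
```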

lingxiankong commented 3 years ago

> Does anyone have an idea how big clusters can even be used in OpenStack? Is this tested regularly?

We don't have such information, unfortunately; 160 nodes is a really big cluster. Have you checked the Octavia service log? What happened after adding the 160 members?

jichenjc commented 3 years ago

It looks like the error happened at a 1-minute timeout. Maybe 160 members is too many to complete within 1 minute? Perhaps split them into smaller requests to avoid the HTTP request timeout?
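
For example (a rough sketch, not the controller's actual code path): since Octavia's batch endpoint replaces the whole member set in one PUT, splitting would mean creating members one at a time, which keeps each HTTP request small. The helper name, client, pool ID, addresses and port below are hypothetical:

```go
package octaviasketch

import (
	"fmt"
	"log"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
)

// addMembersOneByOne is a hypothetical helper: lb is an authenticated Octavia
// service client, poolID the target pool, nodeAddrs the node IPs, port the NodePort.
func addMembersOneByOne(lb *gophercloud.ServiceClient, poolID string, nodeAddrs []string, port int) {
	for i, addr := range nodeAddrs {
		member, err := pools.CreateMember(lb, poolID, pools.CreateMemberOpts{
			Address:      addr,
			ProtocolPort: port,
		}).Extract()
		if err != nil {
			log.Fatalf("member %d (%s): %v", i, addr, err)
		}
		fmt.Printf("created member %s for %s\n", member.ID, addr)
		// In practice you would also wait here until the load balancer is
		// ACTIVE again, since it is immutable while in PENDING_UPDATE.
	}
}
```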

zerodayz commented 3 years ago

It looks like a timeout; the default is 1m.

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/config.go#L317

Either split the request or increase the timeout to at least 5 minutes by passing --request-timeout=5m

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/options/server_run_options.go#L184
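
For reference, the wording in the log, "(Client.Timeout exceeded while awaiting headers)", is what Go's net/http client appends when its own Timeout fires before the response headers arrive, so there may also be an http.Client timeout somewhere in the controller's request path to Octavia. A small self-contained demo of that error (the slow test server stands in for a slow Octavia API, and 1s stands in for 1m):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	// A server that answers slower than the client is willing to wait.
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second)
	}))
	defer slow.Close()

	client := &http.Client{Timeout: 1 * time.Second}
	_, err := client.Get(slow.URL)
	fmt.Println(err)
	// Prints something like:
	// Get "http://127.0.0.1:...": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
}
```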

kayrus commented 3 years ago

@zetaab do you use the legacy neutron lbaasv2 extension?

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/1304#issuecomment-860673243):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.