Throttle ALB Controller Api-Server Calls causing Api-Server Rejection

zackery-parkhurst commented 9 months ago

Hello team,

I have been inspecting the calls that the AWS Load Balancer Controller makes to the API server and have noticed that it makes more calls to the API server than anything else in our EKS cluster.

In times of load, the ALB controller often pushes us over the allowed limit for API server requests in our Kubernetes cluster.

The problem calls seem to be get and update on both leases and configmaps, with these request URIs:

/apis/coordination.k8s.io/v1/namespaces/alb/leases/aws-load-balancer-controller-leader
/api/v1/namespaces/alb/configmaps/aws-load-balancer-controller-leader

In the space of 1 hour it made 11,098 calls to the API server for leases and configmaps alone.
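To put that figure in perspective, here is a quick conversion of the hourly count into a per-second rate. This is only arithmetic on the number quoted above; the helper name `request_rate` is made up for illustration.

```python
def request_rate(calls: int, window_seconds: float) -> float:
    """Average requests per second over the observation window."""
    return calls / window_seconds


calls_in_window = 11_098   # lease + configmap calls reported above
window_seconds = 60 * 60   # the 1-hour observation window

rate = request_rate(calls_in_window, window_seconds)
print(f"{rate:.2f} calls/s")                            # 3.08 calls/s
print(f"one call every {1 / rate:.2f} s on average")    # one call every 0.32 s on average
```

So the two leader-election objects alone account for roughly three API server requests per second, sustained.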

Is there any way to throttle these specific calls to leases / configmaps? And can you help me understand why there are so many calls? What are the calls to configmaps and leases for? The cluster in question only has 4 Ingress objects, creating 4 ALBs.

I see that there is a throttle config, but I am not sure whether it can be used to throttle the API calls in question.

I have also included two screenshots of the actual API server log for the configmap and lease calls.
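For anyone who wants to reproduce this kind of count from text instead of screenshots, below is a minimal sketch that tallies requests by verb and request URI. It assumes the API server audit log has been exported as one JSON audit event per line (the `verb`, `requestURI` and `userAgent` fields are part of the Kubernetes audit event format); the sample events and the `aws-load-balancer-controller` user-agent substring are made up for illustration and are not taken from the real logs.

```python
import json
from collections import Counter


def count_requests(lines, user_agent_substring=""):
    """Count audit events per (verb, requestURI), optionally filtered by user agent."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON audit events
        if not isinstance(event, dict):
            continue
        if user_agent_substring not in event.get("userAgent", ""):
            continue
        counts[(event.get("verb", "?"), event.get("requestURI", "?"))] += 1
    return counts


# Illustrative sample events (hypothetical, not from the cluster in question).
LEASE = "/apis/coordination.k8s.io/v1/namespaces/alb/leases/aws-load-balancer-controller-leader"
CONFIGMAP = "/api/v1/namespaces/alb/configmaps/aws-load-balancer-controller-leader"
sample = [
    json.dumps({"verb": "get", "requestURI": LEASE, "userAgent": "aws-load-balancer-controller"}),
    json.dumps({"verb": "update", "requestURI": LEASE, "userAgent": "aws-load-balancer-controller"}),
    json.dumps({"verb": "get", "requestURI": CONFIGMAP, "userAgent": "aws-load-balancer-controller"}),
    json.dumps({"verb": "update", "requestURI": CONFIGMAP, "userAgent": "aws-load-balancer-controller"}),
    json.dumps({"verb": "get", "requestURI": LEASE, "userAgent": "aws-load-balancer-controller"}),
    json.dumps({"verb": "list", "requestURI": "/api/v1/pods", "userAgent": "kubectl"}),
    "not json",
]

counts = count_requests(sample, user_agent_substring="aws-load-balancer-controller")
for (verb, uri), n in counts.most_common():
    print(f"{n:6d}  {verb:7s} {uri}")
```

With a real export, replace `sample` with the lines of the log file, for example `open("audit.log")`, where the file name is whatever you exported to.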

Extra Information:
Driver version: Helm Chart - 1.6.2, App Version - v2.6.2
Kubernetes version - 1.27
Node kubelet version - v1.27.7-eks-e71965b
Node Kernel Version - 5.10.201-191.748.amzn2.x86_64
Node OS / Image - linux(amd64) / Amazon Linux 2

Thank you in advance!! Look forward to hearing from you all!

M00nF1sh commented 9 months ago

Hi, it needs both the configmap and the lease because the controller code is configured in a migration mode. However, the call volume is unexpected, and we will investigate.

oliviassss commented 9 months ago

@zackery-parkhurst, would you be able to share the controller logs so we can investigate further? You can send them to k8s-alb-controller-triage AT amazon.com or post them here. Thanks.

oliviassss commented 8 months ago

@zackery-parkhurst, just to understand the resource scope of your cluster, could you share approximately how many LBs and/or TGs there are in this cluster, and which API calls the controller makes most often? Thanks.

m0untains commented 8 months ago

We are also seeing something similar in a number of our EKS clusters, using v2.6.2 of the controller.

Our configmap and lease update rates are similar to what was reported here: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1040

oliviassss commented 8 months ago

@m0untains, would you be able to share which major API calls contribute to the throttling?

m0untains commented 8 months ago

Hi,

Apologies, by "something similar", I meant "what seems like excessively frequent updates for configmaps and leases from the aws-lb-controller" (i.e. once every other second feels high, and also doesn't seem to align with leaseDurationSeconds in the configMap, which is set to 15s).
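A rough model of why the update rate need not track the lease duration, offered as a sketch under stated assumptions and not as a description of the controller's actual code. It assumes client-go-style leader election, where the leader renews once per retry period (the lease duration only bounds how long other candidates wait before taking over). It also assumes that in migration mode each renewal does a get and an update on both the lease and the configmap, and that a standby replica only reads both objects once per retry period. The 2 s retry period is taken from the "once every other second" observation above, and the replica count is hypothetical.

```python
def estimated_calls_per_hour(
    retry_period_s: float = 2.0,      # assumption: one renew attempt every 2 s
    lock_objects: int = 2,            # lease + configmap (migration mode)
    leader_calls_per_object: int = 2, # assumption: get + update per object
    standby_replicas: int = 0,        # hypothetical number of non-leader replicas
    standby_calls_per_object: int = 1 # assumption: get only
) -> float:
    """Back-of-the-envelope API server calls per hour caused by leader election."""
    cycles_per_hour = 3600 / retry_period_s
    leader = cycles_per_hour * lock_objects * leader_calls_per_object
    standby = (
        standby_replicas * cycles_per_hour * lock_objects * standby_calls_per_object
    )
    return leader + standby


print(estimated_calls_per_hour())                    # 7200.0
print(estimated_calls_per_hour(standby_replicas=1))  # 10800.0
```

Under these assumptions a single leader already produces about 7,200 calls per hour, and one standby replica brings the estimate to about 10,800, which is the same order of magnitude as the 11,098 calls reported at the top of this issue. If the model holds, the knob that matters is the retry period, not leaseDurationSeconds.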

Regarding API throttling, I can see from the aws_api_requests_total{error_code="Throttling"} metric that we are experiencing a bit of throttling (nothing excessive), mainly from the DescribeTags and DescribeTargetHealth operations.

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

m0untains commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle stale

k8s-triage-robot commented 4 days ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

kubernetes-sigs / aws-load-balancer-controller

Throttle ALB Controller Api-Server Calls causing Api-Server Rejection #3530