kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/
Apache License 2.0

Tune batcher EC2 Describe* delays #2029

Closed AndrewSirenko closed 2 months ago

AndrewSirenko commented 2 months ago

Is this a bug fix or adding new feature? Improvement

What is this PR about? / Why do we need it? Because the 7k-volume scalability tests showed no throttling of DescribeVolumes, this PR reduces the maximum batch delay from 1 second to 500 ms and makes 500 ms the standard batcher delay.

With this change, each RPC incurs about 0.25 s of batch latency per EC2 Describe call on average, with a worst-case extra delay of 0.5 s.

Each batcher will also execute at most twice per second. In the rare case that every batcher executes at once, the combined 12 requests per second stays under the default EC2 Non-Mutating Action Bucket Refill Rate of 20 requests per second.

What testing is done? Scalability tests with batcher delays of 1, 0.5, 0.3, and 0.2 seconds for DescribeVolumes and DescribeInstances.

Results of the final 5000-pod scalability test on an account with default limits:

| API | Count | OK | Client Error | Throttled |
|---|---|---|---|---|
| CreateVolume | 5676 | 5030 | 0 | 646 |
| DeleteVolume | 5563 | 4999 | 2 | 562 |
| AttachVolume | 5646 | 4989 | 0 | 657 |
| DetachVolume | 5747 | 4996 | 101 | 650 |
| DescribeVolumes | 4482 | 4482 | 0 | 0 |
| DescribeInstances | 3760 | 3760 | 0 | 0 |

github-actions[bot] commented 2 months ago

Code Coverage Diff

This PR does not change the code coverage

ConnorJC3 commented 2 months ago

/approve

k8s-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorJC3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/OWNERS)~~ [ConnorJC3]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment