Closed ZihanJiang96 closed 8 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (a70d012) 69.78% compared to head (e82d0b9) 69.78%.
:umbrella: View full report in Codecov by Sentry.
I think we can keep `Burst` slightly higher, maybe twice the `QPS`? (100 and 200 in this case?)
https://github.com/kubernetes/client-go/blob/5a0a4247921dd9e72d158aaa6c1ee124aba1da80/util/flowcontrol/throttle.go#L61C34-L61C34
Looks like `Burst` is just the initial allocation of tokens for querying the API server. Once the `Burst` is exhausted, querying is rate-limited by the `QPS`.
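That matches token-bucket semantics: the bucket starts full with `Burst` tokens and refills at `QPS` tokens per second. A minimal standalone sketch of that behavior (illustrative only, not the actual client-go `flowcontrol` implementation), using the 5 QPS / 10 Burst values this PR starts from:

```go
package main

import "fmt"

// tokenBucket is a toy model of QPS/Burst rate limiting: the bucket starts
// full with `burst` tokens and refills at `qps` tokens per second.
type tokenBucket struct {
	tokens float64 // tokens currently available
	burst  float64 // capacity (the initial allocation)
	qps    float64 // refill rate, tokens per second
}

// advance refills tokens for the elapsed seconds, capped at burst.
func (b *tokenBucket) advance(seconds float64) {
	b.tokens += b.qps * seconds
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
}

// allow consumes one token if available; a real limiter would block instead.
func (b *tokenBucket) allow() bool {
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := &tokenBucket{tokens: 10, burst: 10, qps: 5}

	// Fire 20 requests at t=0: only the initial Burst of 10 gets through.
	granted := 0
	for i := 0; i < 20; i++ {
		if b.allow() {
			granted++
		}
	}
	fmt.Println(granted) // 10

	// One second later, QPS has refilled 5 tokens; 5 more requests succeed.
	b.advance(1)
	granted = 0
	for i := 0; i < 20; i++ {
		if b.allow() {
			granted++
		}
	}
	fmt.Println(granted) // 5
}
```

So `Burst` only governs how big an initial spike is absorbed; sustained throughput is bounded by `QPS` alone.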
Issue

When we terminate a large number of nodes at the same time, say 600 nodes, lifecycle-manager can only process 75 node events per minute, so draining all of them takes 600/75 = 8 minutes. If the ASG lifecycle hook's heartbeat timeout is set to 300s, some node events will never get processed; after the 300s timeout those nodes are terminated by ASG directly without a proper drain, which leads to ungraceful pod shutdown.

Fixes/Improvements
`QPS` from 5 to 100, `Burst` from 10 to 100. Now we are able to process 110 nodes per minute.
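As a quick sanity check on those numbers (a trivial sketch; the 75 and 110 nodes/minute figures are taken from this PR description, not measured here):

```go
package main

import "fmt"

// drainMinutes returns how long it takes to process a backlog of node events
// at a given sustained throughput.
func drainMinutes(nodes, nodesPerMinute float64) float64 {
	return nodes / nodesPerMinute
}

func main() {
	// Before: 600 nodes at 75/min takes 8 minutes, well past a 300s (5 min)
	// heartbeat timeout, so some nodes are terminated without a drain.
	fmt.Println(drainMinutes(600, 75)) // 8

	// After QPS=100/Burst=100: 600 nodes at 110/min takes ~5.45 minutes.
	fmt.Println(drainMinutes(600, 110))
}
```

Note that ~5.45 minutes is still slightly over a 300s heartbeat timeout for a full 600-node termination, so the timeout or throughput may need further headroom.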