In very large clusters with many target groups, we may want to 'batch' terminations so that we never work on more than N instances at the same time.
When we receive a hook, we should still start the worker that sends heartbeats, but queue the instance's deregistration until a slot is available.
For example, with 500+ target groups and 20 concurrent terminations, lifecycle-manager might have a hard time making progress; if we only work on 5 instances at a time, overall system performance might be better.
We should add a controller flag, --deregister-max-parallel, and hold off on deregistering more than N instances in parallel.
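A sketch of how the flag could be parsed. It uses the standard library `flag` package to stay self-contained, not whatever CLI wiring the controller actually uses; the `ServeOptions` struct, `parseServeFlags`, and the choice of 0 meaning "unlimited" are assumptions.

```go
package main

import (
	"errors"
	"flag"
)

// ServeOptions holds hypothetical controller options.
type ServeOptions struct {
	// DeregisterMaxParallel is N; 0 is assumed to mean "no limit".
	DeregisterMaxParallel int
}

// parseServeFlags parses controller arguments such as
// "--deregister-max-parallel 5" (Go's flag package accepts one or two dashes).
func parseServeFlags(args []string) (ServeOptions, error) {
	var opts ServeOptions
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	fs.IntVar(&opts.DeregisterMaxParallel, "deregister-max-parallel", 0,
		"maximum number of instances to deregister in parallel (0 = unlimited)")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.DeregisterMaxParallel < 0 {
		return opts, errors.New("--deregister-max-parallel must be >= 0")
	}
	return opts, nil
}
```

The parsed value would then size whatever mechanism bounds concurrent deregistrations.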