Open jack-at-circle opened 2 months ago
FWIW I sketched one possible solution to this here: https://github.com/kubernetes-sigs/aws-load-balancer-controller/compare/main...jack-at-circle:feat-weighted-routing-failover. It's not feature-flagged or unit-tested, but it achieves the basic failover mechanism I was hoping for.
Hi, thanks for the feedback! We are sharing this with our teammates for more input.
Background
I've been using the alb-controller in production for a long time to manage my EKS workloads. Recently my team started building some of our services as Lambda functions as well, and we're looking for a routing approach that lets us scale our EKS deployments to 0 and rely on Lambda when traffic is low, then scale EKS up (and disable Lambda) when traffic is higher. The scaling within EKS is already supported by various tools such as KEDA.
Is your feature request related to a problem?
Currently, when a target group has no healthy hosts, traffic is still routed to that group.
Describe the solution you'd like
A rough outline of what I'm thinking would be something like this:
The alb-controller already tracks the number of pods in a deployment while updating a target group and already supports non-EKS target groups in the forwardConfig. It should be possible, using the information already available to the alb-controller, to set a target group's weight to 0 if the group contains no healthy hosts, and to set the weight of another (failover) target group to be non-zero when no other target groups are being routed to.
Describe alternatives you've considered
I've considered writing a script / job / listener that also monitors the number of pods available in a group and updates the annotation, but this seems like an indirect approach that could easily break.
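To make the proposal under "Describe the solution you'd like" more concrete, here is a rough sketch of the weight calculation I have in mind. It is written in Python purely for illustration (the controller itself is Go), and TargetGroup, the failover flag, and effective_weights are hypothetical names I made up, not existing controller fields or APIs. It also assumes the controller can obtain a healthy-target count per group.

```python
from dataclasses import dataclass


@dataclass
class TargetGroup:
    name: str
    weight: int              # weight configured in the forward action
    healthy_targets: int     # assumed to be known to the controller
    failover: bool = False   # hypothetical marker for the fallback group


def effective_weights(groups):
    """Return the weight each target group should receive.

    Primary groups with no healthy targets are set to 0. Failover groups
    stay at 0 unless no primary group is being routed to, in which case
    they receive their configured weight.
    """
    weights = {}
    for g in groups:
        if not g.failover:
            weights[g.name] = g.weight if g.healthy_targets > 0 else 0

    # True if at least one primary group still receives traffic.
    primaries_routed = any(w > 0 for w in weights.values())

    for g in groups:
        if g.failover:
            weights[g.name] = 0 if primaries_routed else g.weight
    return weights


# EKS scaled to zero: all traffic shifts to the Lambda group.
scaled_down = [
    TargetGroup("eks-service", weight=100, healthy_targets=0),
    TargetGroup("lambda-fallback", weight=100, healthy_targets=0, failover=True),
]
print(effective_weights(scaled_down))

# EKS scaled up: the Lambda group is disabled again.
scaled_up = [
    TargetGroup("eks-service", weight=100, healthy_targets=3),
    TargetGroup("lambda-fallback", weight=100, healthy_targets=0, failover=True),
]
print(effective_weights(scaled_up))
```

In this sketch the failover group's own health is deliberately ignored, since the point is only to have somewhere to send traffic when nothing else is available. A real implementation might want to handle that case differently.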