loxilb-io / loxilb

eBPF based cloud-native load-balancer for Kubernetes|Edge|Telco|IoT|XaaS.
https://www.loxilb.io
Apache License 2.0
1.48k stars · 122 forks

LoxiLB to choose the active endpoints always and ignore inactive endpoints #836

Closed Rammurthy5 closed 13 hours ago

Rammurthy5 commented 1 month ago

Is your feature request related to a problem? Please describe. Let's say we have two LoxiLB setups in two different regions. Each region has a LoxiLB master assigned an Elastic IP, and both Elastic IPs are listed under a DNS record. When we access the exposed service via the DNS name, we can see that it tries all of the access points.

Describe the solution you'd like Instead of trying all the available IPs, it should access only the active ones.

Describe alternatives you've considered The workaround needs manual intervention, e.g. removing the inactive master's Elastic IP from the DNS record.

UPDATE on Oct 18th 2024: when probetype is set to ping in the loxi-svc yaml, liveness is enabled. When the internal probe fails, we see no ingress-manager (loxicmd get lb) on the nodes where it is failing. Once we delete probetype: ping and redeploy the svc yaml, the ingress-manager shows up on all the nodes.
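For context, the probe setting described above is applied through service annotations. A minimal sketch, assuming loxilb's loxilb.io/probetype annotation (the exact annotation names and values should be checked against the loxilb documentation):

```yaml
# Illustrative fragment of the loxi-svc yaml mentioned above (not from this thread).
# loxilb.io/probetype selects the probe method; "ping" only checks node
# reachability, which is why endpoint-level liveness may still be missed.
apiVersion: v1
kind: Service
metadata:
  name: loxilb-ingress-manager
  namespace: kube-system
  annotations:
    loxilb.io/probetype: "ping"
spec:
  type: LoadBalancer
  loadBalancerClass: loxilb.io/loxilb
```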

TrekkieCoder commented 1 month ago

In this context, loxilb should additionally have some zonal affinity based on the location of its endpoints. When traffic goes out of the zone via VPN or peering, the latency will of course go up, so priority should be given to local endpoints when available. Alternatively, some kind of weighted distribution might be good too.

JoEunil commented 1 month ago

To control access to LoxiLB instances located in different regions, it seems necessary to handle it via DNS. It appears that each region is being used in an Active-Standby configuration.

If you’re using AWS Route53, you can configure this by selecting the Failover option as the Routing Policy when adding records.

For more details, please refer to the AWS Route53 documentation.
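As a sketch of the Route53 failover suggestion above, a CloudFormation fragment could look like the following. All names, IPs, and the health check are placeholders, not values from this thread:

```yaml
# Illustrative CloudFormation sketch of Route53 failover routing.
# svc.example.com resolves to the primary region's Elastic IP while its
# health check passes, and fails over to the secondary region otherwise.
Resources:
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: svc.example.com.
      Type: A
      TTL: "60"
      SetIdentifier: region-a-primary
      Failover: PRIMARY
      HealthCheckId: !Ref PrimaryHealthCheck   # health check on region A's Elastic IP
      ResourceRecords:
        - 203.0.113.10                          # placeholder Elastic IP, region A
  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: svc.example.com.
      Type: A
      TTL: "60"
      SetIdentifier: region-b-secondary
      Failover: SECONDARY
      ResourceRecords:
        - 198.51.100.20                         # placeholder Elastic IP, region B
```

With this routing policy, DNS itself stops handing out the inactive region's Elastic IP, which complements loxilb's own endpoint liveness probing.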

TrekkieCoder commented 1 month ago

@JoEunil Thanks for chipping in. The goal is to make sure the endpoints loxilb has are active and, if not, that it stops sending traffic to them. loxilb already supports this via liveness probes, but it needs further testing in this particular scenario.

However, if all loxilb nodes in a zone go down, we can configure Route53 to temporarily disable resolution to that particular zone.

TrekkieCoder commented 2 weeks ago

There has been a fix for the related issue, so this should work as well. Although probe type "ping" is fine, it only checks node liveness, not the liveness of the end pod. It is better to simply add the "liveness" annotation, which probes the endpoint at the given coordinates:

apiVersion: v1
kind: Service
metadata:
  name: loxilb-ingress-manager
  namespace: kube-system
  annotations:
    loxilb.io/lbmode: "onearm"
    loxilb.io/liveness: "yes"
spec:
  externalTrafficPolicy: Local
  loadBalancerClass: loxilb.io/loxilb
  selector:
    app.kubernetes.io/instance: loxilb-ingress
    app.kubernetes.io/name: loxilb-ingress
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
    - name: https
      port: 443
      protocol: TCP
      targetPort: 443
  type: LoadBalancer

I would leave this open until final validation from the OP - @Rammurthy5

Rammurthy5 commented 13 hours ago

Works as expected. Thanks for the fix @TrekkieCoder