sapcc / octavia-f5-provider-driver

Apache License 2.0

Big-IP forwards traffic to members with unknown operational status #238

Open m-kratochvil opened 1 year ago

m-kratochvil commented 1 year ago

When a new pool member is created and added to a pool with a health monitor (pool-based or member-based), Big-IP sets the member's operational status to "unknown" and begins sending monitor probes to determine its actual health. During this time (the timeout period of the health monitor), Big-IP forwards traffic to the pool member. This is core Big-IP behavior by design: the logic is that under "normal" circumstances, users are not expected to add offline members to a production load-balancing pool.

This causes an issue (in the form of a certain level of packet loss) for pools created by Gardener/Kubernikus when "ExternalTrafficPolicy: Local" is used, because only the pool members (k8s nodes) that host the respective application pods are able to respond to client requests. In short, Kubernetes expects the load balancer to treat members with "unknown" status as offline, whereas Big-IP treats them as online/available until the health check determines the actual status.
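
To make the mismatch concrete, here is a minimal sketch in Python. It is only an illustration of the two policies described above, not Big-IP or driver code; the `Status` enum, the `eligible` helper, and the node names are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    UNKNOWN = "unknown"  # monitor has not yet determined the member's health


def eligible(members, treat_unknown_as_up):
    """Return the names of members that would receive traffic.

    Hypothetical helper: treat_unknown_as_up=True models the Big-IP
    behavior described in this issue, False models what Kubernetes
    expects with ExternalTrafficPolicy: Local.
    """
    allowed = {Status.ONLINE}
    if treat_unknown_as_up:
        allowed.add(Status.UNKNOWN)
    return [name for name, status in members.items() if status in allowed]


# Hypothetical pool: node-b was just added and has not been probed yet.
members = {
    "node-a": Status.ONLINE,
    "node-b": Status.UNKNOWN,
    "node-c": Status.OFFLINE,
}

bigip_view = eligible(members, treat_unknown_as_up=True)
k8s_view = eligible(members, treat_unknown_as_up=False)

print("Big-IP forwards to:   ", bigip_view)
print("Kubernetes expects:   ", k8s_view)
```

In this sketch, `node-b` receives traffic under the Big-IP policy even though it may not host any application pod, which corresponds to the packet loss seen during the monitor timeout period.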

This is apparently a known issue in the load-balancing community. I have found some references, for example in the "F5 BIG-IP CIS" repository: https://github.com/F5Networks/k8s-bigip-ctlr/issues/901#issuecomment-695045882

I created an F5 case: 00471736. There is an existing RFE #496733 for an option to reverse this core Big-IP behavior. This RFE is being pushed forward by the SAP F5 account team.

Slack thread: https://convergedcloud.slack.com/archives/CSP5GMKD1/p1678298106541499

I have offered a few possible workarounds in the above Slack thread, but so far none were considered feasible.

m-kratochvil commented 11 months ago

No concrete update on this yet. RFE #496733 is now the main tracking reference. It is active in the F5 PD (product development) queue and under evaluation. I'll check the status on a regular basis.