AMacedoP closed this issue 8 months ago
We have a similar problem on Azure Operator Nexus (AON). We are not using route-reflectors, the loadBalancer IP is a /32, and the service uses externalTrafficPolicy: Local. If the service pod is restarted, the loadBalancer IP is not announced, and traffic is only restored once the client hits its TCP retransmission timeout (about 15 minutes). With MetalLB, the IP is announced in the same scenario.
Hi @mgleung @sridhartigera, is there a fix planned for this issue? If not, we could try solving it ourselves, but we need pointers on where to look for the error in the source code.
I think the first thing to identify would be why the route isn't being advertised by the route reflector.
I am guessing that this is because the filters we create in the BIRD configuration aren't tuned correctly for the RR case, where the RR needs to allow export of routes that it learned from other speakers even if it doesn't have the route locally.
Suspect the fix is either in the route logic here: https://github.com/projectcalico/calico/blob/master/confd/pkg/backends/calico/routes.go
Or in the BIRD configuration files: https://github.com/projectcalico/calico/tree/master/confd/etc/calico/confd/templates
Or, both. Depending on what the simplest / most elegant solution seems to be :grin:
I recreated my lab and indeed the problem is in the filters bird uses to export bgp routes.
In the calico_export_to_bgp_peers filter in /etc/calico/confd/config/bird_ipam.cfg, the route for the externalTrafficPolicy: Local service is not present. When I manually add it to the RR node, bird announces it without problems.
I'll try and send a PR to solve it
@caseydavenport I've sent a PR, can you review it please?
@AMacedoP yep, I saw it and will take a look soon. It may be a few days as the holidays are a bit hectic and folks are taking time off.
When announcing LoadBalancer services using BGP and route-reflectors, services using externalTrafficPolicy: Local are not announced if the pods are not running on the same nodes as the route-reflectors.
This looks like an edge case not covered in #6074 because we are also using /32 prefixes in BGPConfiguration.
Expected Behavior
LoadBalancer service IPs with externalTrafficPolicy: Local are announced by the route-reflectors to external network devices
Current Behavior
LoadBalancer service IPs are not announced by the route-reflectors, unless the route-reflector node has a pod that matches the service selectors
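The condition described here can be written as a small check. This is a sketch with hypothetical node names and a hypothetical helper, not code from Calico: it only captures the observation that the service IP reaches external peers when at least one route-reflector node also hosts a matching pod.

```go
package main

import "fmt"

// announcedByReflectors reports whether at least one route-reflector node also
// hosts an endpoint of the service. Per the behavior described above, this is
// the only case in which the service IP is currently announced externally.
func announcedByReflectors(rrNodes, endpointNodes []string) bool {
	hosts := make(map[string]bool, len(endpointNodes))
	for _, n := range endpointNodes {
		hosts[n] = true
	}
	for _, rr := range rrNodes {
		if hosts[rr] {
			return true
		}
	}
	return false
}

func main() {
	rr := []string{"node-a", "node-b"} // hypothetical route-reflector nodes

	fmt.Println(announcedByReflectors(rr, []string{"node-c"}))           // false: the failing case
	fmt.Println(announcedByReflectors(rr, []string{"node-b", "node-c"})) // true
}
```

A check like this can help confirm that a given cluster is hitting this case before digging into the BIRD configuration.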
Possible Solution
Route reflector nodes should announce all service IPs with externalTrafficPolicy: Local regardless of whether a pod is scheduled there
Steps to Reproduce (for bugs)
externalTrafficPolicy: Local and check that the pod was scheduled to a node not configured as a route reflector
Your Environment