No withdraw message sent to iBGP Peers when original route is deleted

ffilippopoulos commented 5 years ago

We have a Kubernetes cluster running Calico with mesh mode disabled, meaning that a number of nodes run BIRD BGP instances that do not communicate with each other. Each BIRD instance advertises only 1 route. We configured all of them to use 3 route reflectors (running gobgp :master) as global peers. That gives us 3 separate BGP clusters, each containing 1 gobgp RR server and the same set of clients (we use a different cluster ID per cluster, equal to the RR's IP address). RR client discovery is dynamic (using the dynamic neighbors configuration). All BGP instances use the same AS number. With that setup everything works fine: we can delete/update nodes in our cluster and we see the respective change in the global RIB almost immediately.

Now we also want to create a full mesh between our gobgp RR instances, to cover situations where a gobgp instance is not able to talk to all nodes in the cluster. The configuration on each of the gobgp nodes looks like:

[global.config]
  as = 64512
  router-id = "10.22.22.6"

[[peer-groups]]
  [peer-groups.config]
    peer-group-name = "k8s-bgp-clients"
    peer-as = 64512
  [peer-groups.route-reflector.config]
    route-reflector-client = true
    route-reflector-cluster-id = "10.22.22.6"
[[dynamic-neighbors]]
  [dynamic-neighbors.config]
    prefix = "10.22.22.0/24"
    peer-group = "k8s-bgp-clients"

[[neighbors]]
  [neighbors.config]
    peer-as = 64512
    neighbor-address = "10.22.22.69"
[[neighbors]]
  [neighbors.config]
    peer-as = 64512
    neighbor-address = "10.22.22.133"

The IP addresses differ per node as described above, and the static neighbors are the other 2 gobgp instances. With that setup we see every route injected into the global RIB 3 times (which is expected). The issue is that when a node in the cluster goes away (and with it the route it was advertising), the gobgp servers still retain the route in their global RIB. We see a withdraw message, but it seems that updates from the other gobgp instances put that route back in the RIB. This is what we see in the debug logs:

{"Key":"10.4.0.0/24","Topic":"Table","level":"debug","msg":"Removing withdrawals","time":"2018-10-02T13:05:20Z"}
{"Topic":"Table","level":"debug","msg":"computeKnownBestPath knownPathList: 1","time":"2018-10-02T13:05:20Z"}
{"Data":{"nlri":{"prefix":"10.4.0.0/24"},"attrs":[{"type":1,"value":0},{"type":2,"as_paths":null},{"type":3,"nexthop":"10.22.22.15"},{"type":5,"value":100}],"age":1538485269,"validation":"none","source-id":"10.22.22.133","neighbor-ip":"10.22.22.133"},"Key":"10.66.23.69","Topic":"Peer","level":"debug","msg":"From same AS, ignore.","time":"2018-10-02T13:05:20Z"}
{"Data":{"nlri":{"prefix":"10.4.0.0/24"},"attrs":[{"type":1,"value":0},{"type":2,"as_paths":null},{"type":3,"nexthop":"10.22.22.15"},{"type":5,"value":100}],"age":1538485269,"validation":"none","source-id":"10.22.22.133","neighbor-ip":"10.22.22.133"},"Key":"10.66.23.133","Topic":"Peer","level":"debug","msg":"From same AS, ignore.","time":"2018-10-02T13:05:20Z"}
{"Data":{"nlri":{"prefix":"10.4.0.0/24"},"attrs":[{"type":1,"value":0},{"type":2,"as_paths":null},{"type":3,"nexthop":"10.22.22.15"},{"type":5,"value":100}],"age":1538485269,"validation":"none","source-id":"10.22.22.133","neighbor-ip":"10.22.22.133"},"Key":"10.66.23.6","Topic":"Peer","level":"debug","msg":"From same AS, ignore.","time":"2018-10-02T13:05:20Z"}
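
Since the debug lines above are JSON, one way to see which paths were skipped for which peer is to filter them with a few lines of Python. This is only a sketch: `ignored_paths` is a hypothetical helper name, and the sample entries are abbreviated copies of the log lines above (the `attrs`, `age`, `validation` and `source-id` fields are left out for brevity).

```python
import json
from collections import Counter

# Abbreviated copies of the debug lines above (some Data fields omitted).
SAMPLE_LINES = [
    '{"Key":"10.4.0.0/24","Topic":"Table","level":"debug","msg":"Removing withdrawals","time":"2018-10-02T13:05:20Z"}',
    '{"Data":{"nlri":{"prefix":"10.4.0.0/24"},"neighbor-ip":"10.22.22.133"},"Key":"10.66.23.69","Topic":"Peer","level":"debug","msg":"From same AS, ignore.","time":"2018-10-02T13:05:20Z"}',
    '{"Data":{"nlri":{"prefix":"10.4.0.0/24"},"neighbor-ip":"10.22.22.133"},"Key":"10.66.23.133","Topic":"Peer","level":"debug","msg":"From same AS, ignore.","time":"2018-10-02T13:05:20Z"}',
    '{"Data":{"nlri":{"prefix":"10.4.0.0/24"},"neighbor-ip":"10.22.22.133"},"Key":"10.66.23.6","Topic":"Peer","level":"debug","msg":"From same AS, ignore.","time":"2018-10-02T13:05:20Z"}',
]


def ignored_paths(lines):
    """Count 'From same AS, ignore.' entries per (prefix, learned-from, peer key)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a complete JSON log line
        if entry.get("msg") != "From same AS, ignore.":
            continue
        data = entry.get("Data", {})
        prefix = data.get("nlri", {}).get("prefix")
        counts[(prefix, data.get("neighbor-ip"), entry.get("Key"))] += 1
    return counts


if __name__ == "__main__":
    for (prefix, learned_from, peer), n in ignored_paths(SAMPLE_LINES).items():
        print(f"{prefix} learned from {learned_from} not sent to {peer}: {n}")
```

In this sample, the path for 10.4.0.0/24 learned from 10.22.22.133 is skipped for all three peer keys, and none of the lines records a withdraw being sent.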

Since the originator of that route no longer exists, we would expect the route to go away at some point. This feels like a bug, but since I am no expert on BGP I wanted to ask whether this is the expected behaviour or whether some configuration might cause it.

Thanks!

ffilippopoulos commented 5 years ago

https://www.juniper.net/documentation/en_US/junos/topics/concept/bgp-ibgp-understanding.html

prevent an IBGP peer from advertising an IBGP-learned route within the AS.

To me it looks like this ^^ part of iBGP functionality is not working properly.
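
For reference, the rule quoted above, extended with the route reflection exceptions as I understand them from RFC 4456, can be written down as a small predicate. This is a sketch of the rule only, not gobgp's code: the function name and the peer-type strings are made up for illustration, and it ignores details such as not sending a route back to the peer it came from.

```python
def may_advertise(learned_from: str, send_to: str, is_route_reflector: bool) -> bool:
    """Return True if a best path learned from one peer type may be sent to another.

    Peer types (hypothetical labels): "ebgp", "client" (iBGP route reflector
    client), "non_client" (any other iBGP peer, e.g. another route reflector).
    """
    if learned_from == "ebgp" or send_to == "ebgp":
        return True  # the iBGP restriction only applies to iBGP -> iBGP
    if not is_route_reflector:
        return False  # plain iBGP speaker: never re-advertise iBGP-learned routes
    if learned_from == "client":
        return True  # reflect client routes to clients and non-clients
    return send_to == "client"  # non-client routes are reflected to clients only


if __name__ == "__main__":
    # A route reflector passing a node's route on to another route reflector:
    print(may_advertise("client", "non_client", True))
    # A route reflector passing on a route it learned from another route reflector:
    print(may_advertise("non_client", "non_client", True))
```

In our setup the second case is the interesting one: once the node's own path is gone, the best path on each RR is one learned from another RR, which must not be sent to the remaining RRs.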

ffilippopoulos commented 5 years ago

To give a bit more context after digging further: with the setup described above, we see 3 entries in the global RIB for every route:

*> 10.4.11.0/24         10.22.22.8                                00:15:36   [{Origin: i} {LocalPref: 100}]
*  10.4.11.0/24         10.22.22.8                                00:20:17   [{Origin: i} {LocalPref: 100}]
*  10.4.11.0/24         10.22.22.8                                00:20:18   [{Origin: i} {LocalPref: 100}]

One should come directly from the node (in this case 10.22.22.8) and the other 2 from the 2 other route reflectors. Now, if we delete the node that advertises the route, we see the following on each gobgp instance:

{"Key":"10.4.11.0/24","Topic":"Table","level":"debug","msg":"Removing withdrawals","time":"2018-10-05T15:11:00Z"}
{"Topic":"Table","level":"debug","msg":"computeKnownBestPath knownPathList: 2","time":"2018-10-05T15:11:00Z"}
{"Topic":"Table","level":"debug","msg":"enter compareByReachableNexthop -- path1: { 10.4.11.0/24 | src: { 10.22.22.133 | as: 64512, id: 10.22.22.133 }, nh: 10.22.22.8 }, path2: { 10.4.11.0/24 | src: { 10.22.22.69 | as: 64512, id: 10.22.22.69 }, nh: 10.22.22.8 }","time":"2018-10-05T15:11:00Z"}

So it looks like the original route from 10.22.22.8 is withdrawn, but gobgp then calculates another best path from the entries that remain in its RIB and never sends a withdraw to the other 2 route reflectors for the path it just lost. The same thing happens on all 3 nodes, which is why we are left with a route that is no longer valid.
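
To make the suspected sequence concrete, here is a toy simulation of 3 fully meshed route reflectors and one client, for a single prefix. It is a sketch under assumptions and does not reproduce gobgp's internals: the names (`simulate`, `rr1`..`rr3`) are invented, best-path selection is reduced to "prefer the client's own path", and the flag `withdraw_when_best_is_filtered` models the one behaviour in question, namely whether an RR withdraws its earlier advertisement when its new best path is one it is not allowed to send to the other RRs.

```python
from collections import deque

RRS = ["rr1", "rr2", "rr3"]
CLIENT = "client"


def simulate(withdraw_when_best_is_filtered):
    """Announce then withdraw one prefix from the client; return the paths left on each RR."""
    paths = {rr: set() for rr in RRS}  # peers each RR currently holds a path from
    advertised = {rr: False for rr in RRS}  # has this RR advertised the prefix to the other RRs?
    queue = deque()  # pending (receiver, sender, kind) messages between RRs

    def recompute(rr):
        # Toy best-path selection: prefer the client's own path, else any RR path.
        if CLIENT in paths[rr]:
            best = CLIENT
        elif paths[rr]:
            best = min(paths[rr])
        else:
            best = None
        # Only a client-learned best path may be sent to non-client iBGP peers.
        may_advertise = best == CLIENT
        if may_advertise and not advertised[rr]:
            advertised[rr] = True
            kind = "update"
        elif advertised[rr] and (
            best is None or (not may_advertise and withdraw_when_best_is_filtered)
        ):
            advertised[rr] = False
            kind = "withdraw"
        else:
            return
        for other in RRS:
            if other != rr:
                queue.append((other, rr, kind))

    def deliver(receiver, sender, kind):
        if kind == "update":
            paths[receiver].add(sender)
        else:
            paths[receiver].discard(sender)
        recompute(receiver)

    def drain():
        while queue:
            deliver(*queue.popleft())

    for rr in RRS:  # the node announces its route to all 3 RRs
        deliver(rr, CLIENT, "update")
    drain()
    for rr in RRS:  # the node goes away
        deliver(rr, CLIENT, "withdraw")
    drain()
    return {rr: sorted(p) for rr, p in paths.items()}


if __name__ == "__main__":
    print("no withdraw on filtered best path:", simulate(False))
    print("withdraw on filtered best path:   ", simulate(True))
```

With the flag off, every RR ends up holding the 2 paths learned from the other RRs, which matches the `knownPathList: 2` line above. With the flag on, the withdraws propagate and all 3 RIBs end up empty.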

osrg / gobgp

No withdraw message sent to iBGP Peers when original route is deleted #1842