Closed astone282 closed 6 years ago
Forgot to mention this is with v1.33.
Thanks!
Just want to add more comments on this, and I am hitting about the same behavior. I guess the reason is, when GoBGP receives the echo from RR, it programs it RIB anyway even it should be ignored. The moment when you delete the route, the "echo route" immediately becomes the best, so GoBGP won't consider a route is "deleted", as the "echo route" is the best route now.
According to RFC4456, section 8: "A router that recognizes the ORIGINATOR_ID attribute SHOULD ignore a route received with its BGP Identifier as the ORIGINATOR_ID." Seems GoBGP is doing that.
Interestingly, I do see this part of code in the source code, when GoBGP selects the best route. However, my point is, this route should never be programmed into RIB, draft code fore review:
@yicwang thanks for the comment. Other BGP implementations works in the same way (i.e., echo routes never be inserted into the RIB)?
@fujita To be honest, I am not a BGP expert. We are using GoBGP EVPN to handle our VxLAN control plane protocol, and testing against a Cisco ASR9K. Just did a quick test, when GoBGP advertised the route with OriginatorID as the ASR9K's BGP update-source interface IP, I am not able to see it from the ASR9K rib table. Personally I believe it should be safe to not program the route into RIB.
PS: Just update the PR to address your review comments.
@yicwang pushed, thanks a lot.
@astone282 before investigating a bug, RR is expected to echo's back the route? if so, GoBGP with RR setup doesn't so you reported another bug.
Hi @fujita @yicwang thanks for looking at this + the additional comments + PR.
@fujita I'm not a BGP expert either, but this is what the RFC 4456 section 6 describes: :
" When an RR receives a route from an IBGP peer, it selects the best path based on its path selection rule. After the best path is selected, it must do the following depending on the type of peer it is receiving the best path from
1) A route from a Non-Client IBGP peer: Reflect to all the Clients. 2) A route from a Client peer: Reflect to all the Non-Client peers and also to the Client peers. (Hence the Client peers are not required to be fully meshed.)
"
Therefore I would interpret "All the clients" to include peer which originated the route. To prevent the loops, ORIGINATOR_ID should contain the original advertiser within that AS and CLUSTER_LIST must contain / be pre-pended with the RR cluster ID.
Hope this helps Andrew
@fujita I think what aston282 originally reported maybe still is a bug, but it should no longer happen after my PR. I did the same test before on my EVPN use case. With the route reflected from RR, GoBGP RIB will have duplicated entries for the same information. One is the originated from local with the original information, one is echoed from RR with the same information, but additional attribute of "OriginatorID==GoBGP IP, and ClusterList==[RR IP]".
The bug I believe astone282 complains is, when we delete the original route, GoBGP never sends out the route withdraw message out, as it believes the echoed route is going to be the best now. I confirmed that too by doing tcpdump, and that was how I found out the problem and proposed the fix. So @astone282, I guess we can close this issue, as you shouldn't see the behavior any more with my PR. Try and see?
Correct me when wrong.
Regards, Yichen
Yep, agreed @yicwang . Thanks! will grab master sometime next week, build and give it a go. Thanks again.
@yicwang I have a similar case that is not covered by this PR. In my scenario I have 2 (or more) route reflectors that have the same clients but with with different cluster ids. These 2 rrs are connected (iBGP peers) so I end up having 2 identical entries for every route in the RIB. When one of the original routes gets deleted the route reflectors never exchange any withdraw message between each other and still maintain 1 entry for the route that is now dead. More details here: https://github.com/osrg/gobgp/issues/1842. I think that this is a valid bgp setup. Any suggestions? The PR linked above does not solve this one since it prevents echo messages from being added into the RIB which is different from my case.
@ffilippopoulos I believe the reason why you need 3 GoBGP instances are for HA purpose. Can you try to set all three 3 GoBGP to have the same Router ID? They are essentially having the same information, which should logically make sense to do that... If they are have the same Router ID, the issue should go away with this PR.
@yicwang yes that was made for HA purposes. I believe that if they all share the same id and after your PR, then they will reject all routes exchanged between them. So there is no point peering them in the first place.
@ffilippopoulos Please correct me when wrong. Say you have 3 GoBGP instances running as RR, and 5 bird instances. By creating a mesh, all your 5 bird instances are connecting to 3 GoBGP instances, so total of 15 connections are there. Note that you don't really need to connect 3 GoBGP instances in a mesh, as they are all going to have the same information from 5 BIRD instances, and they are in Acitve-Active-Active HA already, right?
yes, so my setup is exactly how you described it, 3 GoBGP instances (lets say RR1, RR2, RR3) and 5 bird instances (lets say B1, B2, B3, B4, B5). Every RR is connected with all bird instances so 15 connections in total. The reason for trying to peer RRs between each other is for example if the connections RR1-B1, RR2-B5 and RR3-B5 are all down there is no way for B1 to find B5 routes (the only way will be B5->RR2/RR3->RR1->B1). If all RRs have the same id and reject incoming routes based on that then there is no way to create a resilient setup like the above. BTW, thanks for all the help @yicwang
Hello,
Doing some experimentation. I have gobgp configured (54.54.54.54) to talk to a vendor route reflector (35.121.146.13). Using the gobgp CLI or Java GRPC, when I addPath (ipv4), goBgp correctly sends the route and it is received on the RR.
The RR then echo's back the route with the RR cluster ID back to gobgp. I can see this route come in via monitor and also in the global rib dump.
When I try to deletePath, via CLI or Java GRPC, i do NOT see the [DELROUTE] log message. The entry is deleted from the gobgp but a withdraw message is not received on the RR. Therefore the route is not withdrawn.
Note: if i disable route-reflector on the other node, gobgp can successful add and withdraw the route. My guess is gobgp is not withdrawing the route it generated due to a collision with the reflected route
Here's a step through of what's happening: