osrg / gobgp

BGP implemented in the Go Programming Language
https://osrg.github.io/gobgp/
Apache License 2.0

nexthop is fictitious after received Route Refresh message #1891

Closed chenhaiq closed 5 years ago

chenhaiq commented 5 years ago

I have a gobgpd instance connected to an H3C switch over EVPN. It worked fine for the first 3 days, but then the output became this:

# gobgp global rib -a evpn
*> [type:multicast][rd:1:14173143][etag:0][ip:198.18.1.17]                                             fictitious                                4d 20:58:56   []
*> [type:macadv][rd:1:14173143][etag:0][mac:52:54:df:1e:f0:f7][ip:192.168.128.254]  [14173143,16008786]  fictitious                                4d 20:58:56  [[ESI: single-homed]]

Those routes are advertised by the H3C switch:

 Route distinguisher: 1:14173143
 Total number of routes: 2

     Network            NextHop         MED        LocPrf             Path/Ogn

* >  [2][0][48][5254-df1e-f0f7][32][192.168.128.254]/136
                        0.0.0.0         0          100                i
* >  [3][0][32][198.18.1.17]/80
                        0.0.0.0         0          100                i

After I refresh BGP on the H3C switch with: refresh bgp 172.31.20.67 export l2vpn evpn

The gobgp output is correct:

# gobgp n 198.18.1.17 adj-in -a evpn | grep 14173143
   0   [type:macadv][rd:1:14173143][etag:0][mac:52:54:df:1e:f0:f7][ip:192.168.128.254] [14173143,16008786] 198.18.1.17                                        4d 20:58:53 [{Origin: i} {Med: 0} {LocalPref: 100} {Extcomms: [1:16008786], [VXLAN], [default-gateway], [router's mac: b0:f9:63:b1:a0:00]} [ESI: single-homed]]
   0   [type:multicast][rd:1:14173143][etag:0][ip:198.18.1.17]                                             198.18.1.17                                        4d 20:58:56 [{Origin: i} {Med: 0} {LocalPref: 100} {Extcomms: [VXLAN]} {Pmsi: type: ingress-repl, label: 14173143, tunnel-id: 198.18.1.17}]

I have seen this error a few times, but I have no idea how to reproduce it yet.

chenhaiq commented 5 years ago

This problem happens after the switch sends a BGP route-refresh message. It can be reproduced with this command on the H3C switch: refresh bgp 172.31.20.67 import l2vpn evpn

chenhaiq commented 5 years ago

This is the packet that triggered this error:

14:45:00.314331 58:69:6c:c6:da:8e > f0:18:98:55:63:40, ethertype IPv4 (0x0800), length 89: (tos 0xc0, ttl 252, id 64764, offset 0, flags [none], proto TCP (6), length 75)
    198.18.1.17.179 > 172.31.107.89.54644: Flags [P.], cksum 0x0872 (correct), seq 3766158781:3766158804, ack 1289868836, win 4163, options [nop,nop,TS val 2105552590 ecr 926898241], length 23: BGP
    Route Refresh Message (5), length: 23
      AFI VPLS (25), SAFI Unknown (70)
      0x0000:  ffff ffff ffff ffff ffff ffff ffff ffff
      0x0010:  0017 0500 1900 46
14:45:00.321966 58:69:6c:c6:da:8e > f0:18:98:55:63:40, ethertype IPv4 (0x0800), length 66: (tos 0xc0, ttl 252, id 64766, offset 0, flags [none], proto TCP (6), length 52)
    198.18.1.17.179 > 172.31.107.89.54644: Flags [.], cksum 0x5be2 (correct), seq 23, ack 72, win 4154, options [nop,nop,TS val 2105552595 ecr 926902445], length 0
14:45:00.323994 58:69:6c:c6:da:8e > f0:18:98:55:63:40, ethertype IPv4 (0x0800), length 66: (tos 0xc0, ttl 252, id 64767, offset 0, flags [none], proto TCP (6), length 52)
    198.18.1.17.179 > 172.31.107.89.54644: Flags [.], cksum 0x5ba4 (correct), seq 23, ack 137, win 4148, options [nop,nop,TS val 2105552598 ecr 926902445], length 0
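
For reference, the 23 bytes in the hex dump above can be decoded by hand. The following is a minimal sketch, assuming the RFC 2918 layout (16-byte marker, 2-byte length, 1-byte type, then AFI, a reserved octet, and SAFI). The names RouteRefresh, ParseRouteRefresh, and sampleRouteRefresh are made up for illustration and are not gobgp APIs. AFI 25 with SAFI 70 is the L2VPN EVPN address family, which this tcpdump version prints as "VPLS" and "Unknown".

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// RouteRefresh holds the fields of a BGP ROUTE-REFRESH message (RFC 2918).
// Illustrative type only; it is not part of gobgp.
type RouteRefresh struct {
	Length uint16 // total message length, header included
	Type   uint8  // BGP message type; 5 means ROUTE-REFRESH
	AFI    uint16 // address family identifier
	SAFI   uint8  // subsequent address family identifier
}

// ParseRouteRefresh decodes a complete BGP message: the 19-byte header
// followed by the 4-byte route-refresh body.
func ParseRouteRefresh(b []byte) (RouteRefresh, error) {
	var rr RouteRefresh
	if len(b) < 23 {
		return rr, errors.New("message shorter than 23 bytes")
	}
	// The marker is 16 octets that must all be 0xff.
	for _, m := range b[:16] {
		if m != 0xff {
			return rr, errors.New("invalid marker")
		}
	}
	rr.Length = binary.BigEndian.Uint16(b[16:18])
	rr.Type = b[18]
	if rr.Type != 5 {
		return rr, fmt.Errorf("not a ROUTE-REFRESH message: type %d", rr.Type)
	}
	rr.AFI = binary.BigEndian.Uint16(b[19:21])
	// b[21] is the reserved octet and is ignored here.
	rr.SAFI = b[22]
	return rr, nil
}

// sampleRouteRefresh rebuilds the bytes shown in the tcpdump hex dump.
func sampleRouteRefresh() []byte {
	msg := make([]byte, 0, 23)
	for i := 0; i < 16; i++ {
		msg = append(msg, 0xff)
	}
	return append(msg, 0x00, 0x17, 0x05, 0x00, 0x19, 0x00, 0x46)
}

func main() {
	rr, err := ParseRouteRefresh(sampleRouteRefresh())
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("length=%d type=%d afi=%d safi=%d\n", rr.Length, rr.Type, rr.AFI, rr.SAFI)
}
```

So the switch is asking gobgp to resend its l2vpn evpn routes, which matches the "refresh bgp ... import l2vpn evpn" command.
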
chenhaiq commented 5 years ago

func (peer *Peer) handleRouteRefresh(e *FsmMsg) []*table.Path {
...
    accepted, filtered := peer.getBestFromLocal(rfList)
    for _, path := range filtered {
        path.IsWithdraw = true // <-- this line causes the problem. Is it OK to remove it?
        accepted = append(accepted, path)
    }
    return accepted
}

fujita commented 5 years ago

@chenhaiq Thanks for the bug report. Did you configure a policy that drops the path? I don't understand why the path is added to the filtered list.

fujita commented 5 years ago

@chenhaiq Can you try this fix?

https://github.com/osrg/gobgp/pull/1896

fujita commented 5 years ago

@chenhaiq I pushed the fix. Please try the latest master.

chenhaiq commented 5 years ago

@fujita It fixed the 'fictitious' error, but the route refresh message is still not handled correctly: gobgp sends all paths to the peer, including the paths it received from that peer.

I'd like to remove

    for _, path := range filtered {
        accepted = append(accepted, path.Clone(true))
    }

because accepted holds the paths that did not come from the peer, while filtered holds the paths that came from the peer.

There is no policy in my configuration file.
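
The behavior I am asking for can be sketched roughly as follows. This is only an illustration under simplified assumptions, not the gobgp implementation or the actual fix: Path and pathsForRefresh are hypothetical names, and Path is a stripped-down stand-in for table.Path that records only a prefix and the peer it was learned from. The 10.0.0.2 peer and the prefix strings are made-up sample data.

```go
package main

import "fmt"

// Path is a hypothetical, simplified stand-in for a RIB path.
// It is not gobgp's table.Path.
type Path struct {
	Prefix string // textual NLRI, for display only
	Source string // address of the peer the path was learned from
}

// pathsForRefresh returns the paths to re-advertise to peer in response
// to a route refresh. Paths learned from that same peer are skipped, so
// they are neither echoed back nor turned into withdrawals.
func pathsForRefresh(rib []Path, peer string) []Path {
	out := make([]Path, 0, len(rib))
	for _, p := range rib {
		if p.Source == peer {
			continue // learned from this peer: leave it out of the reply
		}
		out = append(out, p)
	}
	return out
}

func main() {
	rib := []Path{
		{Prefix: "[type:multicast][rd:1:14173143][etag:0][ip:198.18.1.17]", Source: "198.18.1.17"},
		{Prefix: "[type:multicast][rd:1:1][etag:0][ip:10.0.0.2]", Source: "10.0.0.2"},
	}
	for _, p := range pathsForRefresh(rib, "198.18.1.17") {
		fmt.Println(p.Prefix, "from", p.Source)
	}
}
```

The function returns a new slice and never modifies the entries in rib, so the stored paths cannot be flagged as withdrawn by accident, which is what the original path.IsWithdraw = true line did to the shared path objects.
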

fujita commented 5 years ago

Yeah, it seems that handleRouteRefresh and softresetout were broken at some point. I've just pushed a fix.

chenhaiq commented 5 years ago

it works!