Closed by ipspace 2 months ago
So, I found this comment in my code:
! This is an unnumbered eBGP session
! WTF Remote-AS configuration not supported for unnumbered peer
Let's see if this has been fixed with newer versions ;)
Otherwise I will declare a caveat.
dut(config)# router bgp 65000
dut(config-router-bgp-65000)# neighbor interface ethernet1/1/1
dut(config-router-neighbor)# rem
remote-as remove-private-as
dut(config-router-neighbor)# remote-as 123
% Error: Remote-AS configuration not supported for unnumbered peer
Again - WTF.
Btw, the flap does not seem to be caused by the missing remote AS (it seems OS10 accepts whatever comes in the OPEN message), but by an invalid next hop in the received updates:
Dell (OS10) %BGP_NBR_BKWD_STATE_CHG: Backward state change occurred ADJCHANGE: Session down for Nbr over:ethernet1/1/1 VRF:default
Dell (OS10) %BGP_NBR_BKWD_STATE_CHG: Backward state change occurred UPDATE ERR: Invalid nexthop recvd from Nbr over:ethernet1/1/1 VRF:default
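For anyone chasing the same flap in a longer log, here is a minimal Python sketch that pulls the reason and interface out of these messages. The regular expression is derived only from the two log lines above, and `parse_flap` is a hypothetical helper name, so other OS10 messages may well need a different pattern.

```python
import re

# Pattern inferred from the two sample lines above (an assumption, not a
# documented OS10 log format).
FLAP_RE = re.compile(
    r"%BGP_NBR_BKWD_STATE_CHG: Backward state change occurred "
    r"(?P<reason>[A-Z ]+): (?P<detail>.*) over:(?P<interface>\S+) VRF:(?P<vrf>\S+)"
)

def parse_flap(line):
    """Return reason/detail/interface/vrf as a dict, or None if no match."""
    match = FLAP_RE.search(line)
    return match.groupdict() if match else None

if __name__ == "__main__":
    sample = (
        "Dell (OS10) %BGP_NBR_BKWD_STATE_CHG: Backward state change occurred "
        "UPDATE ERR: Invalid nexthop recvd from Nbr over:ethernet1/1/1 VRF:default"
    )
    print(parse_flap(sample))
```

Filtering on `reason == "UPDATE ERR"` separates the root cause from the follow-up ADJCHANGE message.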
I found an "interesting" paragraph in the OS10 docs:
Behavior of iBGP unnumbered with cumulus
By default, SmartFabric OS10 has next-hop-self configuration enabled for unnumbered peers under both IPv4 and IPv6 address families.
Routes that are sent to an iBGP unnumbered peer have Next Hop resolved with Next Hop length as 32. In Cumulus, IPv4 NLRI is
advertised with link-local Next Hop and Next Hop length as 16. IPv6 NLRI is advertised with Next Hop unchanged if you do not
configure next-hop-self; otherwise, with next-hop-self configured with link-local address the Next hop length as 16.
IPv4 NLRI with Next Hop length as 16 is accepted only if you enable the link-local-only-nexthop command for that
unnumbered peer. Otherwise, this results in an update error.
IPv6 NLRI with link-local address as Next Hop and length as 16 is accepted only if you enable the link-local-only-nexthop command for that unnumbered peer. Otherwise, this results in an update error.
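To make the 16-versus-32 distinction concrete, here is a minimal Python sketch of the acceptance rule as I read the quoted paragraph. It is a model of the documented behavior, not OS10's implementation: the next-hop field holds one IPv6 address (length 16) or two (length 32, global followed by link-local). The function names and the `link_local_only_nexthop` flag are hypothetical, and accepting a 16-byte non-link-local next hop is my assumption because the quoted text does not cover that case.

```python
import ipaddress

def split_next_hop(raw: bytes):
    """Split an MP_REACH_NLRI next-hop field into its IPv6 addresses."""
    if len(raw) not in (16, 32):
        raise ValueError(f"unexpected next-hop length {len(raw)}")
    return [ipaddress.IPv6Address(raw[i:i + 16]) for i in range(0, len(raw), 16)]

def accepts_update(raw_next_hop: bytes, link_local_only_nexthop: bool) -> bool:
    """Model of the rule in the quoted docs (hypothetical helper)."""
    addrs = split_next_hop(raw_next_hop)
    if len(addrs) == 2:
        # Length 32: the form OS10 itself sends, always accepted here.
        return True
    if addrs[0].is_link_local:
        # Length 16, link-local only: needs the per-peer option.
        return link_local_only_nexthop
    # Length 16, non-link-local: not covered by the quote; assumed accepted.
    return True

if __name__ == "__main__":
    link_local = ipaddress.IPv6Address("fe80::1").packed
    print(accepts_update(link_local, link_local_only_nexthop=False))
    print(accepts_update(link_local, link_local_only_nexthop=True))
```

With the flag off, the link-local-only next hop is rejected, which matches the "Invalid nexthop recvd" update error in the log.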
It seems that enabling that option at the template level solves the issue.
I'm also testing it on OS10-to-OS10 and OS10-to-VyOS topologies.
Now the regular EBGP unnumbered check is failing (bgp/06-unnumbered.yml). Will publish the test results once I manage to get DellOS10 to work at least once in each test ;)
bgp/06 was the one initially referenced by this issue, and the one I used for the initial testing...
This is what I get in my environment:
root@hippo:~/TOPOLOGIES/bugs/bgp06# netlab validate
[WARNING] Initial wait time extended by 30 seconds required by dellos10
[session] Check EBGP sessions with DUT (wait up to 30 seconds) [ node(s): x1,x2,x3 ]
[PASS] x1: Neighbor eth1 (dut) is in state Established
[PASS] x2: Neighbor eth1 (dut) is in state Established
[PASS] x3: Neighbor 172.16.0.1 (dut) is in state Established
[PASS] Test succeeded
[pfx_x2] Check whether DUT propagates the X2 prefix [ node(s): x1 ]
[PASS] x1: The prefix 172.42.42.0/24 is in the BGP table
[PASS] Test succeeded
[pfx_x3] Check whether DUT propagates the X3 prefix [ node(s): x1 ]
[PASS] x1: The prefix 172.42.43.0/24 is in the BGP table
[PASS] Test succeeded
[SUCCESS] Tests passed: 5
Looks like I forgot (yet again) to push new code to the test server. In totally unrelated news, the Dell OS10 SSH server failure rate is currently above 80% :(( I hate this crap...
The new test results are online: https://tests.netlab.tools/_html/dellos10-libvirt
BGP works, the only failed VRF test is the common services VRF using OSPF.
Great job, thank you!
The current configuration template does not configure the neighbor AS number for interface EBGP sessions, resulting in flapping EBGP sessions and no route propagation.
Device configuration: