sonic-net / sonic-mgmt

Configuration management examples for SONiC

bgp sessions down with cEOS #11279

Open Stephenxf opened 10 months ago

Stephenxf commented 10 months ago

Description: I'm setting up a physical testbed with cEOS containers simulating the neighboring devices, following these steps. Both add-topo and deploy-mg ran successfully, and the configs look good on the DUT and on the cEOS containers.

However, the BGP sessions between the DUT and the cEOS containers stay down, although ping works from either side. A tcpdump capture from one of the containers shows both sides initiating TCP connections, but the cEOS container appears to ignore or drop the TCP packets from the DUT. When the DUT initiates with a TCP SYN, the container never responds. When the container initiates, the DUT returns a SYN-ACK, but the container never sends the final ACK.

07:28:53.421015 eth1  In  ifindex 5426 2c:dd:e9:c0:6f:1c ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 255, id 47279, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.0.50102 > 10.0.0.1.bgp: Flags [S], seq 3038400278, win 63420, options [mss 9060,sackOK,TS val 2662681490 ecr 0,nop,wscale 9], length 0
07:28:53.523220 eth1  Out ifindex 5426 96:b8:11:a0:a6:43 ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 64, id 29540, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.1.46593 > 10.0.0.0.bgp: Flags [S], seq 3373226361, win 64218, options [mss 9174,sackOK,TS val 2292974628 ecr 0,nop,wscale 13], length 0
07:28:53.523583 eth1  In  ifindex 5426 2c:dd:e9:c0:6f:1c ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.0.bgp > 10.0.0.1.46593: Flags [S.], seq 878187551, ack 3373226362, win 63336, options [mss 9060,sackOK,TS val 2662681592 ecr 2292974628,nop,wscale 9], length 0
07:28:54.445258 eth1  In  ifindex 5426 2c:dd:e9:c0:6f:1c ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 255, id 47280, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.0.50102 > 10.0.0.1.bgp: Flags [S], seq 3038400278, win 63420, options [mss 9060,sackOK,TS val 2662682514 ecr 0,nop,wscale 9], length 0
07:28:54.536862 eth1  In  ifindex 5426 2c:dd:e9:c0:6f:1c ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.0.bgp > 10.0.0.1.46593: Flags [S.], seq 878187551, ack 3373226362, win 63336, options [mss 9060,sackOK,TS val 2662682606 ecr 2292974628,nop,wscale 9], length 0
07:28:54.552766 eth1  Out ifindex 5426 96:b8:11:a0:a6:43 ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 64, id 29541, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.1.46593 > 10.0.0.0.bgp: Flags [S], seq 3373226361, win 64218, options [mss 9174,sackOK,TS val 2292975658 ecr 0,nop,wscale 13], length 0
07:28:54.552971 eth1  In  ifindex 5426 2c:dd:e9:c0:6f:1c ethertype IPv4 (0x0800), length 80: (tos 0xc0, ttl 255, id 0, offset 0, flags [DF], proto TCP (6), length 60)
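The capture above contains interleaved retransmissions, which makes the pattern harder to read. The following minimal Python sketch makes it explicit. It is not part of sonic-mgmt, and `classify_handshakes` is a hypothetical name. The sketch reads tcpdump segment lines like the ones above and groups them by initiating endpoint. It then reports how far each TCP three-way handshake got. It assumes IPv4 addresses, tcpdump's `addr.port` notation (ports may be service names such as `bgp`), and lines in capture order.

```python
import re
from collections import Counter

# Matches the second line of each tcpdump record, e.g.
#   10.0.0.0.50102 > 10.0.0.1.bgp: Flags [S], seq 3038400278, ...
# Assumption: IPv4 only; ports may be numbers or service names ("bgp").
SEGMENT_RE = re.compile(
    r"(?P<src>\d+(?:\.\d+){3})\.(?P<sport>\w+) > "
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\w+): Flags \[(?P<flags>[^\]]*)\]"
)


def classify_handshakes(lines):
    """Map each initiator 'src:port -> dst:port' to how far its handshake got."""
    counts = {}  # initiator 4-tuple -> Counter of syn / syn_ack / ack
    for line in lines:
        m = SEGMENT_RE.search(line)
        if not m:
            continue  # header lines and anything else are skipped
        src, sport, dst, dport, flags = m.group("src", "sport", "dst", "dport", "flags")
        fwd = (src, sport, dst, dport)
        rev = (dst, dport, src, sport)
        if flags == "S":  # SYN from the initiator
            counts.setdefault(fwd, Counter())["syn"] += 1
        elif flags == "S." and rev in counts:  # SYN-ACK from the responder
            counts[rev]["syn_ack"] += 1
        elif "." in flags and not {"S", "R"} & set(flags) and fwd in counts:
            counts[fwd]["ack"] += 1  # ACK (or ACK-carrying data) from the initiator

    verdicts = {}
    for (src, sport, dst, dport), c in counts.items():
        if c["syn_ack"] == 0:
            verdict = "SYN unanswered"
        elif c["ack"] == 0:
            verdict = "SYN-ACK not acknowledged"
        else:
            verdict = "handshake completed"
        verdicts[f"{src}:{sport} -> {dst}:{dport}"] = verdict
    return verdicts


if __name__ == "__main__":
    capture = [
        "10.0.0.0.50102 > 10.0.0.1.bgp: Flags [S], seq 3038400278, win 63420, length 0",
        "10.0.0.1.46593 > 10.0.0.0.bgp: Flags [S], seq 3373226361, win 64218, length 0",
        "10.0.0.0.bgp > 10.0.0.1.46593: Flags [S.], seq 878187551, ack 3373226362, length 0",
    ]
    for conn, verdict in classify_handshakes(capture).items():
        print(f"{conn}: {verdict}")
```

On these lines from the capture, the sketch reports two results:
- The DUT-initiated connection (10.0.0.0:50102) is "SYN unanswered".
- The container-initiated one (10.0.0.1:46593) is "SYN-ACK not acknowledged".

In both directions, the segments from the DUT reach eth1, but the container's TCP stack does not appear to act on them.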

Am I missing anything in the setup? Has anyone seen a similar issue with cEOS? Any input would be highly appreciated.

Steps to reproduce the issue:

  1. Run add-topo for the t1 topology.
  2. Run deploy-mg.

Describe the results you received: BGP sessions between DUT and cEOS stay down, although ping works from both sides.
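Ping only exercises ICMP, so a working ping does not show that TCP port 179 is reachable. A quick complementary check is a plain TCP connect to the neighbor's BGP port. The minimal Python sketch below assumes it runs from the DUT's host shell against a neighbor address such as 10.0.0.1 from the capture. `probe_tcp` is a hypothetical helper, not a sonic-mgmt tool. It distinguishes a completed handshake, a refusal (RST), and no answer at all.

```python
import socket
from typing import Optional


def probe_tcp(host: str, port: int = 179, timeout: float = 3.0,
              source: Optional[str] = None) -> str:
    """Try a TCP three-way handshake to host:port and classify the outcome."""
    # Optionally bind to a specific local address (any local port), e.g. the
    # DUT's peering address, so the probe uses the same source as BGP would.
    src = (source, 0) if source else None
    try:
        with socket.create_connection((host, port), timeout=timeout, source_address=src):
            return "open"      # SYN, SYN-ACK and ACK all completed
    except ConnectionRefusedError:
        return "refused"       # the peer answered the SYN with a RST
    except socket.timeout:
        return "timeout"       # no answer to the SYN at all
    except OSError as exc:
        return f"error: {exc}"
```

As a usage example, run `probe_tcp("10.0.0.1", source="10.0.0.0")` on the DUT. A "timeout" result would match the unanswered SYNs in the capture. A "refused" result would instead point at the port or the BGP listener rather than at packets being dropped.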

Describe the results you expected: BGP sessions between DUT and cEOS come up.

Additional information you deem important: I started with an older version of cEOS and am now running cEOS64-lab-4.30.4M.

ARISTA01T2#show version
Arista cEOSLab
Hardware version:
Serial number: 6B8C1AAE107F85E8A5BF1C7257158ECA
Hardware MAC address: 1656.7bf8.6cf5
System MAC address: 1656.7bf8.6cf5

Software image version: 4.30.4M-34191138.4304M (engineering build)
Architecture: x86_64
Internal build version: 4.30.4M-34191138.4304M
Internal build ID: 65d33dd5-6f77-48d3-86c8-15e1072f7664
Image format version: 1.0
Image optimization: None

cEOS tools version: (unknown)
Kernel version: 5.4.0-150-generic

Uptime: 29 minutes
Total memory: 65747828 kB
Free memory: 41148736 kB
Stephenxf commented 10 months ago

P.S.: This is not an eBGP multihop issue. I do have fanout switches in between, but adding the "ebgp-multihop" command doesn't help. I also tried converting the BGP sessions to iBGP, with no luck.
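The TTL values in the capture are consistent with this. 255 is the maximum IPv4 TTL, so a packet that arrives with ttl 255 cannot have crossed a routed hop. Every inbound packet from the DUT arrives with ttl 255, so TTL expiry is not what drops them. The small Python sketch below pulls the direction and TTL out of tcpdump header lines. It assumes the header layout shown in the capture (interface, then In/Out, then the IP header fields), and its helper names are hypothetical.

```python
import re

# Matches tcpdump header lines such as
#   07:28:53.421015 eth1  In  ifindex 5426 ... (tos 0xc0, ttl 255, id 47279, ...
# Assumption: the In/Out direction field appears as in the capture above.
HEADER_RE = re.compile(r"\b(?P<dir>In|Out)\b.*?\(tos 0x[0-9a-f]+, ttl (?P<ttl>\d+),")

MAX_IPV4_TTL = 255


def ttl_by_direction(lines):
    """Return {"In": {ttl, ...}, "Out": {ttl, ...}} for the header lines given."""
    seen = {}
    for line in lines:
        m = HEADER_RE.search(line)
        if m:
            seen.setdefault(m.group("dir"), set()).add(int(m.group("ttl")))
    return seen


def inbound_unrouted(lines):
    """True if there are inbound packets and all of them carry the maximum TTL.

    Only inbound TTLs matter here: outbound packets are captured before they
    leave the container, so they only show the sender's initial TTL.
    """
    inbound = ttl_by_direction(lines).get("In", set())
    return bool(inbound) and all(ttl == MAX_IPV4_TTL for ttl in inbound)
```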

BTW, on the cEOS containers, the BGP sessions with ExaBGP are all up as expected.