sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[EVPN@Scale] Poor performance and stability on EVPN L2 scale scenario #15004

Open Hedgehog-Guru opened 1 year ago

Hedgehog-Guru commented 1 year ago

Description

Poor performance and stability on EVPN L2 scale scenario

Steps to reproduce the issue:

  1. On two switches, configure L2 EVPN with a single VLAN and a single VNI.
  2. On both switches, add two untagged L2 ports to the VLAN.
  3. Run unicast L2 traffic between the switches (one port to one port):
     3.a. SW2 -> SW1: NNNN DMAC to 1 SMAC
     3.b. SW1 -> SW2: 1 SMAC to NNNN DMAC
  4. On SW1, check the number of remote MACs and the number of EVPN prefixes.
  5. On SW1, check the CPU load.
  6. On SW1 and SW2, check the number of unknown-unicast frames flooded to the second port.
  7. Keep the traffic running for an extended period (e.g. 20 minutes) to verify stability and the absence of flooding.
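Step 3 needs a large set of distinct MAC addresses (the exact NNNN count is not given here). A minimal Python sketch for generating N sequential unicast MACs to feed a traffic generator; the function name `sequential_macs` and the default base 00:02:00:00:00:00 are hypothetical choices (the base only mirrors the prefix of the remote MACs shown later in this thread):

```python
def sequential_macs(count, base=0x000200000000):
    """Yield `count` consecutive MAC address strings, starting at `base`.

    The default base (00:02:00:00:00:00) is a hypothetical choice; use any
    range that does not collide with real hosts in the test topology.
    """
    if count < 0 or base < 0 or base + count > (1 << 48):
        raise ValueError("requested range does not fit in 48 bits")
    for value in range(base, base + count):
        # Render the 48-bit integer as six colon-separated hex octets.
        yield ":".join(f"{octet:02x}" for octet in value.to_bytes(6, "big"))


if __name__ == "__main__":
    macs = list(sequential_macs(132_000))
    print(macs[0], macs[-1], len(macs))
```

Generating the addresses from an integer counter keeps them unique and lets the same range be reused on both switches.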

Describe the results you received:

  - Slow convergence: on Spectrum-3 it takes 4 minutes to install 132K remote MACs.
  - High CPU utilization, mainly in redis-server.
  - Every 5 minutes the number of MACs decreases (Linux bridge aging?).
  - Almost constant unknown-unicast flooding to the "monitor" ports.
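For scale, the convergence figure above corresponds to an average install rate. A trivial Python check, assuming "132K" means 132,000 MACs and that installation proceeds at a uniform rate over the 4 minutes (the report gives only the total):

```python
REMOTE_MACS = 132_000     # "132K remote MACs", taken as 132,000
INSTALL_TIME_S = 4 * 60   # "4 min" to install them on Spectrum-3

rate = REMOTE_MACS / INSTALL_TIME_S
print(f"average install rate: {rate:.0f} MACs/s")  # -> average install rate: 550 MACs/s
```

At that average rate, any event that forces all remote MACs to be removed and reinstalled costs minutes rather than seconds, assuming reinstallation runs no faster than the initial install.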

Describe the results you expected:

Faster convergence and stability

Output of show version:

SONiC Software Version: SONiC.202211_RC12.1-4ee027200_Internal
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: 4ee027200
Build date: Thu May  4 11:06:45 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci02-242

Platform: x86_64-mlnx_msn3700-r0
HwSKU: ACS-MSN3700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1932X22252
Model Number: MSN3700-VS2F
Hardware Revision: A1
Uptime: 17:51:26 up  1:27,  1 user,  load average: 0.13, 0.17, 0.25
Date: Wed 10 May 2023 17:51:26

Output of show techsupport:

SW1 - DUT sonic_dump_qa-eth-vt03-3-4600ca1_20230510_164945 (1).zip sonic_dump_qa-eth-vt03-3-4600ca1_20230510_164945 (2).zip sonic_dump_qa-eth-vt03-3-4600ca1_20230510_164945 (3).zip

SW2 sonic_dump_qa-eth-vt03-4-3700v_20230510_164947 (1).zip sonic_dump_qa-eth-vt03-4-3700v_20230510_164947 (2).zip sonic_dump_qa-eth-vt03-4-3700v_20230510_164947 (3).zip

Additional information you deem important (e.g. issue happens only occasionally):

Collected stats: stats-and-script.zip

arlakshm commented 1 year ago

@dgsudharsan to add more details to better describe the issue.

adyeung commented 1 year ago

@dgsudharsan, as discussed in the call on 5/24, please share more data on GitHub.

@srj102, according to Sudharshan, unexpected expiration of MACs in the kernel is leading to BGP route withdrawals; it is also seen at low scale. Please help follow up and share your analysis.

dgsudharsan commented 1 year ago

@adyeung @srj102 This MAC oscillation happens even with a single MAC address. When a local MAC is learnt, it is advertised to the remote VTEP. Later, when the kernel ages out the local MAC, the notification is received by both BGP and fdbsyncd. BGP immediately withdraws the MAC from the remote VTEP, while fdbsyncd programs the MAC back into the kernel, which triggers BGP to advertise it again. So for the window between the BGP withdrawal and the re-advertisement, traffic to this MAC is flooded in the topology.

This issue gets worse as the MAC scale increases, because processing overload in BGP and fdbsyncd widens that window for each MAC.
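The sequence described above can be illustrated with a toy timeline. This Python sketch is not SONiC or FRR code: the function `flood_windows` and its per-event latencies (`bgp_delay`, `fdbsyncd_delay`) are hypothetical, and each daemon is assumed to handle notifications one at a time in FIFO order. For each MAC it computes when BGP finishes the withdrawal and when it finishes the re-advertisement; traffic to that MAC floods in between.

```python
import heapq


def flood_windows(n_macs, bgp_delay, fdbsyncd_delay):
    """Toy model of the age-out -> withdraw -> reprogram -> re-advertise loop.

    Hypothetical assumptions, for illustration only:
      * the kernel ages out all n_macs at t=0 and notifies BGP and fdbsyncd;
      * fdbsyncd re-adds MACs one at a time, fdbsyncd_delay seconds each;
      * BGP is a single FIFO worker needing bgp_delay seconds per event.
    Returns one (withdraw_done, readvertise_done) pair per MAC.
    """
    queue = []  # (arrival_time, seq, kind, mac); seq preserves FIFO order
    seq = 0
    for mac in range(n_macs):
        heapq.heappush(queue, (0.0, seq, "withdraw", mac))
        seq += 1
    for mac in range(n_macs):
        # fdbsyncd finishes MAC `mac` after handling it and all earlier MACs;
        # the kernel re-add then notifies BGP.
        heapq.heappush(queue, ((mac + 1) * fdbsyncd_delay, seq, "advertise", mac))
        seq += 1

    done = {"withdraw": {}, "advertise": {}}
    bgp_free_at = 0.0
    while queue:
        arrival, _, kind, mac = heapq.heappop(queue)
        start = max(arrival, bgp_free_at)  # wait if BGP is still busy
        bgp_free_at = start + bgp_delay
        done[kind][mac] = bgp_free_at
    return [(done["withdraw"][m], done["advertise"][m]) for m in range(n_macs)]


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        windows = flood_windows(n, bgp_delay=0.001, fdbsyncd_delay=0.002)
        worst = max(adv - wd for wd, adv in windows)
        print(f"{n:>6} MACs: worst flood window {worst:.3f} s")
```

In this model the worst window grows roughly linearly with the number of MACs, because every re-advertisement queues behind the burst of withdrawals. The real daemons batch and prioritize differently, so treat this only as an illustration of the scaling argument.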

srj102 commented 1 year ago

@hasan-brcm can you please comment on this.

dgsudharsan commented 1 year ago

@hasan-brcm @adyeung Can you please provide ETA for fixing this issue?

hasan-brcm commented 1 year ago

Later when the kernel ages out the local mac, the notification is received by both BGP and fdbsyncd. BGP immediately withdraws the mac from the remote..

Hi @dgsudharsan, mac addresses are installed as extern_learn and these should not be aging out. https://elixir.bootlin.com/linux/v5.10.190/source/net/bridge/br_fdb.c#L79 https://elixir.bootlin.com/linux/v5.10.190/source/net/bridge/br_fdb.c#L355

00:02:00:00:86:e2 dev VTEP1-100 vlan 100 extern_learn master Bridge
00:02:00:00:7d:f6 dev VTEP1-100 vlan 100 extern_learn master Bridge
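The linked kernel code skips static and externally learned entries when ageing. The following Python sketch mirrors that rule in simplified form (it is a model, not kernel code, and ignores the topology-change hold time). It parses `bridge fdb show`-style lines like the two above and reports whether an entry would be eligible to age out, assuming the Linux bridge default ageing_time of 300 seconds. The helper names and the second sample line (a dynamically learned local MAC on a hypothetical port `Ethernet0`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DEFAULT_AGEING_TIME_S = 300  # Linux bridge default ageing_time

# In this simplified model these flags exempt an entry from ageing.
NON_AGEING_FLAGS = {"static", "permanent", "extern_learn"}

# Keywords that take a value in `bridge fdb show` output (subset).
VALUE_KEYWORDS = {"dev", "vlan", "master", "dst", "port", "vni", "src_vni", "via"}


@dataclass
class FdbEntry:
    mac: str
    dev: Optional[str] = None
    vlan: Optional[int] = None
    flags: Set[str] = field(default_factory=set)


def parse_fdb_line(line: str) -> FdbEntry:
    """Parse one `bridge fdb show` line (simplified; no stats fields)."""
    tokens = line.split()
    entry = FdbEntry(mac=tokens[0])
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in VALUE_KEYWORDS and i + 1 < len(tokens):
            if token == "dev":
                entry.dev = tokens[i + 1]
            elif token == "vlan":
                entry.vlan = int(tokens[i + 1])
            i += 2  # skip the keyword and its value
        else:
            entry.flags.add(token)  # bare keyword, e.g. extern_learn
            i += 1
    return entry


def may_age_out(entry: FdbEntry, seconds_since_update: float,
                ageing_time_s: float = DEFAULT_AGEING_TIME_S) -> bool:
    """Exempt entries never expire; others expire after ageing_time_s idle."""
    if entry.flags & NON_AGEING_FLAGS:
        return False
    return seconds_since_update >= ageing_time_s


if __name__ == "__main__":
    samples = [
        "00:02:00:00:86:e2 dev VTEP1-100 vlan 100 extern_learn master Bridge",
        "00:02:00:00:86:e2 dev Ethernet0 vlan 100 master Bridge",  # hypothetical
    ]
    for line in samples:
        entry = parse_fdb_line(line)
        print(entry.dev, may_age_out(entry, seconds_since_update=301))
    # -> VTEP1-100 False
    # -> Ethernet0 True
```

Under this rule the remote extern_learn entries are exempt from ageing, while a dynamically learned entry without that flag is not.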

dgsudharsan commented 1 year ago

Hi @hasan-brcm, the aging happens in the source VTEP, not in the destination VTEP. When the source VTEP's kernel ages out the MAC, BGP on the source VTEP withdraws it, which removes the MAC from the destination VTEP. fdbsyncd on the source VTEP then reprograms the MAC into the kernel, after which the destination VTEP learns it again through BGP and programs it. As a result, every 5 minutes all the remote MACs are removed and reinstalled because of kernel aging on the source VTEP.
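To put "every 5 minutes" in numbers: 300 seconds is the Linux bridge default ageing_time, so over a 20-minute run like the one in the reproduction steps each MAC would go through four withdraw/re-advertise cycles. A back-of-the-envelope Python calculation, assuming every MAC ages out exactly once per ageing period and each cycle costs one EVPN withdrawal plus one re-advertisement (the helper `evpn_churn` is hypothetical, and real timing will vary):

```python
AGEING_TIME_S = 300        # Linux bridge default ageing_time (5 minutes)
RUN_TIME_S = 20 * 60       # test duration from the reproduction steps
REMOTE_MACS = 132_000      # scale reported in the issue


def evpn_churn(macs, run_time_s, ageing_time_s=AGEING_TIME_S):
    """Return (cycles per MAC, total route updates), assuming one
    withdrawal and one re-advertisement per MAC per ageing period."""
    cycles = run_time_s // ageing_time_s
    return cycles, macs * cycles * 2


cycles, updates = evpn_churn(REMOTE_MACS, RUN_TIME_S)
print(f"{cycles} cycles per MAC, {updates:,} route updates")
# -> 4 cycles per MAC, 1,056,000 route updates
```

If reinstalling takes anywhere near the 4 minutes reported for the initial install, the remote MAC table would spend most of each 5-minute period reconverging.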