sonic-net / sonic-swss

SONiC Switch State Service (SwSS)
https://azure.github.io/SONiC

[vxlanmgr]: Add disabling of fdb learning for linux vxlan interfaces #3205

Closed: yfedoriachenko closed this 1 month ago

yfedoriachenko commented 1 month ago

What I did: When vxlan tunnels are supposed to have fdb learning disabled, I now also disable fdb learning on the corresponding Linux vxlan interfaces.
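
A minimal sketch of the kind of command involved. The PR diff is not reproduced here, so this assumes the flag is toggled with iproute2's "bridge link set dev <dev> learning off", i.e. the per-bridge_slave "learning {on|off}" setting that the FRR docs quoted in this thread refer to. The helper name fdb_learning_cmd is hypothetical and only builds the argument list; it does not run anything.

```python
import shlex
from typing import List


def fdb_learning_cmd(vxlan_dev: str, enabled: bool = False) -> List[str]:
    """Build (but do not run) an iproute2 command toggling bridge-port MAC learning.

    Hypothetical helper. "learning" here is the bridge_slave (fdb) flag,
    not the vxlan "nolearning" option, which controls VTEP learning.
    """
    if not vxlan_dev:
        raise ValueError("vxlan_dev must be a non-empty interface name")
    return ["bridge", "link", "set", "dev", vxlan_dev,
            "learning", "on" if enabled else "off"]


argv = fdb_learning_cmd("vtep-200")
print(shlex.join(argv))  # bridge link set dev vtep-200 learning off
# Applying it requires root and an existing bridge port, e.g.:
#   subprocess.run(argv, check=True)
```

Building an argument list rather than one shell string avoids quoting problems if an interface name ever contains unexpected characters.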

Why I did it: For the EVPN feature, learning on vxlan interfaces should be disabled, but it was not disabled on the Linux vxlan interfaces. This causes trouble when testing EVPN setups with sonic-vs on virtual machines.

How I verified it: With an existing vxlan tunnel, I created a new VLAN and vxlan map, then verified that the vxlan interface has fdb learning off. I then saved the config, rebooted the device, and verified that the vxlan interface still has learning off.

yaroslav_fedoriachenko@Leaf-1:~$ config vlan add 200
Root privileges are required for this operation
yaroslav_fedoriachenko@Leaf-1:~$ sudo config vlan add 200
yaroslav_fedoriachenko@Leaf-1:~$ sudo config vxlan map add vtep 200 2000
yaroslav_fedoriachenko@Leaf-1:~$ ip -d l show vtep-200
30: vtep-200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0c:5d:d9:3b:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535 
    vxlan id 2000 local 10.127.127.1 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx 
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8005 port_no 0x5 designated_port 32773 designated_cost 0 designated_bridge 8000.c:5d:d9:3b:0:0 designated_root 8000.c:5d:d9:3b:0:0 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    1.43 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
yaroslav_fedoriachenko@Leaf-1:~$ sudo config save -y
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json
yaroslav_fedoriachenko@Leaf-1:~$ sudo reboot

...

yaroslav_fedoriachenko@Leaf-1:~$ ip -d l show vtep-200
29: vtep-200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0c:5d:d9:3b:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535 
    vxlan id 2000 local 10.127.127.1 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx 
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8005 port_no 0x5 designated_port 32773 designated_cost 0 designated_bridge 8000.c:5d:d9:3b:0:0 designated_root 8000.c:5d:d9:3b:0:0 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    0.58 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
linux-foundation-easycla[bot] commented 1 month ago

CLA Signed

The committers listed above are authorized under a signed CLA.

dgsudharsan commented 1 month ago

@yfedoriachenko Have you tested your changes on a hardware platform and ensured basic EVPN works fine?

yfedoriachenko commented 1 month ago

I don't have a hardware platform available, so I have not tested the changes on hardware. However, I don't expect them to affect hardware platforms: they only change the Linux interface and are not propagated to orchagent.

Also, the FRR EVPN docs recommend that it be disabled:

Dynamic MAC/VTEP learning should be disabled on VXLAN interfaces used in EVPN. Dynamic MAC learning is a function of the kernel bridge driver, not FRR. Dynamic MAC learning is toggled per bridge_slave via learning {on|off}.

https://docs.frrouting.org/en/latest/evpn.html#linux-interface-configuration

VladimirKuk commented 1 month ago

@yfedoriachenko Is this change really required? Currently the vxlan interface is created with learning disabled (regardless of EVPN configuration), and learning is never enabled on vxlan interfaces. I used your configuration, and learning stays disabled even after a reload:

root@sonic:~# config vxlan map add tunnel1 200 2000
root@sonic:~# ip -d l show tunnel1-200
42: tunnel1-200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 22:3b:81:2b:2e:6f brd ff:ff:ff:ff:ff:ff promiscuity 1 allmulti 1 minmtu 68 maxmtu 65535
    vxlan id 2000 local 1.1.1.2 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.22:3b:81:2b:2e:6f designated_root 8000.22:3b:81:2b:2e:6f hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.19 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
root@sonic:~# config save -y
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json
root@sonic:~#
root@sonic:~# reboot

....

root@sonic:~# ip -d l show tunnel1-200
16: tunnel1-200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master Bridge state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 22:3b:81:2b:2e:6f brd ff:ff:ff:ff:ff:ff promiscuity 1 allmulti 1 minmtu 68 maxmtu 65535
    vxlan id 2000 local 1.1.1.2 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.22:3b:81:2b:2e:6f designated_root 8000.22:3b:81:2b:2e:6f hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

Do you have additional code changes, or am I missing something in the configuration?

yfedoriachenko commented 1 month ago

bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.22:3b:81:2b:2e:6f designated_root 8000.22:3b:81:2b:2e:6f hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.19 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

@VladimirKuk I was talking about fdb learning (the bridge_slave "learning on" setting in the quote above), not about vxlan learning (the "nolearning" setting in the quote below).

vxlan id 2000 local 1.1.1.2 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx

See the FRR docs at https://docs.frrouting.org/en/latest/evpn.html#linux-interface-configuration. They say "Dynamic MAC/VTEP learning should be disabled on VXLAN interfaces used in EVPN.", and the subsections clarify which one is MAC (a.k.a. fdb) learning and which one is VTEP learning.
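
The two settings can be told apart mechanically from "ip -d link show" output. A minimal sketch, assuming the plain-text layout seen in this thread (vxlan options printed before the bridge_slave options) and assuming iproute2 prints "nolearning" only when VTEP learning is off and nothing when it is on. The helper name learning_flags is hypothetical; the sample is a trimmed excerpt of the output VladimirKuk posted above.

```python
from typing import Dict, List, Optional


def learning_flags(ip_detail: str) -> Dict[str, Optional[bool]]:
    """Extract both learning settings from `ip -d link show` text for a vxlan netdev.

    - "vtep_learning": vxlan section; "nolearning" means VTEP learning is off.
    - "fdb_learning": bridge_slave section; "learning on|off" is MAC (fdb) learning.
    A value is None when the corresponding section is absent.
    """
    tokens: List[str] = ip_detail.split()
    flags: Dict[str, Optional[bool]] = {"vtep_learning": None, "fdb_learning": None}

    if "vxlan" in tokens:
        # vxlan options run from the 'vxlan' token up to 'bridge_slave' (if any).
        start = tokens.index("vxlan") + 1
        end = tokens.index("bridge_slave") if "bridge_slave" in tokens else len(tokens)
        # Assumption: iproute2 prints 'nolearning' only when VTEP learning is off.
        flags["vtep_learning"] = "nolearning" not in tokens[start:end]

    if "bridge_slave" in tokens:
        port = tokens[tokens.index("bridge_slave") + 1:]
        # bridge_slave options are 'key value' pairs; find 'learning on|off'.
        for key, value in zip(port, port[1:]):
            if key == "learning":
                flags["fdb_learning"] = value == "on"
                break
    return flags


sample = ("42: tunnel1-200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master Bridge "
          "vxlan id 2000 local 1.1.1.2 srcport 0 0 dstport 4789 nolearning ttl auto "
          "bridge_slave state forwarding priority 32 cost 100 learning on flood on")
print(learning_flags(sample))
# {'vtep_learning': False, 'fdb_learning': True}
```

On that output VTEP learning is already off, while fdb learning on the bridge port is still on, which is exactly the setting this PR changes.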

As for why this is needed at all (besides the recommendation in the FRR docs):

  1. It is relevant for sonic-vs, since there the Linux interface plays the data plane role.
  2. In "regular" EVPN setups you might not notice a difference, since the fdb entries created by FRR are just replaced with similar entries created by learning. The difference becomes obvious in EVPN MH setups: a BUM packet egressing from a VTEP in an ES arrives at the other VTEP in the same ES (which is intended), but causes that other VTEP to relearn the MAC from the vxlan tunnel instead of from the PortChannel. As a result, traffic TO that MAC arriving at the other VTEP is sent into the tunnel instead of out of the PortChannel. I know EVPN MH is not supported yet, but basic EVPN already requires learning to be disabled (even if the consequences are not as dire).
  3. fdb learning is already disabled in SAI (as is vxlan learning; on the Linux side that one is already taken care of by the vxlan interface's "nolearning" setting, as you pointed out).
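
Item 2 can be illustrated with a toy model. This is a hypothetical sketch, not kernel or SONiC code: one VTEP's bridge FDB is a dict from MAC to egress port, the control plane installs the multihomed host's MAC against the local PortChannel, and, as item 2 describes, a frame learned on the vxlan port overwrites that entry. The names ToyBridge and PortChannel1 and the MAC address are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyBridge:
    """Toy model of one VTEP's bridge FDB (MAC -> egress port). Not kernel code."""
    fdb_learning_on_tunnel: bool
    fdb: Dict[str, str] = field(default_factory=dict)

    def install(self, mac: str, port: str) -> None:
        # Entry programmed by the control plane (e.g. FRR from EVPN routes).
        self.fdb[mac] = port

    def receive(self, src_mac: str, in_port: str) -> None:
        # Source MAC learning on the ingress port; the vxlan port may have it disabled.
        if in_port == "vxlan" and not self.fdb_learning_on_tunnel:
            return
        self.fdb[src_mac] = in_port

    def egress_for(self, dst_mac: str) -> str:
        # Unknown destinations would be flooded; the toy just reports "flood".
        return self.fdb.get(dst_mac, "flood")


host = "00:11:22:33:44:55"
for learning in (True, False):
    vtep_b = ToyBridge(fdb_learning_on_tunnel=learning)
    vtep_b.install(host, "PortChannel1")  # host is multihomed to this VTEP's ES
    vtep_b.receive(host, "vxlan")         # BUM frame from the host, sent by the other ES VTEP
    print(learning, vtep_b.egress_for(host))
# True vxlan
# False PortChannel1
```

With learning enabled on the tunnel port, traffic to the host is steered into the tunnel; with it disabled, the control-plane entry pointing at the PortChannel survives.
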
VladimirKuk commented 1 month ago

@yfedoriachenko So "nolearning" on the vxlan netdev is for VTEP learning, not MAC learning; I missed that part. You are correct that when BGP-EVPN is used as the control plane, learning must be disabled. Just one suggestion: when the EVPN NVO is removed, learning should be restored to its previous value, i.e. enabled again, right?

yfedoriachenko commented 1 month ago

@VladimirKuk I looked into re-enabling learning. When I worked on this previously, I noticed that re-enabling learning is not required: while tunnel maps exist, the EVPN NVO can't be deleted, and if no tunnel maps exist, there are no Linux vxlan interfaces that would need learning re-enabled.
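
The argument in the last comment can be modelled as an invariant. A hypothetical sketch, not vxlanmgr code: it assumes, as the comment states, that the EVPN NVO cannot be deleted while tunnel maps exist, and additionally assumes that each map owns one Linux vxlan netdev (like vtep-200 above) that is removed together with the map. The class and method names are illustrative.

```python
from typing import Set


class ToyVxlanState:
    """Hypothetical model of the NVO / tunnel-map invariant; not vxlanmgr code."""

    def __init__(self) -> None:
        self.nvo_present = False
        self.tunnel_maps: Set[str] = set()  # assumption: one vxlan netdev per map

    def add_nvo(self) -> None:
        self.nvo_present = True

    def add_map(self, netdev: str) -> None:
        self.tunnel_maps.add(netdev)

    def remove_map(self, netdev: str) -> None:
        # Assumption: the Linux vxlan netdev is deleted together with its map.
        self.tunnel_maps.discard(netdev)

    def remove_nvo(self) -> Set[str]:
        """Remove the NVO; return netdevs that would still need learning re-enabled."""
        if self.tunnel_maps:
            raise RuntimeError("EVPN NVO cannot be deleted while tunnel maps exist")
        self.nvo_present = False
        return set(self.tunnel_maps)  # always empty once the check above passes


state = ToyVxlanState()
state.add_nvo()
state.add_map("vtep-200")
try:
    state.remove_nvo()
except RuntimeError as err:
    print(err)  # EVPN NVO cannot be deleted while tunnel maps exist
state.remove_map("vtep-200")
print(state.remove_nvo())  # set()
```

Because deletion is refused while any map exists, the set of interfaces left needing learning re-enabled is empty by construction.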