sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

ARP traffic gets dropped in VXLAN-EVPN tunnels #10050

Open lukasstockner opened 2 years ago

lukasstockner commented 2 years ago

Description

When configuring a L2 VXLAN-EVPN overlay on Celestica Seastone2 (Trident3-based) switches in a simple test setup, ARP packets don't reach the server on the other switch.

IPv4 unicast traffic works just fine after manually adding the ARP entries on the servers, IPv6 (including ND) works just fine out of the box, and other broadcast traffic (e.g. a simple ping to the broadcast address) also arrives, so the problem appears to be related to ARP itself, not BUM traffic in general.

Steps to reproduce the issue:

  1. Deploy a 202012 or 202106 image (containing the SAI config entries mentioned in this comment) on two Celestica Seastone2 switches, each of which is connected to one server on Ethernet128 and to the other switch on Ethernet0.
  2. Set up a basic L2VPN configuration:
    sudo config interface ip add Loopback0 1.1.1.1/32
    sudo config interface ip add Ethernet0 10.100.0.1/24
    sudo config vlan add 1000
    sudo config vlan member add -u 1000 Ethernet128
    sudo config vxlan add vtep 1.1.1.1
    sudo config vxlan evpn_nvo add nvo vtep
    sudo config vxlan map add vtep 1000 10000
  3. Configure BGP:
    configure terminal
    router bgp 65100
    bgp router-id 1.1.1.1
    no bgp ebgp-requires-policy
    bgp bestpath as-path multipath-relax
    neighbor 10.100.0.2 remote-as external
    !
    address-family ipv4 unicast
    network 1.1.1.1/32
    exit-address-family
    !
    address-family l2vpn evpn
    neighbor 10.100.0.2 activate
    advertise-all-vni
    exit-address-family
    end
    write
  4. Apply equivalent config on the other switch (adjusting IPs and ASN)
  5. Check that BGP session is up and L2VPN info is exchanged
  6. Assign IPv4 and IPv6 IPs to the two servers
  7. Check that pings go through
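Step 4 just repeats step 2 with different parameters, so the per-switch CLI can be rendered from a tiny helper. This is a hypothetical illustration (the function name and structure are not part of SONiC); it only mirrors the commands listed in step 2 above:

```python
def l2vpn_config(loopback: str, underlay_ip: str, vlan: int, vni: int):
    """Render the SONiC L2VPN CLI from step 2 for one switch.

    loopback doubles as the VTEP source IP, underlay_ip is the
    point-to-point address on Ethernet0, and (vlan, vni) is the
    VLAN-to-VNI mapping. Interface names are fixed as in the steps.
    """
    return [
        f"sudo config interface ip add Loopback0 {loopback}/32",
        f"sudo config interface ip add Ethernet0 {underlay_ip}/24",
        f"sudo config vlan add {vlan}",
        f"sudo config vlan member add -u {vlan} Ethernet128",
        f"sudo config vxlan add vtep {loopback}",
        "sudo config vxlan evpn_nvo add nvo vtep",
        f"sudo config vxlan map add vtep {vlan} {vni}",
    ]

# Switch 1 exactly as in step 2:
for line in l2vpn_config("1.1.1.1", "10.100.0.1", 1000, 10000):
    print(line)
# Switch 2 (step 4) is the same call with 2.2.2.2 / 10.100.0.2.
```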

Describe the results you received:

IPv6 pings to link-local and manually configured addresses work just fine, while IPv4 pings fail due to not receiving an ARP reply. After adding a static ARP table entry on both servers, IPv4 pings also work just fine, and traffic between the servers flows at line rate.

Describe the results you expected:

Both IPv4 and IPv6 should work.

Output of show version:

SONiC Software Version: SONiC.202012.0-c57ae5f80
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: c57ae5f80
Build date: Mon Feb 21 18:41:12 UTC 2022
Built by: lstockner@gc231-ctrlv2-control-host

Platform: x86_64-cel_seastone_2-r0
HwSKU: Seastone_2
ASIC: broadcom
ASIC Count: 1
Serial Number: DX030F2B031A05UB200019
Uptime: 02:41:41 up 26 min,  1 user,  load average: 1.29, 1.31, 1.15

Docker images:
REPOSITORY                    TAG                  IMAGE ID            SIZE
docker-sonic-mgmt-framework   202012.0-c57ae5f80   79440cf48d41        797MB
docker-sonic-mgmt-framework   latest               79440cf48d41        797MB
docker-fpm-frr                202012.0-c57ae5f80   28481c6294d1        412MB
docker-fpm-frr                latest               28481c6294d1        412MB
docker-orchagent              202012.0-c57ae5f80   e535c2ffebac        412MB
docker-orchagent              latest               e535c2ffebac        412MB
docker-nat                    202012.0-c57ae5f80   a18fd90d0b31        397MB
docker-nat                    latest               a18fd90d0b31        397MB
docker-sflow                  202012.0-c57ae5f80   823b96247f40        395MB
docker-sflow                  latest               823b96247f40        395MB
docker-teamd                  202012.0-c57ae5f80   3229bb6a0885        394MB
docker-teamd                  latest               3229bb6a0885        394MB
docker-sonic-telemetry        202012.0-c57ae5f80   143e2eb4c7ad        472MB
docker-sonic-telemetry        latest               143e2eb4c7ad        472MB
docker-platform-monitor       202012.0-c57ae5f80   004a6a58b2df        564MB
docker-platform-monitor       latest               004a6a58b2df        564MB
docker-snmp                   202012.0-c57ae5f80   b707dba0b738        426MB
docker-snmp                   latest               b707dba0b738        426MB
docker-iccpd                  202012.0-c57ae5f80   fd2fff9a2f9f        394MB
docker-iccpd                  latest               fd2fff9a2f9f        394MB
docker-syncd-brcm             202012.0-c57ae5f80   657db22b6b53        675MB
docker-syncd-brcm             latest               657db22b6b53        675MB
docker-lldp                   202012.0-c57ae5f80   5dd3f98d8e09        423MB
docker-lldp                   latest               5dd3f98d8e09        423MB
docker-database               202012.0-c57ae5f80   35b7c2b9bcfd        382MB
docker-database               latest               35b7c2b9bcfd        382MB
docker-dhcp-relay             202012.0-c57ae5f80   8c1beba4a4a7        396MB
docker-dhcp-relay             latest               8c1beba4a4a7        396MB
docker-router-advertiser      202012.0-c57ae5f80   ab595b3854c6        383MB
docker-router-advertiser      latest               ab595b3854c6        383MB
docker-mux                    202012.0-c57ae5f80   00619d24125d        435MB
docker-mux                    latest               00619d24125d        435MB

Output of show techsupport:

sonic_dump_localhost_20220221_201100.tar.gz

Additional information you deem important (e.g. issue happens only occasionally):

Since a few other issues mention that L2VPN worked for them, I've tried several older versions going back to July 2021, but all of them had the same problem.

202111 and master have a different issue that prevents the tunnel from coming up at all; I'll create a separate issue for that.

The logs contain an error related to setting the port status - backporting https://github.com/Azure/sonic-swss/pull/2080 fixes this, but the ARP problem remains.

Giving the switches an IP on the VLAN makes the ARP requests in question show up on the Linux interfaces when using tcpdump, but there's no ARP reply to be seen.

From spamming ARP requests from one server and checking port counters, it appears that the destination switch receives the packets and drops them instead of decapping and sending them to the second server.

The image I'm running for the output above is based on 7a35504ff, with a few platform-related fixes that I still have to upstream. None of them should have any impact on the dataplane, it's just Python platform module stuff.

aseaudi commented 2 years ago

Could you do the following:

    swssloglevel -l INFO -a
    config vxlan map delete vtep 1000 10000
    tail -f /var/log/syslog &
    config vxlan map add vtep 1000 10000

and send the logs showing the vxlan tunnel being created

I am facing an issue here that might be related.

lukasstockner commented 2 years ago

@aseaudi I'm currently testing other images on the switches, but I think the following section from the log in the techsupport dump is what you want:

Feb 21 20:09:59.931907 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_process_route_add_mode_host_only:733 SAI Enter _brcm_sai_mptnl_process_route_add_mode_host_only
Feb 21 20:09:59.931907 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_process_route_add_mode_host_only:779 SAI Exit _brcm_sai_mptnl_process_route_add_mode_host_only
Feb 21 20:09:59.935666 localhost WARNING swss#orchagent: :- createTunnelHw: creation src = 1
Feb 21 20:09:59.935775 localhost NOTICE swss#orchagent: :- create_tunnel: create_tunnel:encapmaplist[0]=0x29000000000613
Feb 21 20:09:59.935904 localhost NOTICE swss#orchagent: :- create_tunnel: create_tunnel:encapmaplist[1]=0x29000000000615
Feb 21 20:09:59.937556 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:brcm_sai_tnl_mp_create_tunnel:3049 SAI Enter brcm_sai_tnl_mp_create_tunnel
Feb 21 20:09:59.937556 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:brcm_sai_tnl_mp_create_tunnel:3138 Setting peer_mode to 0
Feb 21 20:09:59.937556 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:brcm_sai_tnl_mp_create_tunnel:3285 Created tunnel id: 2
Feb 21 20:09:59.940214 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_route_dst_tnl_cnt:911 SAI Enter _brcm_sai_mptnl_route_dst_tnl_cnt
Feb 21 20:09:59.940214 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:mptnl_xgs_flexflow_create_sipdip_tnl:1696 SDK dscp_mode(UNIFORM)
Feb 21 20:09:59.940214 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:mptnl_xgs_flexflow_create_sipdip_tnl:1710 SDK ttl_mode(UNIFORM)
Feb 21 20:09:59.940214 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:mptnl_xgs_flexflow_create_sipdip_tnl:1775 tunnel_id (1275068419) flags (0) valid_elements (909) dscp_sel (0x1) dscp (0)
Feb 21 20:09:59.941787 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_process_tnl_route_add_tunnel_event:646 SAI Enter _brcm_sai_mptnl_process_tnl_route_add_tunnel_event
Feb 21 20:09:59.941787 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_find_external_best_route:596 SAI Enter _brcm_sai_mptnl_find_external_best_route
Feb 21 20:09:59.941916 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_find_external_best_route:634 SAI Exit _brcm_sai_mptnl_find_external_best_route
Feb 21 20:09:59.941916 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_route_add_dip_tunnel:142 SAI Enter _brcm_sai_mptnl_route_add_dip_tunnel
Feb 21 20:09:59.941916 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_route_add_dip_tunnel:212 SAI Exit _brcm_sai_mptnl_route_add_dip_tunnel
Feb 21 20:09:59.942038 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_tnl_route_event_add:391 SAI Enter _brcm_sai_mptnl_tnl_route_event_add
Feb 21 20:09:59.943761 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_tnl_route_event_add:538 SAI Exit _brcm_sai_mptnl_tnl_route_event_add
Feb 21 20:09:59.943761 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:_brcm_sai_mptnl_process_tnl_route_add_tunnel_event:668 SAI Exit _brcm_sai_mptnl_process_tnl_route_add_tunnel_event
Feb 21 20:09:59.943761 localhost INFO syncd#syncd: [none] SAI_API_TUNNEL:brcm_sai_tnl_mp_create_tunnel:3313 SAI Exit brcm_sai_tnl_mp_create_tunnel
Feb 21 20:09:59.944839 localhost NOTICE swss#orchagent: :- createDynamicDIPTunnel: Created P2P Tunnel remote IP 1.1.1.1 
Feb 21 20:09:59.944839 localhost NOTICE swss#orchagent: :- addTunnelUser: diprefcnt for remote 1.1.1.1 = 1
Feb 21 20:09:59.946139 localhost NOTICE swss#orchagent: :- addBridgePort: Add bridge port Port_EVPN_1.1.1.1 to default 1Q bridge
Feb 21 20:09:59.946297 localhost ERR swss#orchagent: :- meta_sai_on_port_state_change_single: data.port_id oid:0x2a00000000061a has unexpected type: SAI_OBJECT_TYPE_TUNNEL, expected PORT, BRIDGE_PORT or LAG
Feb 21 20:09:59.947895 localhost NOTICE swss#orchagent: :- addVlanMember: Add member Port_EVPN_1.1.1.1 to VLAN Vlan1000 vid:1000 pid0
Feb 21 20:09:59.947895 localhost ERR swss#orchagent: :- setPortPvid: pvid setting for tunnel Port_EVPN_1.1.1.1 is not allowed
Feb 21 20:09:59.948454 localhost INFO syncd#syncd: [none] SAI_API_FDB:_brcm_sai_fdb_table_add:51 fdbEvent:FDB table add: MAC:D8-5E-D3-84-76-7F  vfi 0x73e8 port:0x2a043a00000002 vlan:1000 is_static 0 is_remote 128
Feb 21 20:09:59.948454 localhost INFO syncd#syncd: [none] SAI_API_FDB:brcm_sai_create_fdb_entry:707 FDB Create: MAC:D8-5E-D3-84-76-7F port_tid:0xb0000003 port_type:Port vid:0x73e8
Feb 21 20:09:59.949356 localhost INFO syncd#syncd: [none] SAI_API_FDB:_brcm_sai_fdb_table_add:51 fdbEvent:FDB table add: MAC:D8-5E-D3-84-76-8F  vfi 0x73e8 port:0x2a043a00000002 vlan:1000 is_static 0 is_remote 128
Feb 21 20:09:59.949415 localhost INFO syncd#syncd: [none] SAI_API_FDB:brcm_sai_create_fdb_entry:707 FDB Create: MAC:D8-5E-D3-84-76-8F port_tid:0xb0000003 port_type:Port vid:0x73e8
Feb 21 20:09:59.949852 localhost NOTICE swss#orchagent: :- doTask: Get port state change notification id:2a00000000061a status:1
Feb 21 20:09:59.949988 localhost ERR swss#orchagent: :- doTask: Failed to get port object for port id 0x2a00000000061a

For me, the oper_down error could be fixed by applying https://github.com/Azure/sonic-swss/pull/2080, maybe also try that? The ARP problem still persists, though.

aseaudi commented 2 years ago

@lukasstockner you have the error:

    ERR swss#orchagent: :- meta_sai_on_port_state_change_single: data.port_id oid:0x2a00000000061a has unexpected type: SAI_OBJECT_TYPE_TUNNEL, expected PORT, BRIDGE_PORT or LAG

What happens after that? Does the VXLAN appear in the "show vxlan remotevtep" output? What is the output of "bridge fdb show br Bridge"? In my case, after the error, I get a log saying orchagent is exiting, and the swss container restarts a couple of times and finally fails.

lukasstockner commented 2 years ago

@aseaudi After the log snippet that I posted above, swss/orchagent keeps running in my case and the tunnel is working (except for the ARP issue). See the full log in the techsupport dump for details.

show vxlan remotevtep shows oper_down for me, unless I apply the PR that I linked above - in that case, it correctly shows oper_up instead with no actual change to the tunnel behavior.

bridge fdb show br Bridge shows

b0:26:28:35:0e:01 dev Ethernet128 vlan 1000 master Bridge 
33:33:00:00:00:01 dev Ethernet128 self permanent
33:33:00:00:00:02 dev Ethernet128 self permanent
01:00:5e:00:00:01 dev Ethernet128 self permanent
33:33:ff:97:21:ce dev Ethernet128 self permanent
33:33:ff:00:00:00 dev Ethernet128 self permanent
01:80:c2:00:00:0e dev Ethernet128 self permanent
01:80:c2:00:00:03 dev Ethernet128 self permanent
01:80:c2:00:00:00 dev Ethernet128 self permanent
33:33:00:00:00:01 dev Bridge self permanent
33:33:00:00:00:02 dev Bridge self permanent
01:00:5e:00:00:01 dev Bridge self permanent
33:33:ff:4f:8e:84 dev Bridge self permanent
33:33:ff:00:00:00 dev Bridge self permanent
01:80:c2:00:00:21 dev Bridge self permanent
33:33:ff:97:21:ce dev Bridge self permanent
0c:48:c6:97:21:ce dev Bridge vlan 1000 master Bridge permanent
0c:48:c6:97:21:ce dev Bridge master Bridge permanent
7e:f9:d1:ab:97:1d dev dummy vlan 1 master Bridge permanent
7e:f9:d1:ab:97:1d dev dummy master Bridge permanent
33:33:00:00:00:01 dev dummy self permanent
b0:26:28:35:0e:00 dev vtep-1000 vlan 1000 extern_learn master Bridge 
00:00:00:00:00:00 dev vtep-1000 dst 2.2.2.2 self permanent
b0:26:28:35:0e:00 dev vtep-1000 dst 2.2.2.2 self extern_learn 
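In the fdb output above, the all-zeros MAC entry on vtep-1000 is the head-end-replication (flood) entry pointing at the remote VTEP 2.2.2.2; BUM frames such as ARP requests should be replicated along it. A hypothetical helper (not part of any SONiC tooling) to pull these entries out of `bridge fdb` output:

```python
FDB_OUTPUT = """\
b0:26:28:35:0e:01 dev Ethernet128 vlan 1000 master Bridge
00:00:00:00:00:00 dev vtep-1000 dst 2.2.2.2 self permanent
b0:26:28:35:0e:00 dev vtep-1000 dst 2.2.2.2 self extern_learn
"""

def flood_entries(output: str):
    """Return (device, remote_ip) pairs for all-zero-MAC fdb entries,
    i.e. the head-end-replication targets used for BUM traffic."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "00:00:00:00:00:00" and "dst" in fields:
            entries.append((fields[2], fields[fields.index("dst") + 1]))
    return entries

assert flood_entries(FDB_OUTPUT) == [("vtep-1000", "2.2.2.2")]
```

The presence of this entry suggests the kernel-side flood list is correct, which is consistent with the drop happening in the ASIC dataplane rather than in the bridge configuration.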
zhangyanzhao commented 2 years ago

Adam will find someone in BRCM to take a look. Thanks.

aseaudi commented 2 years ago

I was troubleshooting the same issue on my Edgecore as8535-54x with SONiC 202012, and I noticed that the ARP packet is encapsulated in a VXLAN packet with ttl=0 and was dropped by the next switch en route to the end VTEP.

So, the ARP was dropped by the intermediate switch.

I don't know if this is normal, or if this is something configurable in SONiC.

16:32:52.234085 IP (tos 0x0, id 2994, offset 0, flags [none], proto UDP (17), length 96)
    4.4.4.4.61446 > 2.2.2.2.4789: [no cksum] VXLAN, flags [I] (0x08), vni 50
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.50.18 tell 192.168.50.11, length 46
16:32:52.234183 IP (tos 0xc0, ttl 64, id 16733, offset 0, flags [none], proto ICMP (1), length 124)
    10.3.4.3 > 4.4.4.4: ICMP time exceeded in-transit, length 104
    IP (tos 0x0, id 2994, offset 0, flags [none], proto UDP (17), length 96)
    4.4.4.4.61446 > 2.2.2.2.4789: [no cksum] VXLAN, flags [I] (0x08), vni 50
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.50.18 tell 192.168.50.11, length 46

IPv6 is encapsulated in VXLAN with TTL = 64:

21:40:56.806808 IP (tos 0x0, ttl 64, id 61067, offset 0, flags [none], proto UDP (17), length 122)
    4.4.4.4.40641 > 2.2.2.2.4789: [udp sum ok] VXLAN, flags [I] (0x08), vni 50
IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) 2001::1 > ff02::1:ff00:2: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has 2001::2
      source link-address option (1), length 8 (1): f8:8e:a1:e0:72:11
        0x0000:  f88e a1e0 7211
21:40:57.830709 IP (tos 0x0, ttl 64, id 61141, offset 0, flags [none], proto UDP (17), length 122)
    4.4.4.4.40641 > 2.2.2.2.4789: [udp sum ok] VXLAN, flags [I] (0x08), vni 50
IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) 2001::1 > ff02::1:ff00:2: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has 2001::2
      source link-address option (1), length 8 (1): f8:8e:a1:e0:72:11
        0x0000:  f88e a1e0 7211
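The captures above fit a simple model: in SAI terms, SAI_TUNNEL_TTL_MODE_UNIFORM_MODEL derives the outer TTL from the inner packet, while SAI_TUNNEL_TTL_MODE_PIPE_MODEL uses a fixed SAI_TUNNEL_ATTR_ENCAP_TTL_VAL. ARP is not an IP payload and carries no TTL to inherit, and on the affected ASICs the uniform mode apparently then emits an outer TTL of 0. This sketch is an assumption-laden model of that behavior, not SAI code:

```python
def outer_ttl(mode: str, inner_ttl, pipe_ttl: int = 64) -> int:
    """Model the TTL placed in the outer VXLAN/IP header.

    inner_ttl is None for non-IP payloads such as ARP, which have no
    TTL field to copy; the observed behavior on these ASICs is an
    outer TTL of 0, so the frame is dropped at the first routed hop.
    """
    if mode == "pipe":
        return pipe_ttl  # fixed, configured encap TTL
    # uniform: inherit from the inner header; nothing to inherit for ARP
    return inner_ttl if inner_ttl is not None else 0

assert outer_ttl("uniform", None) == 0    # ARP: dropped in transit
assert outer_ttl("uniform", 255) == 255   # ICMPv6 ND survives
assert outer_ttl("pipe", None) == 64      # pipe mode fixes ARP
```

This is consistent with the fix reported below of switching the tunnel to pipe mode with a fixed TTL of 64, though the IPv6 capture showing outer TTL 64 against an inner hop limit of 255 suggests the real hardware behavior is not a pure copy in every case.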
bluecmd commented 2 years ago

We too had issues getting ARP to work a while back but gave up debugging it. See https://github.com/kamelnetworks/sonic/issues/9 for our own notes. Basically IPv4 w/ static ARP and IPv6 worked fine, ARP did not. We assumed it was something to do with ARP suppression at the time, but that was just a hunch.

aseaudi commented 2 years ago

I changed the VXLAN tunnel attribute in orchagent from the default UNIFORM_MODEL to PIPE_MODEL with TTL = 64, and now ARP and ping work over the P2P VXLAN tunnel.

        attr.id = SAI_TUNNEL_ATTR_ENCAP_TTL_MODE;
        attr.value.s32 = SAI_TUNNEL_TTL_MODE_PIPE_MODEL;
        tunnel_attrs.push_back(attr);

        attr.id = SAI_TUNNEL_ATTR_ENCAP_TTL_VAL;
        attr.value.u8 = 64;
        tunnel_attrs.push_back(attr);
lukasstockner commented 2 years ago

@aseaudi Fantastic find, thanks! I can confirm that that change makes it work. Looks like the code already supports specifying an encap_ttl, but all callers just leave it at zero.

jelmeronline commented 1 year ago

This issue does still exist, is there a way to define encap_ttl somewhere for P2P tunnels? Or is altering the source still the way to go? I'm experiencing it on two Dell 5248 with TD3 ASIC.

hanifrafif commented 7 months ago

I changed the VXLAN tunnel attribute in orchagent from the default UNIFORM_MODEL to PIPE_MODEL with TTL = 64, and now ARP and ping work over the P2P VXLAN tunnel.

        attr.id = SAI_TUNNEL_ATTR_ENCAP_TTL_MODE;
        attr.value.s32 = SAI_TUNNEL_TTL_MODE_PIPE_MODEL;
        tunnel_attrs.push_back(attr);

        attr.id = SAI_TUNNEL_ATTR_ENCAP_TTL_VAL;
        attr.value.u8 = 64;
        tunnel_attrs.push_back(attr);

Sorry, where in the source do I change the TTL? Thanks.

jelmeronline commented 7 months ago

sonic-swss/orchagent/vxlanorch.cpp in the source, before compiling.