sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Impossible to do ping between IPs in different VRFs in scenario with physical loopback #5947

Open ppikh opened 3 years ago

ppikh commented 3 years ago

Description

Impossible to do ping between IPs in different VRFs in scenario with physical loopback

Test setup: a SONiC device with a cable connected from port Ethernet4 to port Ethernet8.

Steps to reproduce the issue:

  1. Run the following commands on the SONiC device:

         sudo config vrf add Vrf_custom
         sudo config interface vrf bind Ethernet8 Vrf_custom
         sudo config interface ip add Ethernet8 31.1.1.2/24
         sudo config vlan add 31
         sudo config vlan member add --untagged 31 Ethernet4
         sudo config interface ip add Vlan31 31.1.1.1/24

  2. Try to ping IP address 31.1.1.2 from the SONiC device.

Describe the results you received: In step 2 the ping does not work:

    PING 31.1.1.2 (31.1.1.2) 56(84) bytes of data.
    From 31.1.1.1 icmp_seq=1 Destination Host Unreachable
    From 31.1.1.1 icmp_seq=2 Destination Host Unreachable

Describe the results you expected: In step 2 the ping succeeds; it should be possible to ping from the default VRF to a custom VRF via a physical link.

Additional information you deem important (e.g. issue happens only occasionally): the issue reproduces 100% of the time.

    show ip interfaces
    Interface    Master      IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
    -----------  ----------  -------------------  ------------  --------------  -------------
    Ethernet8    Vrf_custom  31.1.1.2/24          up/up         N/A             N/A
    Vlan31                   31.1.1.1/24          up/up         N/A             N/A

    show ip route vrf all
    Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
           > - selected route, * - FIB route, q - queued route, r - rejected route

    VRF Vrf_custom:
    C>* 31.1.1.0/24 is directly connected, Ethernet8, 00:05:35

    Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
           > - selected route, * - FIB route, q - queued route, r - rejected route

    K> 0.0.0.0/0 [0/0] via XXXXXXXX, eth0, 00:08:55
    C> XXXXXXXX/22 is directly connected, eth0, 00:08:55
    C>* 31.1.1.0/24 is directly connected, Vlan31, 00:05:34

    show version
    SONiC Software Version: SONiC.HEAD.25-a79c3c21
    Distribution: Debian 9.13
    Kernel: 4.9.0-11-2-amd64
    Build commit: a79c3c21
    Build date: Mon Nov 16 09:06:39 UTC 2020
    Built by: XXXXXXXXX

    Platform: x86_64-mlnx_msn3700-r0
    HwSKU: ACS-MSN3700
    ASIC: mellanox
    Serial Number: XXXXXXXXX
    Uptime: 09:24:06 up 12:05, 1 user, load average: 2.59, 1.83, 1.42

    Docker images:
    REPOSITORY                   TAG               IMAGE ID      SIZE
    docker-syncd-mlnx            HEAD.25-a79c3c21  0636ef41fcee  398MB
    docker-syncd-mlnx            latest            0636ef41fcee  398MB
    docker-sonic-telemetry       HEAD.25-a79c3c21  e2490db8a213  353MB
    docker-sonic-telemetry       latest            e2490db8a213  353MB
    docker-router-advertiser     HEAD.25-a79c3c21  0108fdf37d69  290MB
    docker-router-advertiser     latest            0108fdf37d69  290MB
    docker-platform-monitor      HEAD.25-a79c3c21  66302be6d41d  664MB
    docker-platform-monitor      latest            66302be6d41d  664MB
    docker-fpm-frr               HEAD.25-a79c3c21  7f443adb2c18  335MB
    docker-fpm-frr               latest            7f443adb2c18  335MB
    docker-teamd                 HEAD.25-a79c3c21  3baca95f9e20  315MB
    docker-teamd                 latest            3baca95f9e20  315MB
    docker-lldp-sv2              HEAD.25-a79c3c21  aa0fd8009ce0  312MB
    docker-lldp-sv2              latest            aa0fd8009ce0  312MB
    docker-dhcp-relay            HEAD.25-a79c3c21  4d583d73e313  299MB
    docker-dhcp-relay            latest            4d583d73e313  299MB
    docker-database              HEAD.25-a79c3c21  781d0050177b  289MB
    docker-database              latest            781d0050177b  289MB
    docker-snmp-sv2              HEAD.25-a79c3c21  e168230b1884  348MB
    docker-snmp-sv2              latest            e168230b1884  348MB
    docker-orchagent             HEAD.25-a79c3c21  12b68b8483fc  333MB
    docker-orchagent             latest            12b68b8483fc  333MB
    docker-sflow                 HEAD.25-a79c3c21  4d7b05f3b944  315MB
    docker-sflow                 latest            4d7b05f3b944  315MB
    docker-nat                   HEAD.25-a79c3c21  8330dbb84ded  316MB
    docker-nat                   latest            8330dbb84ded  316MB
    docker-sonic-mgmt-framework  HEAD.25-a79c3c21  1e6a1005f7af  427MB
    docker-sonic-mgmt-framework  latest            1e6a1005f7af  427MB

anshuv-mfst commented 3 years ago

Issue seen by manual config/testing. Can you please try adding the interface IP to Eth4 instead of the VLAN in the default VRF and try the ping again?

(Can be enhancement request and not an issue)

ppikh commented 3 years ago

Hi @anshuv-mfst

I tried with physical interfaces; it still does not work. I think the issue may be related to identical MAC addresses on the L3 interfaces:

    root@X:/home/admin# ifconfig Ethernet4
    Ethernet4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
            inet 31.1.1.1  netmask 255.255.255.0  broadcast 31.1.1.255
            inet6 fe80::1e34:daff:fe16:6800  prefixlen 64  scopeid 0x20
            ether 1c:34:da:16:68:00  txqueuelen 1000  (Ethernet)
            RX packets 1088  bytes 239188 (233.5 KiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 1095  bytes 272516 (266.1 KiB)
            TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

    root@X:/home/admin# ifconfig Ethernet8
    Ethernet8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
            inet 31.1.1.2  netmask 255.255.255.0  broadcast 31.1.1.255
            inet6 fe80::1e34:daff:fe16:6800  prefixlen 64  scopeid 0x20
            ether 1c:34:da:16:68:00  txqueuelen 1000  (Ethernet)
            RX packets 1091  bytes 239326 (233.7 KiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 1099  bytes 273096 (266.6 KiB)
            TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
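The two `ifconfig` outputs above report the same `ether` value on Ethernet4 and Ethernet8. A minimal sketch of how that check could be automated follows. The function name `find_shared_macs` and the trimmed sample text are illustrative assumptions, not part of SONiC or of the original report.

```python
import re
from collections import defaultdict


def find_shared_macs(ifconfig_text):
    """Return {mac: [interfaces]} for every MAC seen on more than one interface.

    Expects ifconfig-style text: a header line "<name>: flags=..." followed by
    indented detail lines, one of which starts with "ether <mac>".
    """
    by_mac = defaultdict(list)
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            continue
        ether = re.search(r"\bether\s+([0-9a-fA-F:]{17})\b", line)
        if ether and current:
            by_mac[ether.group(1).lower()].append(current)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


# Trimmed copy of the output quoted in this comment (only the relevant lines).
sample = """\
Ethernet4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
        ether 1c:34:da:16:68:00  txqueuelen 1000  (Ethernet)
Ethernet8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
        ether 1c:34:da:16:68:00  txqueuelen 1000  (Ethernet)
"""

print(find_shared_macs(sample))
```

On a live device the same function could be fed the combined `ifconfig` output for all ports; any non-empty result points at interfaces that share a MAC.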

vazhel commented 3 years ago

Hi @ppikh

Tcpdump shows ARP requests. It looks like there is one ARP request per ICMP request. I checked with different configurations and got the same results. I agree with you that the issue may be related to identical MAC addresses.

My configurations are below:

--- 1)

    sudo config vrf add Vrf_custom
    sudo config interface vrf bind Ethernet36 Vrf_custom
    sudo config interface ip add Ethernet36 192.168.10.36/24
    sudo config vlan add 31
    sudo config vlan member add --untagged 31 Ethernet48
    sudo config interface ip add Vlan31 192.168.10.48/24

--- 2)

    sudo config vrf add Vrf_custom
    sudo config interface vrf bind Ethernet36 Vrf_custom
    sudo config interface ip add Ethernet36 192.168.10.36/24
    sudo config vrf add Vrf_custom2
    sudo config interface vrf bind Ethernet48 Vrf_custom2
    sudo config interface ip add Ethernet48 192.168.10.48/24

--- 3)

    sudo config vlan add 36
    sudo config vlan member add --untagged 36 Ethernet36
    sudo config interface ip add Vlan36 192.168.10.36/24
    sudo config vlan add 48
    sudo config vlan member add --untagged 48 Ethernet48
    sudo config interface ip add Vlan48 192.168.10.48/24

liat-grozovik commented 3 years ago

@lguohan can you please confirm whether the above observation is by design, or whether the MAC assignment should be changed?

liat-grozovik commented 3 years ago

@lguohan kind reminder.

bluecmd commented 3 years ago

Hello.

This hit me today as well. I can confirm that this was due to MAC conflict between the interfaces.

I configured my switch with its management port (eth0) and one normal port (tenGigE1/34) to the same subnet (172.18.0.0/24). I am using a Vlan110 interface that has been assigned to a Vrf.

Interfaces:

I try to ping to and from a Linux box, 172.18.0.251. The traffic flip/flops between the two interfaces. From the Linux box:

blackbox:~$ ip neigh | grep 3c:2c:30:78
172.18.0.154 dev eth1.110 lladdr 3c:2c:30:78:5b:80 REACHABLE
172.18.0.250 dev eth1.110 lladdr 3c:2c:30:78:5b:80 REACHABLE

From the SONiC switch:

admin@localhost:~$ ip neigh show vrf mgmt
172.18.0.251 dev eth0 lladdr 00:0d:b9:52:a7:99 REACHABLE
172.18.0.250 dev eth0 lladdr 3c:2c:30:78:5b:80 STALE
admin@localhost:~$ ip neigh show vrf VrfProd
172.18.0.154 dev Vlan110 lladdr 3c:2c:30:78:5b:80 REACHABLE
172.18.0.251 dev Vlan110 lladdr 00:0d:b9:52:a7:99 REACHABLE
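In the neighbor tables above, the Linux box resolves both switch addresses (172.18.0.154 and 172.18.0.250) to the same link-layer address. A small sketch that groups `ip neigh` entries by `lladdr` makes this kind of overlap easy to spot; the helper name `ips_per_lladdr` is an illustrative assumption.

```python
from collections import defaultdict


def ips_per_lladdr(ip_neigh_text):
    """Return {lladdr: [ips]} for link-layer addresses claimed by more than one IP."""
    groups = defaultdict(set)
    for line in ip_neigh_text.splitlines():
        fields = line.split()
        if "lladdr" not in fields:
            continue  # e.g. FAILED/INCOMPLETE entries carry no lladdr
        idx = fields.index("lladdr")
        if idx + 1 < len(fields):
            groups[fields[idx + 1].lower()].add(fields[0])
    return {mac: sorted(ips) for mac, ips in groups.items() if len(ips) > 1}


# The two entries seen from the Linux box in this comment.
linux_box = """\
172.18.0.154 dev eth1.110 lladdr 3c:2c:30:78:5b:80 REACHABLE
172.18.0.250 dev eth1.110 lladdr 3c:2c:30:78:5b:80 REACHABLE
"""

print(ips_per_lladdr(linux_box))
```

Two IPs behind one MAC is not wrong in itself; it becomes a problem here because the two addresses sit on different switch interfaces in the same broadcast domain.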

I also see the following in dmesg on the SONiC switch:

[  414.415372] Bridge: received packet on Ethernet129 with own address as source address (addr:3c:2c:30:78:5b:80, vlan:110)
[  534.759376] Bridge: received packet on Ethernet129 with own address as source address (addr:3c:2c:30:78:5b:80, vlan:110)
[  611.767204] Bridge: received packet on Ethernet129 with own address as source address (addr:3c:2c:30:78:5b:80, vlan:110)
[  653.603704] Bridge: received packet on Ethernet129 with own address as source address (addr:3c:2c:30:78:5b:80, vlan:110)

Dumps of interesting configuration:

bluecmd@localhost:/host$ show ip int
Interface    Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
-----------  --------  -------------------  ------------  --------------  -------------
Vlan110      VrfProd   172.18.0.250/24      up/up         N/A             N/A
docker0                240.127.1.1/24       up/down       N/A             N/A
eth0         mgmt      172.18.0.154/24      up/up         N/A             N/A
lo                     127.0.0.1/16         up/up         N/A             N/A
lo-m         mgmt      127.0.0.1/16         up/up         N/A             N/A
bluecmd@localhost:/host$ show vlan brief
+-----------+-----------------+-------------+----------------+-----------------------+-------------+
|   VLAN ID | IP Address      | Ports       | Port Tagging   | DHCP Helper Address   | Proxy ARP   |
+===========+=================+=============+================+=======================+=============+
|       110 | 172.18.0.250/24 | tenGigE1/34 | tagged         |                       | disabled    |
+-----------+-----------------+-------------+----------------+-----------------------+-------------+

My workaround: I set a custom MAC address on eth0.

$ sudo ip link set dev eth0 addr ee:6e:08:ef:xx:xx

The second I did this, everything worked as intended. The problem I am facing now is that I have to run this command after every switch reboot, so it would be really appreciated if there was a way to make SONiC apply this configuration automatically.
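The workaround address starts with `ee`, whose first octet has the locally administered bit (0x02) set and the multicast bit (0x01) clear, so it cannot collide with a vendor-assigned MAC. As a hedged sketch, one way to derive such an address from the switch base MAC is shown below. The derivation scheme (set the local bit, add an offset to the last octet) and the function names are assumptions for illustration, not something SONiC does.

```python
def to_local_unicast(base_mac, offset):
    """Derive a locally administered unicast MAC from base_mac.

    Illustrative scheme only: set the local bit, clear the multicast bit,
    and add `offset` to the last octet (wrapping at 256).
    """
    octets = [int(part, 16) for part in base_mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected six colon-separated octets")
    octets[0] = (octets[0] | 0x02) & 0xFE
    octets[5] = (octets[5] + offset) % 256
    return ":".join(f"{octet:02x}" for octet in octets)


def is_local_unicast(mac):
    """True if the MAC is locally administered and not multicast."""
    first = int(mac.split(":")[0], 16)
    return (first & 0x02) != 0 and (first & 0x01) == 0


# Base MAC taken from the neighbor tables earlier in this comment.
new_mac = to_local_unicast("3c:2c:30:78:5b:80", 1)
print(new_mac, is_local_unicast(new_mac))
```

The result could then be passed to the same `ip link set dev eth0 addr ...` command used in the workaround.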

bluecmd commented 3 years ago

@zhangyanzhao Could I be so bold as to ask for a re-triage of this bug?

bluecmd commented 3 years ago

Since I found this sonic-swss PR I have managed to change MAC addresses on L3 interfaces (haven't tried the eth0 mgmt though) - so I am very close to being able to ping between VRFs.

    "INTERFACE": {
        "Ethernet92": {
            "vrf_name": "VrfA",
            "mac_addr": "76:c0:f8:4b:c8:04"
        },
        "Ethernet93": {
            "vrf_name": "VrfB",
            "mac_addr": "76:c0:f8:4b:c8:05"
        },
        "Ethernet92|192.168.216.11/24": {},
        "Ethernet93|192.168.216.13/24": {}
     }
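Since the point of the configuration above is that each L3 interface gets its own `mac_addr`, a quick sanity check over the `INTERFACE` table can catch accidental duplicates before the config is loaded. This is a sketch under assumptions: it only inspects a dictionary shaped like the snippet above, and the helper name `duplicate_macs` is hypothetical.

```python
import json
from collections import defaultdict


def duplicate_macs(interface_table):
    """Return {mac: [interfaces]} for mac_addr values used by more than one interface."""
    by_mac = defaultdict(list)
    for key, attrs in interface_table.items():
        if "|" in key:
            continue  # "Ethernet92|192.168.216.11/24" style keys are IP entries
        mac = attrs.get("mac_addr")
        if mac:
            by_mac[mac.lower()].append(key)
    return {mac: sorted(names) for mac, names in by_mac.items() if len(names) > 1}


# The INTERFACE table quoted in this comment.
config = json.loads("""
{
    "INTERFACE": {
        "Ethernet92": {"vrf_name": "VrfA", "mac_addr": "76:c0:f8:4b:c8:04"},
        "Ethernet93": {"vrf_name": "VrfB", "mac_addr": "76:c0:f8:4b:c8:05"},
        "Ethernet92|192.168.216.11/24": {},
        "Ethernet93|192.168.216.13/24": {}
    }
}
""")

print(duplicate_macs(config["INTERFACE"]))
```

An empty result means every interface that sets `mac_addr` uses a distinct value.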

There appear to be two issues that are relevant to this reported issue: