open-switch / opx-nas-l3

https://openswitch.net

Ping is not working between neighbors. #24

Closed waliulislam closed 5 years ago

waliulislam commented 6 years ago

64 L3 VLANs are configured on top of a LAG of 16 interfaces on both switches (S6010 to Z9100, connected by 16x 10GbE interfaces fanned out from 4x 40G interfaces). Pings fail on some of the VLAN interfaces because the ARP entry for the neighbor is incomplete or failed. The ARP request is visible in the kernel but does not appear to be transmitted to the neighbor.
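
To narrow down which of the 64 VLAN interfaces are affected, one option is to filter the kernel neighbor table for entries that never resolved. The following is a minimal sketch, not part of the original report: the function name unresolved_neighbors is hypothetical, and it assumes the usual iproute2 output layout, where the state is the last field on each line.

```shell
#!/bin/sh
# Print "address device state" for neighbor entries that never resolved.
# Reads `ip neigh show` style text on stdin, so it also works on saved output.
# Assumption: the neighbor state is the last whitespace-separated field.
unresolved_neighbors() {
    awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1, $3, $NF }'
}

# Typical use on the switch (requires iproute2):
#   ip -4 neigh show | unresolved_neighbors
```

Running a packet capture on the peer switch for the listed interfaces would then show whether the corresponding ARP requests ever leave the box.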

This issue is seen on the OPX 3.0.0 release candidate installer.

jeff-yin commented 6 years ago

Fixed by https://github.com/open-switch/opx-nas-daemon/pull/31

Tejaswi-Goel commented 6 years ago

Fix: remove the following rule from dn_rules.sh:

ebtables -A OUTPUT -p ARP --arp-op Request --nflog-group 100 -j DROP

This rule was added to improve packet I/O performance. It sends all ARP requests to an NFLOG filter in the kernel, and packet I/O receives them through a socket and injects them into the hardware ingress pipeline so that the hardware floods them. That means fewer ARP request packets to handle, because the kernel no longer has to pass one packet per VLAN member port through the tun/tap interfaces to packet I/O for egress pipeline transmission. Since this rule is affecting performance and, in some extreme cases, ARP resolution and traffic forwarding, we have removed it.
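
For anyone who wants to try the same workaround on a running system, the check below is a rough sketch rather than the change made in the pull request. The function name has_arp_nflog_drop_rule is hypothetical, and it assumes that ebtables -L prints the rule with the same option spellings used in the command quoted in this comment; verify the actual listing on your build before relying on it.

```shell
#!/bin/sh
# Exit 0 if the chain listing on stdin contains the ARP-request NFLOG drop rule.
# Assumption: the listing shows "--arp-op Request", "--nflog-group 100" and
# "-j DROP" on one line, as in the rule quoted above.
has_arp_nflog_drop_rule() {
    awk '/--arp-op Request/ && /--nflog-group 100/ && /-j DROP/ { found = 1 }
         END { exit !found }'
}

# Typical use on the switch (requires root and ebtables):
#   if ebtables -L OUTPUT | has_arp_nflog_drop_rule; then
#       ebtables -D OUTPUT -p ARP --arp-op Request --nflog-group 100 -j DROP
#   fi
```

Deleting the rule this way only changes the live rule set. As long as dn_rules.sh still contains the -A line, the rule will presumably be reinstalled the next time that script runs.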

atanu-mandal commented 6 years ago

Closing this.

jeff-yin commented 6 years ago

Reopening because removing the rule was a workaround, and the rule is needed there to improve performance. The root cause needs to be identified and a more viable fix needs to be implemented.

GarrickHe commented 6 years ago

retesting.

GarrickHe commented 5 years ago

A fix has been found and will be part of the OPX 3.1 release.

GarrickHe commented 5 years ago

Fix has been verified. Closing this.

jeff-yin commented 5 years ago

@GarrickHe since the code for this fix will be made available later on via code push, can you detail the fix here in the meantime?