sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

MGMT interface flap: "kernel eth0: igb: eth0 NIC Link is Down" #7310

Open vaibhavhd opened 3 years ago

vaibhavhd commented 3 years ago

Description

The management interface eth0 flaps abruptly. Netlink and kernel messages in syslog confirm that the interface went down and came back up.

eth0: igb: eth0 NIC Link is Down
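The portsyncd netlink lines in the syslog carry both link states. nlmsg type 16 is RTM_NEWLINK, and `admin:1 oper:0` reads as administratively up but operationally down. That is consistent with the link being lost rather than eth0 being shut down explicitly. The sketch below decodes those lines. It assumes only the field layout visible in this issue, not any portsyncd API, and the names `parse_onmsg`, `classify` and `LinkMsg` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Field layout copied from the portsyncd lines in this issue, e.g.
# "onMsg: nlmsg type:16 key:eth0 admin:1 oper:0 addr:34:17:eb:49:61:00 ifindex:2 master:0"
ONMSG_RE = re.compile(
    r"onMsg: nlmsg type:(?P<type>\d+) key:(?P<key>\S+) "
    r"admin:(?P<admin>\d) oper:(?P<oper>\d)"
)

RTM_NEWLINK = 16  # Linux rtnetlink message type for a link add/change


class LinkMsg(NamedTuple):
    nlmsg_type: int
    ifname: str
    admin_up: bool
    oper_up: bool


def parse_onmsg(line: str) -> Optional[LinkMsg]:
    """Return the decoded fields of a portsyncd onMsg line, or None if absent."""
    m = ONMSG_RE.search(line)
    if m is None:
        return None
    return LinkMsg(int(m["type"]), m["key"], m["admin"] == "1", m["oper"] == "1")


def classify(msg: LinkMsg) -> str:
    """Distinguish an explicit shutdown from a loss of operational state."""
    if not msg.admin_up:
        return "admin down (interface shut)"
    if not msg.oper_up:
        return "admin up, oper down (carrier lost)"
    return "up"
```

Applied to the 21:05:38 line, this yields "admin up, oper down (carrier lost)". Applied to the 21:11:47 line, it yields "up".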

The flap occurred while test_vlan was running, although the test case does not touch eth0. The flap caused a loss of management connectivity to the DUT, and the test failed.

Steps to reproduce the issue:

  1. The issue was seen while running test_vlan (but it may occur at random).
  2. It has been seen only once so far, so reproducing it may be difficult (further test runs should confirm).

Describe the results you received:

eth0 flapping:

Apr  9 21:05:38.103350 str-s6100-acs-4 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:1 oper:0 addr:34:17:eb:49:61:00 ifindex:2 master:0
Apr  9 21:05:38.106121 str-s6100-acs-4 INFO kernel: [10601.214044] igb 0000:00:14.0 eth0: igb: eth0 NIC Link is Down
Apr  9 21:05:38.117734 str-s6100-acs-4 NOTICE swss#orchagent: :- removeRouterIntfs: Router interface is still referenced
Apr  9 21:05:38.119529 str-s6100-acs-4 NOTICE swss#orchagent: :- removeNeighbor: Removed next hop 10.0.0.13 on PortChannel0004
Apr  9 21:05:38.126648 str-s6100-acs-4 NOTICE swss#orchagent: :- removeNeighbor: Removed neighbor 52:54:00:3a:93:fc on PortChannel0004
Apr  9 21:05:38.127072 str-s6100-acs-4 NOTICE swss#orchagent: :- removeRouterIntfs: Router interface is still referenced
Apr  9 21:05:39.548557 str-s6100-acs-4 INFO syncd#syncd: [none] SAI_API_FDB:_brcm_sai_fdb_event_cb:132 fdbEvent: 1 for mac 24-8A-07-4C-F5-08 vid:0xc8, port:0x2c lagid:0x0 flags:0x10440 flags2:0x0 lag:false station flags 0x0
Apr  9 21:05:40.027321 str-s6100-acs-4 INFO ntpd[3927]: Deleting interface #1 eth0, 10.64.246.226#123, interface stats: received=60, sent=60, dropped=0, active_time=1648 secs
Apr  9 21:05:40.027683 str-s6100-acs-4 INFO ntpd[3927]: 10.20.8.130 local addr 10.64.246.226 -> <null>
Apr  9 21:05:40.027842 str-s6100-acs-4 INFO ntpd[3927]: 10.20.8.129 local addr 10.64.246.226 -> <null>
Apr  9 21:05:40.028012 str-s6100-acs-4 INFO ntpd[3927]: Deleting interface #3 eth0, fc00:2::32#123, interface stats: received=0, sent=0, dropped=0, active_time=1648 secs

Apr  9 21:11:47.346586 str-s6100-acs-4 INFO kernel: [10970.476784] igb 0000:00:14.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Apr  9 21:11:47.459423 str-s6100-acs-4 NOTICE swss#orchagent: message repeated 438 times: [ :- removeRouterIntfs: Router interface is still referenced]
Apr  9 21:11:47.459423 str-s6100-acs-4 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:1 oper:1 addr:34:17:eb:49:61:00 ifindex:2 master:0
Apr  9 21:11:47.470924 str-s6100-acs-4 NOTICE swss#orchagent: :- removeRouterIntfs: Router interface is still referenced
Apr  9 21:11:49.027441 str-s6100-acs-4 INFO ntpd[3927]: Listen normally on 4 eth0 10.64.246.226:123
Apr  9 21:11:49.027842 str-s6100-acs-4 INFO ntpd[3927]: Listen normally on 5 eth0 [fc00:2::32]:123
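The down/up pair above spans about 6 minutes 9 seconds, from 21:05:38.106 to 21:11:47.347. To check whether the flap recurs in other runs, the kernel messages can be scanned mechanically. The following minimal sketch pairs each igb "NIC Link is Down" message with the next "NIC Link is Up" and reports each outage. It assumes the syslog line format and igb driver wording shown in this issue, and it handles plain or gzipped files such as syslog.1.gz. The script name `eth0_flaps.py` and the placeholder year are hypothetical.

```python
import gzip
import re
import sys
from datetime import datetime

# Syslog prefix as seen in this issue: "Apr  9 21:05:38.106121 <host> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d+) (?P<host>\S+) (?P<msg>.*)$"
)
# igb driver link messages, e.g. "igb 0000:00:14.0 eth0: igb: eth0 NIC Link is Down"
LINK_RE = re.compile(r"igb \S+ (?P<ifname>\S+): igb: \S+ NIC Link is (?P<state>Up|Down)")

# Syslog timestamps carry no year. 2000 is an arbitrary placeholder (a leap year,
# so "Feb 29" still parses). It only matters for computing time differences.
PLACEHOLDER_YEAR = 2000


def parse_ts(ts: str) -> datetime:
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {ts}", "%Y %b %d %H:%M:%S.%f")


def link_events(lines, ifname="eth0"):
    """Yield (timestamp, "Up" or "Down") for igb link messages on ifname."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        link = LINK_RE.search(m.group("msg"))
        if link and link.group("ifname") == ifname:
            yield parse_ts(m.group("ts")), link.group("state")


def outages(lines, ifname="eth0"):
    """Pair each Down with the next Up; return a list of (down_at, duration)."""
    result = []
    down_at = None
    for ts, state in link_events(lines, ifname):
        if state == "Down" and down_at is None:
            down_at = ts
        elif state == "Up" and down_at is not None:
            result.append((down_at, ts - down_at))
            down_at = None
    return result


if __name__ == "__main__":
    # Usage: python eth0_flaps.py syslog.1.gz [more syslog files ...]
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as f:
            for down_at, duration in outages(f):
                print(f"{path}: eth0 down at {down_at:%b %d %H:%M:%S}, for {duration}")
```

On the two kernel lines above, this reports a single outage starting at Apr 09 21:05:38 and lasting 0:06:09.240465. A Down with no following Up (an outage still in progress at the end of the file) is not reported.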

Describe the results you expected:

Output of show version:

SONiC Software Version: SONiC.HEAD.601-a7c55a1d - HwSku: Force10-S6100 - Distribution: Debian 10.9 - Kernel: 4.19.0-12-2-amd64

Output of show techsupport:


Additional information you deem important (e.g. issue happens only occasionally):

The techsupport dump is too large to attach here. The syslog is attached for quick reference: syslog.1.gz

daall commented 3 years ago

Waiting to see whether this is reproducible or a one-off link flap in the management fabric.