sonic-net / sonic-swss

SONiC Switch State Service (SwSS)
https://azure.github.io/SONiC

[warm-reboot] lost vlanif in the kernel after system warm-reboot #1270

Open tzack000 opened 4 years ago

tzack000 commented 4 years ago
<1> Before warm-reboot

```
TD3-20:~$ ifconfig Vlan12
Vlan12: flags=4163 mtu 1500
        inet 12.1.1.1 netmask 255.255.255.0 broadcast 12.1.1.255
        inet6 12::1 prefixlen 58 scopeid 0x0
        inet6 fe80::8205:88ff:fe74:7717 prefixlen 64 scopeid 0x20
        inet6 12:2::1 prefixlen 58 scopeid 0x0
        ether 80:05:88:74:77:17 txqueuelen 1000 (Ethernet)
        RX packets 879345 bytes 92560524 (88.2 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 97901 bytes 7803250 (7.4 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```

<2> Execute warm-reboot

```
TD3-20:~$ sudo warm-reboot -v
Thu Apr 23 19:39:23 CST 2020 Pausing orchagent ...
RESTARTCHECK succeeded
Thu Apr 23 19:39:23 CST 2020 Stopping bgp ...
Thu Apr 23 19:39:23 CST 2020 Stopped bgp ...
swss
Thu Apr 23 19:39:28 CST 2020 Initialize pre-shutdown ...
(integer) 1
(integer) 1
Thu Apr 23 19:39:28 CST 2020 Requesting pre-shutdown ...
syncd requested ASIC PRE-SHUTDOWN shutdown
Thu Apr 23 19:39:28 CST 2020 Waiting for pre-shutdown ...
Thu Apr 23 19:39:32 CST 2020 Pre-shutdown succeeded ...
Thu Apr 23 19:39:32 CST 2020 Backing up database ...
(nil)
OK
OK
OK
OK
OK
OK
OK
OK
Thu Apr 23 19:39:36 CST 2020 Stopping teamd ...
Thu Apr 23 19:39:36 CST 2020 Stopped teamd ...
Thu Apr 23 19:39:36 CST 2020 Stopping syncd ...
Thu Apr 23 19:39:48 CST 2020 Stopped syncd ...
Thu Apr 23 19:39:50 CST 2020 Rebooting with /sbin/reboot to SONiC-OS-201811.0-dirty-20200422105944 ...
```

<3> After warm-reboot, the vlanif is lost and vlanmgrd logs an error.

```
TD3-20:~$ ifconfig Vlan12
Vlan12: error fetching interface information: Device not found

<14> Apr 23 19:41:20.046839 TD3-20 INFO CQ-YX-TLA25G-ZACKTANG-TD3-20 supervisord: vlanmgrd Error: argument "Bridge" is wrong: Device does not exist
```
tzack000 commented 4 years ago

PR https://github.com/Azure/sonic-swss/pull/550 handled VLAN filtering for swss warm-restart. However, it looks like if we don't reset the VLAN-aware bridge on warm restart, the kernel cannot restore the vlanif after a system warm-reboot. This is consistent with the vlanmgrd error above: after the reboot the `Bridge` device no longer exists in the kernel.
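
To illustrate the condition being described, here is a minimal sketch of the decision, not the actual vlanmgrd code. It assumes the VLAN-aware bridge is the kernel device named `Bridge` (taken from the error message above) and that its presence can be checked under `/sys/class/net`. The function names `bridge_exists` and `should_reset_bridge` are hypothetical.

```python
import os

# Device name taken from the vlanmgrd error message in this issue.
BRIDGE_NAME = "Bridge"


def bridge_exists(sysfs_net="/sys/class/net", name=BRIDGE_NAME):
    """Return True if the kernel currently exposes the bridge device."""
    return os.path.exists(os.path.join(sysfs_net, name))


def should_reset_bridge(is_warm_start, sysfs_net="/sys/class/net"):
    """Decide whether the VLAN-aware bridge must be (re)created.

    Hypothetical policy:
    - cold start: always reset the bridge;
    - warm start with the bridge still present (swss-only warm restart):
      keep the existing bridge untouched;
    - warm start with the bridge missing (system warm-reboot wiped the
      kernel state): the bridge has to be created again.
    """
    if not is_warm_start:
        return True
    return not bridge_exists(sysfs_net)
```

With this check, a swss-only warm restart would still skip the reset, while a system warm-reboot, where the kernel comes up without `Bridge`, would fall through to the normal bridge creation path.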