sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[lldp] lldp does not work after teamd is restarted #6164

Closed bingwang-ms closed 1 year ago

bingwang-ms commented 3 years ago

Description: lldp does not recover after teamd is restarted, and WARNING logs keep flooding. Related issue: https://github.com/Azure/sonic-buildimage/issues/5971

......
Dec  7 12:13:50.004482 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet76: No such device or address
Dec  7 12:13:50.004482 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet76: No such device or address
Dec  7 12:13:50.052813 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet80: No such device or address
Dec  7 12:13:50.052813 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet80: No such device or address
Dec  7 12:13:50.053998 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet80: No such device or address
Dec  7 12:13:50.053998 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet80: No such device or address
Dec  7 12:13:50.073774 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet84: No such device or address
Dec  7 12:13:50.073774 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet84: No such device or address
Dec  7 12:13:50.080420 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet84: No such device or address
Dec  7 12:13:50.080420 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet84: No such device or address
Dec  7 12:13:50.134232 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet88: No such device or address
Dec  7 12:13:50.134232 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet88: No such device or address
Dec  7 12:13:50.134232 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet88: No such device or address
Dec  7 12:13:50.134232 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet88: No such device or address
Dec  7 12:13:50.154296 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet92: No such device or address
......
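
To gauge how many ports are affected by the flood, the warnings can be tallied per interface. The snippet below is a minimal sketch, assuming the syslog line format shown in the excerpt above; the function name `count_send_failures` and the `sample` list are illustrative and not part of SONiC or lldpd.

```python
import re
from collections import Counter

# Matches only the lldpd WARNING lines; the duplicate INFO lines that
# supervisord emits for the same event are skipped on purpose.
WARN_RE = re.compile(
    r"WARNING lldp#lldpd\[\d+\]: unable to send packet on real device for "
    r"(?P<port>\S+): No such device or address"
)


def count_send_failures(lines):
    """Return a Counter mapping interface name to number of send failures."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[match.group("port")] += 1
    return counts


# Four lines copied from the log excerpt in this issue.
sample = [
    "Dec  7 12:13:50.004482 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet76: No such device or address",
    "Dec  7 12:13:50.004482 str-dx010-acs-4 INFO lldp#/supervisord: lldpd 2020-12-07T12:13:50 [WARN/lldp] unable to send packet on real device for Ethernet76: No such device or address",
    "Dec  7 12:13:50.052813 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet80: No such device or address",
    "Dec  7 12:13:50.053998 str-dx010-acs-4 WARNING lldp#lldpd[29]: unable to send packet on real device for Ethernet80: No such device or address",
]

failures = count_send_failures(sample)
for port, count in sorted(failures.items()):
    print(port, count)
```

On a real device the same function could be fed the lines of a decompressed syslog file instead of `sample`.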

Steps to reproduce the issue:

  1. Run config reload to initialize DUT
  2. Kill a critical process in teamd container, say teammgrd.
  3. The teamd container will be restarted, and WARNING logs start flooding.

Describe the results you received: lldp is not able to recover after teamd is restarted.

Describe the results you expected: lldp should be recovered after teamd is restarted.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**
SONiC Software Version: SONiC.HEAD.101-3e717210
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: 3e717210
Build date: Sun Dec  6 19:37:50 UTC 2020
Built by: johnar@jenkins-worker-22

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
ASIC Count: 1
Serial Number: DX010F2B118711MS100007
Uptime: 02:34:37 up  6:19,  1 user,  load average: 3.10, 3.37, 3.35

Docker images:
REPOSITORY                    TAG                 IMAGE ID            SIZE
docker-sonic-telemetry        HEAD.101-3e717210   cd6a12cca9ee        518MB
docker-sonic-telemetry        latest              cd6a12cca9ee        518MB
docker-snmp                   HEAD.101-3e717210   8daa11cbf19b        484MB
docker-snmp                   latest              8daa11cbf19b        484MB
docker-teamd                  HEAD.101-3e717210   e442c7202130        491MB
docker-teamd                  latest              e442c7202130        491MB
docker-sonic-mgmt-framework   HEAD.101-3e717210   2a3949717681        606MB
docker-sonic-mgmt-framework   latest              2a3949717681        606MB
docker-router-advertiser      HEAD.101-3e717210   2ded3ce86035        448MB
docker-router-advertiser      latest              2ded3ce86035        448MB
docker-platform-monitor       HEAD.101-3e717210   59987ecb8c7e        572MB
docker-platform-monitor       latest              59987ecb8c7e        572MB
docker-lldp                   HEAD.101-3e717210   079b876a59a1        488MB
docker-lldp                   latest              079b876a59a1        488MB
docker-dhcp-relay             HEAD.101-3e717210   c62a5faa21f1        455MB
docker-dhcp-relay             latest              c62a5faa21f1        455MB
docker-database               HEAD.101-3e717210   035f87ff4fcc        448MB
docker-database               latest              035f87ff4fcc        448MB
docker-orchagent              HEAD.101-3e717210   433129065bf0        505MB
docker-orchagent              latest              433129065bf0        505MB
docker-nat                    HEAD.101-3e717210   cc2ed184bfb5        494MB
docker-nat                    latest              cc2ed184bfb5        494MB
docker-fpm-frr                HEAD.101-3e717210   c13f136a82d1        507MB
docker-fpm-frr                latest              c13f136a82d1        507MB
docker-sflow                  HEAD.101-3e717210   28b7bad1411c        492MB
docker-sflow                  latest              28b7bad1411c        492MB
docker-syncd-brcm             HEAD.101-3e717210   a3c3d4b1837b        542MB
docker-syncd-brcm             latest              a3c3d4b1837b        542MB

Attach debug file `sudo generate_dump`: syslog.299.gz

chitra-raghavan commented 3 years ago

This issue is seen after warm-reboot in a t1 topology.

root@sonic-z9332-10429:~#
root@sonic-z9332-10429:~# show lldp table
Capability codes: (R) Router, (B) Bridge, (O) Other
LocalPort    RemoteDevice       RemotePortID    Capability    RemotePortDescr
-----------  -----------------  --------------  ------------  -----------------
Ethernet0    ARISTA01T2         Ethernet1       BR
Ethernet8    ARISTA02T2         Ethernet1       BR
Ethernet16   ARISTA03T2         Ethernet1       BR
Ethernet24   ARISTA04T2         Ethernet1       BR
Ethernet32   ARISTA05T2         Ethernet1       BR
Ethernet40   ARISTA06T2         Ethernet1       BR
Ethernet48   ARISTA07T2         Ethernet1       BR
Ethernet56   ARISTA08T2         Ethernet1       BR
Ethernet64   ARISTA09T2         Ethernet1       BR
Ethernet72   ARISTA10T2         Ethernet1       BR
Ethernet80   ARISTA11T2         Ethernet1       BR
Ethernet88   ARISTA12T2         Ethernet1       BR
Ethernet96   ARISTA13T2         Ethernet1       BR
Ethernet104  ARISTA14T2         Ethernet1       BR
Ethernet112  ARISTA15T2         Ethernet1       BR
Ethernet120  ARISTA16T2         Ethernet1       BR
Ethernet128  ARISTA01T0         Ethernet1       BR
Ethernet136  ARISTA02T0         Ethernet1       BR
Ethernet144  ARISTA03T0         Ethernet1       BR
Ethernet152  ARISTA04T0         Ethernet1       BR
Ethernet160  ARISTA05T0         Ethernet1       BR
Ethernet168  ARISTA06T0         Ethernet1       BR
Ethernet176  ARISTA07T0         Ethernet1       BR
Ethernet184  ARISTA08T0         Ethernet1       BR
Ethernet192  ARISTA09T0         Ethernet1       BR
Ethernet200  ARISTA10T0         Ethernet1       BR
Ethernet208  ARISTA11T0         Ethernet1       BR
Ethernet216  ARISTA12T0         Ethernet1       BR
Ethernet224  ARISTA13T0         Ethernet1       BR
Ethernet232  ARISTA14T0         Ethernet1       BR
Ethernet240  ARISTA15T0         Ethernet1       BR
Ethernet248  ARISTA16T0         Ethernet1       BR
eth0         swlab2-maa-tor-I7  ethernet1/1/33  OBR           ethernet1/1/33
--------------------------------------------------
Total entries displayed:  33
root@sonic-z9332-10429:~#
root@sonic-z9332-10429:~# warm-reboot

Error response from daemon: Cannot kill container: nat: No such container: nat

Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Warning: Stopping mgmt-framework.service, but it can still be activated by:
  mgmt-framework.timer
Failed to arm Watchdog for 180 seconds
[  406.469393] kexec_core: Starting new kernel
[    0.312478] Base address is zero, assuming no IPMI interface
[    4.159082] rc.local[458]: + grep build_version
[    4.221421] rc.local[458]: + cat /etc/sonic/sonic_version.yml
[    4.297177] rc.local[458]: + sed -e s/build_version: //g;s/'//g
[    4.375961] rc.local[458]: + SONIC_VERSION=HEAD.195-6efc0a88
[    4.452153] rc.local[458]: + FIRST_BOOT_FILE=/host/image-HEAD.195-6efc0a88/platform/firsttime
[    4.560210] rc.local[458]: + SONIC_CONFIG_DIR=/host/image-HEAD.195-6efc0a88/sonic-config
[    4.664261] rc.local[458]: + SONIC_ENV_FILE=/host/image-HEAD.195-6efc0a88/sonic-config/sonic-environment
[    4.784197] rc.local[458]: + [ -d /host/image-HEAD.195-6efc0a88/sonic-config -a -f /host/image-HEAD.195-6efc0a88/sonic-config/sonic-environment ]
[    4.944178] rc.local[458]: + logger SONiC version HEAD.195-6efc0a88 starting up...
[    5.036197] rc.local[458]: + grub_installation_needed=
[    5.100149] rc.local[458]: + [ ! -e /host/machine.conf ]
[    5.172333] rc.local[458]: + migrate_nos_configuration
[    5.244324] rc.local[458]: + rm -rf /host/migration
[    5.308184] rc.local[458]: + mkdir -p /host/migration
[    5.405930] rc.local[458]: + cat /proc/cmdline
[    5.466436] kdump-tools[468]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-dellemc_z9332f_d1508-r0: not found
[    5.708547] rc.local[458]: + set -- BOOT_IMAGE=/image-HEAD.195-6efc0a88/boot/vmlinuz-4.19.0-9-2-amd64 root=UUID=03bbc2a5-6abe-4184-9d5e-bd3e0fb928a0 rw console=tty0 console=ttyS0,9600n8 quiet net.ifnames=0 biosdevname=0 loop=image-HEAD.195-6efc0a88/fs.squashfs loopfstype=squashfs apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 SONIC_BOOT_TYPE=warm
[    6.100275] rc.local[458]: + [ -n  ]
[    6.144177] rc.local[458]: + . /host/machine.conf
[    6.204361] rc.local[458]: + onie_arch=x86_64
[    6.264222] rc.local[458]: + onie_bin=
[    6.320175] rc.local[458]: + onie_boot_reason=install
[    6.392353] rc.local[458]: + onie_build_date=2020-11-06T15:51-0500
[    6.468205] rc.local[458]: + onie_build_machine=dellemc_z9332f_d1508
[    6.556245] rc.local[458]: + onie_build_platform=x86_64-dellemc_z9332f_d1508-r0
[    6.648240] rc.local[458]: + onie_cli_static_parms=i
[    6.708257] rc.local[458]: + onie_cli_static_url=sonic_upgrade.bin
[    6.792248] rc.local[458]: + onie_config_version=1
[    6.852340] rc.local[458]: + onie_dev=/dev/sda2
[    6.912195] rc.local[458]: + onie_exec_url=sonic_upgrade.bin
[    6.988447] rc.local[458]: + onie_firmware=auto
[    7.048147] rc.local[458]: + onie_grub_image_name=grubx64.efi
[    7.124307] rc.local[458]: + onie_initrd_tmp=/
[    7.184580] rc.local[458]: + onie_installer=/var/tmp/installer
[    7.260780] rc.local[458]: + onie_kernel_version=4.9.95
[    7.332249] rc.local[458]: + onie_machine=dellemc_z9332f_d1508
[    7.408211] rc.local[458]: + onie_machine_rev=0
[    7.472255] rc.local[458]: + onie_partition_type=gpt
[    7.532424] rc.local[458]: + onie_platform=x86_64-dellemc_z9332f_d1508-r0
[    7.620354] rc.local[458]: + onie_root_dir=/mnt/onie-boot/onie
[    7.696244] rc.local[458]: + onie_skip_ethmgmt_macs=no
[    7.768256] rc.local[458]: + onie_switch_asic=bcm
[    7.832215] rc.local[458]: + onie_uefi_arch=x64
[    7.892207] rc.local[458]: + onie_uefi_boot_loader=grubx64.efi
[    7.968376] rc.local[458]: + onie_vendor_id=12244
[    8.028309] rc.local[458]: + onie_version=2020.11.06.0.0.3
[    8.100324] rc.local[458]: + program_console_speed
[    8.166282] rc.local[458]: + cat /proc/cmdline
[    8.226683] rc.local[458]: + cut -d , -f2
[    8.289011] rc.local[458]: + grep -Eo console=ttyS[0-9]+,[0-9]+
[    8.365924] rc.local[458]: + speed=9600
[    8.420207] rc.local[458]: + [ -z 9600 ]
[    8.476227] rc.local[458]: + CONSOLE_SPEED=9600
[    8.536194] rc.local[458]: + sed -i s|\-\-keep\-baud .* %I| 9600 %I|g /lib/systemd/system/serial-getty@.service
[    8.664392] rc.local[458]: + systemctl daemon-reload
[    8.724328] rc.local[458]: + [ -f /host/image-HEAD.195-6efc0a88/platform/firsttime ]
[    8.828259] rc.local[458]: + [ -f /var/log/fsck.log.gz ]
[    8.901837] rc.local[458]: + logger -t FSCK
[    8.961516] rc.local[458]: + gunzip -d -c /var/log/fsck.log.gz
[    9.036938] rc.local[458]: + rm -f /var/log/fsck.log.gz
[    9.108264] rc.local[458]: + exit 0

Debian GNU/Linux 10 sonic-z9332-10429 ttyS0

sonic-z9332-10429 login: admin
Password:
Last login: Thu Dec 31 10:30:47 UTC 2020 on ttyS0
Linux sonic-z9332-10429 4.19.0-9-2-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64
You are on
  ____   ___  _   _ _  ____
 / ___| / _ \| \ | (_)/ ___|
 \___ \| | | |  \| | | |
  ___) | |_| | |\  | | |___
 |____/ \___/|_| \_|_|\____|

-- Software for Open Networking in the Cloud --

Unauthorized access and/or use are prohibited.
All access and/or use are subject to monitoring.

Help:    http://azure.github.io/SONiC/

admin@sonic-z9332-10429:~$ sudo -i

root@sonic-z9332-10429:~#
root@sonic-z9332-10429:~# systemctl is-active warmboot-finalizer.service
inactive
root@sonic-z9332-10429:~#
root@sonic-z9332-10429:~#
root@sonic-z9332-10429:~# show lldp table
Capability codes: (R) Router, (B) Bridge, (O) Other
LocalPort    RemoteDevice       RemotePortID    Capability    RemotePortDescr
-----------  -----------------  --------------  ------------  -----------------
eth0         swlab2-maa-tor-I7  ethernet1/1/33  OBR           ethernet1/1/33
--------------------------------------------------
Total entries displayed:  1
root@sonic-z9332-10429:~# docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
883d10a966e4        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   6 hours ago         Up 5 minutes                            telemetry
a4e7699e4d77        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   6 hours ago         Up 5 minutes                            mgmt-framework
41491a6c57e8        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   6 hours ago         Up 9 minutes                            radv
61f7543e39f4        docker-lldp:latest                   "/usr/bin/docker-lld…"   6 hours ago         Up 9 minutes                            lldp
3825c59da046        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   6 hours ago         Up 9 minutes                            dhcp_relay
c2957679ea85        docker-syncd-brcm:latest             "/usr/local/bin/supe…"   6 hours ago         Up 9 minutes                            syncd
b851b3f6fdc4        docker-teamd:latest                  "/usr/local/bin/supe…"   6 hours ago         Up 9 minutes                            teamd
7e198491dc29        docker-fpm-frr:latest                "/usr/bin/docker_ini…"   6 hours ago         Up 9 minutes                            bgp
e5dd8a3b871e        docker-platform-monitor:latest       "/usr/bin/docker_ini…"   6 hours ago         Up 9 minutes                            pmon
6bb9a43106cc        docker-database:latest               "/usr/local/bin/dock…"   6 hours ago         Up 9 minutes                            database
root@sonic-z9332-10429:~#
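
The two `show lldp table` outputs above can be compared mechanically to list the local ports that lost their neighbor after the warm-reboot. This is a rough sketch, assuming the table layout shown in this comment (a dashed rule before and after the rows); `lldp_local_ports`, `BEFORE`, and `AFTER` are hypothetical names, and the sample tables are trimmed to three rows.

```python
def lldp_local_ports(table_text):
    """Extract the LocalPort column from `show lldp table` output."""
    ports = []
    in_body = False
    for line in table_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-"):
            # The first dashed rule opens the table body, the second closes it.
            if in_body:
                break
            in_body = True
            continue
        if in_body and stripped:
            ports.append(stripped.split()[0])
    return ports


# Trimmed copies of the outputs before and after the warm-reboot.
BEFORE = """
Capability codes: (R) Router, (B) Bridge, (O) Other
LocalPort    RemoteDevice       RemotePortID    Capability    RemotePortDescr
-----------  -----------------  --------------  ------------  -----------------
Ethernet0    ARISTA01T2         Ethernet1       BR
Ethernet8    ARISTA02T2         Ethernet1       BR
eth0         swlab2-maa-tor-I7  ethernet1/1/33  OBR           ethernet1/1/33
--------------------------------------------------
Total entries displayed:  3
"""

AFTER = """
Capability codes: (R) Router, (B) Bridge, (O) Other
LocalPort    RemoteDevice       RemotePortID    Capability    RemotePortDescr
-----------  -----------------  --------------  ------------  -----------------
eth0         swlab2-maa-tor-I7  ethernet1/1/33  OBR           ethernet1/1/33
--------------------------------------------------
Total entries displayed:  1
"""

after_ports = set(lldp_local_ports(AFTER))
# Keep the original table order for the ports that disappeared.
missing = [p for p in lldp_local_ports(BEFORE) if p not in after_ports]
print("Ports without an LLDP neighbor after warm-reboot:", missing)
```

In the full outputs above, every front-panel `Ethernet*` port would show up as missing, with only `eth0` surviving.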

yxieca commented 3 years ago

@abdosi are you the right person for this issue? Can you take a look and comment?

abdosi commented 3 years ago

@bingwang-ms Is this issue still valid?

ZhaohuiS commented 1 year ago

@bingwang-ms Have you encountered this issue again? It should be fixed in https://github.com/sonic-net/sonic-buildimage/pull/9519. I am closing it for now; please let me know if the issue happens again.