sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Installation failed on EdgeCore Wedge100bf-32qs #11061

Open odd22 opened 2 years ago

odd22 commented 2 years ago

Description

Installing SONiC master or 202111 on a brand new Edge Core Wedge100bf-32qs fails, while the same versions work fine on the 32x version.

  1. After the first boot, the debian package sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb refuses to install because of a dependency problem. It looks for linux image version 4.19.x while the installed linux image version is 5.10.x.
  2. The syncd docker crashes immediately after starting or restarting, leaving a log entry about a library problem (but the message is not clear enough to determine what is wrong).

Thus, the switch is not functioning correctly: platform management shows nothing and the Tofino ASIC is not initialized.

Steps to reproduce the issue:

  1. Install a fresh master or 202111 release on an Edge Core Wedge100bf-32qs switch.
  2. Watch the console after the first boot, or try to manually install sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb from the /host/image-xxx/platform/x86_64-accton_wedge100bf_32qs-r0 directory.
  3. Run docker ps -a to see that the syncd docker is not running, and check dmesg for the error message.

Describe the results you received:

The sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb package is not installed, the pmon docker is inoperative, and the syncd docker crashes.

Describe the results you expected:

The sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb package should be installed, with every show platform xxx command producing the expected output, the syncd docker running, and show interfaces status reporting all live interfaces.

How to correct these bugs

1/ For the debian package, there is a wrong dependency in https://github.com/Azure/sonic-buildimage/blob/master/platform/barefoot/sonic-platform-modules-accton/debian/control at line 11. It should be removed, as was done for the Accton Broadcom platform, for example. It is also possible to force the installation of the package with dpkg -i --force-all sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb
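
To confirm the mismatch before forcing anything, the package's Depends field can be compared with the running kernel. The following is only a minimal sketch: check_kernel_dep is a hypothetical helper name, and the exact dependency string inside the package is an assumption (only the 4.19.x versus 5.10.x mismatch was observed).

```shell
#!/bin/sh
# check_kernel_dep is a hypothetical helper, not part of SONiC.
# It prints every linux-image dependency in a Depends string that does not
# contain the given kernel release. Both arguments are plain strings, so the
# check can be tried without a real package.
check_kernel_dep() {
    depends="$1"   # e.g. the output of: dpkg-deb -f <package>.deb Depends
    kernel="$2"    # e.g. the output of: uname -r
    # Split the comma-separated list; "read" keeps only the package name and
    # drops any version constraint such as "(>= 1.0)".
    echo "$depends" | tr ',' '\n' | while read -r dep _; do
        case "$dep" in
            linux-image*)
                case "$dep" in
                    *"$kernel"*) ;;   # dependency matches the running kernel
                    *) echo "mismatch: $dep (running kernel: $kernel)" ;;
                esac
                ;;
        esac
    done
}

# Possible usage on the switch (package path as in the reproduction steps):
# check_kernel_dep "$(dpkg-deb -f sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb Depends)" "$(uname -r)"
```

If the function prints nothing, no linux-image dependency conflicts with the running kernel and the installation failure has another cause.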

2/ For syncd, there is a missing link in /opt/bfn/install/lib/platform. The /usr/bin/syncd_init_common.sh script reads /etc/machine.conf to identify the exact platform and, from that, which library directory must be added to the library path:

    export ONIE_PLATFORM=`grep onie_platform /etc/machine.conf | awk 'BEGIN { FS = "=" } ; { print $2 }'`
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/bfn/install/lib/platform/$ONIE_PLATFORM:/opt/bfn/install/lib:/opt/bfn/install/lib/tofinopd/switch

Here, it finds x86_64-accton_wedge100bf_32qs-r0, but only a link for x86_64-accton_wedge100bf_32x-r0 exists in /opt/bfn/install/lib/platform. Thus, LD_LIBRARY_PATH is not set up correctly, and syncd crashes because the platform library is not loaded. To correct the problem, a link should be added like this: x86_64-accton_wedge100bf_32qs-r0 -> x86_64-accton_wedge100bf_65x-r0. Again, this is a debian packaging problem.

Once both manual actions are done, the platform works as expected.
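
The manual workaround can be sketched as a small shell function. This is an illustration under stated assumptions, not the packaging fix itself: ensure_platform_link is a hypothetical name, and it assumes the link has to be created in the environment where /opt/bfn/install/lib/platform is visible to syncd.

```shell
#!/bin/sh
# ensure_platform_link is a hypothetical helper, not part of SONiC or the SDE.
# It reads onie_platform from a machine.conf file and, if the platform library
# directory has no entry with that name, creates a symlink to an existing one.
ensure_platform_link() {
    machine_conf="$1"   # normally /etc/machine.conf
    lib_dir="$2"        # normally /opt/bfn/install/lib/platform
    target="$3"         # existing entry to link to, e.g. x86_64-accton_wedge100bf_65x-r0

    # Take the value of the line that starts with "onie_platform=".
    platform=$(sed -n 's/^onie_platform=//p' "$machine_conf")
    if [ -z "$platform" ]; then
        echo "onie_platform not found in $machine_conf" >&2
        return 1
    fi

    if [ -e "$lib_dir/$platform" ]; then
        echo "ok: $lib_dir/$platform already exists"
    else
        # Relative link, matching the form shown above (name -> target).
        ln -s "$target" "$lib_dir/$platform" &&
            echo "created: $platform -> $target"
    fi
}

# Possible usage, with the paths named in this issue:
# ensure_platform_link /etc/machine.conf /opt/bfn/install/lib/platform x86_64-accton_wedge100bf_65x-r0
```

The function leaves an existing entry alone, so it is safe to run again after a restart of the syncd docker.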

zhangyanzhao commented 2 years ago

@odd22 can you please create a PR with your fix? The community can review it. Thanks.

odd22 commented 2 years ago

@zhangyanzhao Yes of course, but only for the first one, i.e. the sonic-platform-accton-wedge100bf-32qs_1.1_amd64.deb package. The second one is related to the P4studio SDE component, which is not open source. I'll contact Intel/Barefoot about this issue.