sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

SONiC unusable on EdgeCore AS7726 #9716

Open YanChii opened 2 years ago

YanChii commented 2 years ago

Description

After a fresh install using ONIE, the system is practically unusable. Many commands give errors.

root@sonic:~# show platform syseeprom
Traceback (most recent call last):
  File "/usr/local/bin/decode-syseeprom", line 18, in <module>
    import sonic_platform
ModuleNotFoundError: No module named 'sonic_platform'
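One way to confirm this outside of decode-syseeprom is to ask the interpreter directly whether the sonic_platform package can be found. The helper below is a minimal sketch: module_available is a hypothetical name, not a SONiC tool, and it should be run with the same Python interpreter that decode-syseeprom uses.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if this interpreter can locate a top-level module.

    find_spec() only locates the module and does not execute it, so a package
    that is installed but fails during import will still report True here.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Print the interpreter path, because the platform wheel is installed
    # into one specific Python environment.
    print(sys.executable)
    print("sonic_platform importable:", module_available("sonic_platform"))
```

If this prints False, the platform wheel (sonic_platform-*.whl) was most likely never installed into that interpreter's environment, which is consistent with the traceback.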

This is the same as https://github.com/Azure/sonic-buildimage/issues/8506:

admin@sonic:~$ sonic-cli
sonic# show interface status
/tmp/klish.fifo.55.wHBkfr: 2: /tmp/klish.fifo.55.wHBkfr: python: not found
root@sonic:/home/admin# show platform psustatus
Error: Failed to get the number of PSUs
Error: Failed to get PSU status
Error: failed to get PSU status from state DB

The problem is even worse on the latest master, because that version doesn't recognize any interfaces:

root@sonic:/home/admin# show interfaces status
  Interface            Lanes    Speed    MTU    FEC          Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  ---------------  -------  -----  -----  -------------  ------  ------  -------  ------  ----------
root@sonic:/home/admin# 

Steps to reproduce the issue:

  1. Clean install from ONIE

Describe the results you received:

Broken basic functionality.

Describe the results you expected:

The system should recognize the platform, fans, and interfaces, and sonic-cli should work.

Output of show version:

root@sonic:~# show version

SONiC Software Version: SONiC.202106.64065-c4bc9933f
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: c4bc9933f
Build date: Sun Jan  9 15:05:18 UTC 2022
Built by: AzDevOps@sonic-build-workers-0011XU

Platform: x86_64-accton_as7726_32x-r0
HwSKU: Accton-AS7726-32X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 16:58:50 up 5 min,  1 user,  load average: 2.37, 1.68, 0.77

Docker images:
REPOSITORY                    TAG                      IMAGE ID            SIZE
docker-dhcp-relay             latest                   c6051263cf75        428MB
docker-fpm-frr                202106.64065-c4bc9933f   ee1feb35907f        451MB
docker-fpm-frr                latest                   ee1feb35907f        451MB
docker-platform-monitor       202106.64065-c4bc9933f   af9884192680        636MB
docker-platform-monitor       latest                   af9884192680        636MB
docker-syncd-brcm             202106.64065-c4bc9933f   c60bb709f3c4        717MB
docker-syncd-brcm             latest                   c60bb709f3c4        717MB
docker-sflow                  202106.64065-c4bc9933f   14843f12feaa        433MB
docker-sflow                  latest                   14843f12feaa        433MB
docker-teamd                  202106.64065-c4bc9933f   b9f83cbf7652        433MB
docker-teamd                  latest                   b9f83cbf7652        433MB
docker-nat                    202106.64065-c4bc9933f   38a5e0a9b8b9        435MB
docker-nat                    latest                   38a5e0a9b8b9        435MB
docker-router-advertiser      202106.64065-c4bc9933f   59fbdb83c26f        421MB
docker-router-advertiser      latest                   59fbdb83c26f        421MB
docker-macsec                 202106.64065-c4bc9933f   de82a0fd93b5        436MB
docker-macsec                 latest                   de82a0fd93b5        436MB
docker-lldp                   202106.64065-c4bc9933f   233a69e2ef71        461MB
docker-lldp                   latest                   233a69e2ef71        461MB
docker-orchagent              202106.64065-c4bc9933f   189752424b5b        451MB
docker-orchagent              latest                   189752424b5b        451MB
docker-database               202106.64065-c4bc9933f   1c724b0592cc        421MB
docker-database               latest                   1c724b0592cc        421MB
docker-snmp                   202106.64065-c4bc9933f   079fc96af4bf        463MB
docker-snmp                   latest                   079fc96af4bf        463MB
docker-sonic-mgmt-framework   202106.64065-c4bc9933f   c3666e284284        576MB
docker-sonic-mgmt-framework   latest                   c3666e284284        576MB
docker-sonic-telemetry        202106.64065-c4bc9933f   89c4ae240d09        510MB
docker-sonic-telemetry        latest                   89c4ae240d09        510MB
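The Platform value reported above appears to follow ONIE's <arch>-<vendor>_<machine>-r<revision> naming, and the same string names the device directory (device/accton/x86_64-accton_as7726_32x-r0) quoted later in this thread. The sketch below splits it; parse_platform and device_dir are hypothetical helpers, and the device/<vendor>/<platform> layout is assumed from the paths quoted in this issue.

```python
from typing import NamedTuple


class PlatformId(NamedTuple):
    arch: str
    vendor: str
    machine: str
    revision: str


def parse_platform(platform: str) -> PlatformId:
    """Split an ONIE-style platform string: <arch>-<vendor>_<machine>-r<rev>."""
    arch, sep, rest = platform.partition("-")      # "x86_64" has no hyphen
    head, rsep, revision = rest.rpartition("-r")   # revision suffix is last
    vendor, usep, machine = head.partition("_")    # vendor is before first "_"
    if not (sep and rsep and usep and revision):
        raise ValueError(f"unexpected platform string: {platform!r}")
    return PlatformId(arch, vendor, machine, revision)


def device_dir(platform: str) -> str:
    """Device directory inside a sonic-buildimage checkout (assumed layout)."""
    return f"device/{parse_platform(platform).vendor}/{platform}"
```

For example, parse_platform("x86_64-accton_as7726_32x-r0") yields vendor "accton" and machine "as7726_32x".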

Output of show techsupport:

sonic_dump_sonic_release-2021-06_20210724_170018.tar.gz

Additional information you deem important (e.g. issue happens only occasionally):

The SONiC.202012.64040-a0376a6e5 image downloaded from here works much better (the show platform ... commands are fine), but sonic-cli gives the same python error as above.

Am I doing something wrong?

Thanks for the help.

Jan

zhangyanzhao commented 2 years ago

@YanChii can you please create a separate issue for each exception? @wally-wang-accton

wally-wang commented 2 years ago

We don't test 202106. You could try removing the following files:

sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/debian/sonic-platform-accton-as7726-32x.postinst
sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/debian/sonic-platform-accton-as7726-32x.install
sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/service/pddf-platform-init.service

Then apply this open PR: https://github.com/Azure/sonic-buildimage/pull/8305.

YanChii commented 2 years ago

Thanks for the reply. @wally-wang-accton, which version should I test then? I'll create issues for whichever version makes sense for you.

Jan

wally-wang commented 2 years ago

You can try the 202012 branch.

ITJamie commented 2 years ago

I just installed the latest build (72021) for broadcom. Interfaces now show up 🎉. I believe https://github.com/Azure/sonic-buildimage/commit/ddfe87a71a84b7ce47324f178b4aeb5192887cf1 may have been the fix for the missing interfaces.

However, there are still no system details and no fan control. Attempting to start the pddf-platform-init service still reports missing files, which prevents the service from starting:

admin@spine-sw02:~$ show interfaces status
  Interface            Lanes    Speed    MTU    FEC            Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  ---------------  -------  -----  -----  ---------------  ------  ------  -------  ------  ----------
  Ethernet0          1,2,3,4     100G   9100    N/A    Eth1/1(Port1)  routed    down       up     N/A         N/A
  Ethernet4          5,6,7,8     100G   9100    N/A    Eth2/1(Port2)  routed    down       up     N/A         N/A
  Ethernet8       9,10,11,12     100G   9100    N/A    Eth3/1(Port3)  routed    down       up     N/A         N/A
 Ethernet12      13,14,15,16     100G   9100    N/A    Eth4/1(Port4)  routed    down       up     N/A         N/A
 Ethernet16      17,18,19,20     100G   9100    N/A    Eth5/1(Port5)  routed    down       up     N/A         N/A
 Ethernet20      21,22,23,24     100G   9100    N/A    Eth6/1(Port6)  routed    down       up     N/A         N/A
 Ethernet24      25,26,27,28     100G   9100    N/A    Eth7/1(Port7)  routed    down       up     N/A         N/A
 Ethernet28      29,30,31,32     100G   9100    N/A    Eth8/1(Port8)  routed    down       up     N/A         N/A
 Ethernet32      33,34,35,36     100G   9100    N/A    Eth9/1(Port9)  routed    down       up     N/A         N/A
 Ethernet36      37,38,39,40     100G   9100    N/A  Eth10/1(Port10)  routed    down       up     N/A         N/A
 Ethernet40      41,42,43,44     100G   9100    N/A  Eth11/1(Port11)  routed    down       up     N/A         N/A
 Ethernet44      45,46,47,48     100G   9100    N/A  Eth12/1(Port12)  routed    down       up     N/A         N/A
 Ethernet48      49,50,51,52     100G   9100    N/A  Eth13/1(Port13)  routed    down       up     N/A         N/A
 Ethernet52      53,54,55,56     100G   9100    N/A  Eth14/1(Port14)  routed    down       up     N/A         N/A
 Ethernet56      57,58,59,60     100G   9100    N/A  Eth15/1(Port15)  routed    down       up     N/A         N/A
 Ethernet60      61,62,63,64     100G   9100    N/A  Eth16/1(Port16)  routed    down       up     N/A         N/A
 Ethernet64      65,66,67,68     100G   9100    N/A  Eth17/1(Port17)  routed    down       up     N/A         N/A
 Ethernet68      69,70,71,72     100G   9100    N/A  Eth18/1(Port18)  routed    down       up     N/A         N/A
 Ethernet72      73,74,75,76     100G   9100    N/A  Eth19/1(Port19)  routed    down       up     N/A         N/A
 Ethernet76      77,78,79,80     100G   9100    N/A  Eth20/1(Port20)  routed    down       up     N/A         N/A
 Ethernet80      81,82,83,84     100G   9100    N/A  Eth21/1(Port21)  routed    down       up     N/A         N/A
 Ethernet84      85,86,87,88     100G   9100    N/A  Eth22/1(Port22)  routed    down       up     N/A         N/A
 Ethernet88      89,90,91,92     100G   9100    N/A  Eth23/1(Port23)  routed    down       up     N/A         N/A
 Ethernet92      93,94,95,96     100G   9100    N/A  Eth24/1(Port24)  routed    down       up     N/A         N/A
 Ethernet96     97,98,99,100     100G   9100    N/A  Eth25/1(Port25)  routed    down       up     N/A         N/A
Ethernet100  101,102,103,104     100G   9100    N/A  Eth26/1(Port26)  routed    down       up     N/A         N/A
Ethernet104  105,106,107,108     100G   9100    N/A  Eth27/1(Port27)  routed    down       up     N/A         N/A
Ethernet108  109,110,111,112     100G   9100    N/A  Eth28/1(Port28)  routed    down       up     N/A         N/A
Ethernet112  113,114,115,116     100G   9100    N/A  Eth29/1(Port29)  routed    down       up     N/A         N/A
Ethernet116  117,118,119,120     100G   9100    N/A  Eth30/1(Port30)  routed    down       up     N/A         N/A
Ethernet120  121,122,123,124     100G   9100    N/A  Eth31/1(Port31)  routed    down       up     N/A         N/A
Ethernet124  125,126,127,128     100G   9100    N/A  Eth32/1(Port32)  routed    down       up     N/A         N/A
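The port table above follows a regular layout: front-panel port p (1 to 32) is named Ethernet{4*(p-1)} and uses the four consecutive lanes 4*(p-1)+1 through 4*(p-1)+4. The sketch below regenerates those rows; port_row and all_rows are hypothetical helpers derived only from this output (including the EthN/1(PortN) alias style of this particular build), not from the HwSKU's actual port configuration file.

```python
from typing import List, Tuple

LANES_PER_PORT = 4  # every 100G port in the table above uses four lanes


def port_row(port: int) -> Tuple[str, str, str]:
    """Return (interface name, lane list, alias) for a 1-based front-panel port."""
    if port < 1:
        raise ValueError("ports are numbered from 1")
    first_lane = LANES_PER_PORT * (port - 1) + 1
    lanes = ",".join(str(first_lane + i) for i in range(LANES_PER_PORT))
    name = f"Ethernet{LANES_PER_PORT * (port - 1)}"
    alias = f"Eth{port}/1(Port{port})"
    return name, lanes, alias


def all_rows(num_ports: int = 32) -> List[Tuple[str, str, str]]:
    """Rows for ports 1..num_ports, in front-panel order."""
    return [port_row(p) for p in range(1, num_ports + 1)]
```

For example, port_row(32) gives ("Ethernet124", "125,126,127,128", "Eth32/1(Port32)"), matching the last row of the table.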
admin@spine-sw02:~$ uname -a
Linux spine-sw02.bk3.39122.as 5.10.0-8-2-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03) x86_64 GNU/Linux
admin@spine-sw02:~$ show platform fan
Fan Not detected
admin@spine-sw02:~$ show platform psustatus
Error: Failed to get the number of PSUs
Error: Failed to get PSU status
Error: failed to get PSU status from state DB
admin@spine-sw02:~$ show platform summary
Platform: x86_64-accton_as7726_32x-r0
HwSKU: Accton-AS7726-32X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A

admin@spine-sw02:~$ show platform syseeprom
Traceback (most recent call last):
  File "/usr/local/bin/decode-syseeprom", line 18, in <module>
    import sonic_platform
ModuleNotFoundError: No module named 'sonic_platform'

Attempting to start the PDDF service:

root@spine-sw02:/usr/local/bin# service pddf-platform-init status
● pddf-platform-init.service - PDDF module and device initialization service
     Loaded: loaded (/lib/systemd/system/pddf-platform-init.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2022-02-13 01:07:57 UTC; 2s ago
    Process: 6523 ExecStartPre=/usr/local/bin/pre_pddf_init.sh (code=exited, status=0/SUCCESS)
    Process: 6524 ExecStart=/usr/local/bin/pddf_util.py install (code=exited, status=203/EXEC)
   Main PID: 6524 (code=exited, status=203/EXEC)

Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[1]: Starting PDDF module and device initialization service...
Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[6523]: pddf-platform-init.service: Executable /usr/local/bin/pre_pddf_init.sh missing, skipping: No such file or directory
Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[6524]: pddf-platform-init.service: Failed to locate executable /usr/local/bin/pddf_util.py: No such file or directory
Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[6524]: pddf-platform-init.service: Failed at step EXEC spawning /usr/local/bin/pddf_util.py: No such file or directory
Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[1]: pddf-platform-init.service: Main process exited, code=exited, status=203/EXEC
Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[1]: pddf-platform-init.service: Failed with result 'exit-code'.
Feb 13 01:07:57 spine-sw02.bk3.39122.as systemd[1]: Failed to start PDDF module and device initialization service.
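Exit status 203/EXEC is systemd reporting that it could not execute the service's command, and the journal lines show that /usr/local/bin/pddf_util.py and /usr/local/bin/pre_pddf_init.sh are simply not present. The sketch below checks every ExecStart=/ExecStartPre= path in a unit file's text. missing_executables is a hypothetical helper; its parser is deliberately simple (no drop-in files, specifiers, or line continuations) and strips only the common systemd prefix characters -, @, :, + and !.

```python
import os
import shlex
from typing import List

# Prefix characters systemd accepts before the executable path in Exec*= lines.
EXEC_PREFIXES = "-@:+!"


def missing_executables(unit_text: str) -> List[str]:
    """Return ExecStart/ExecStartPre paths that are missing or not executable."""
    missing = []
    for raw in unit_text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep or key not in ("ExecStart", "ExecStartPre"):
            continue  # section headers, comments and other directives
        argv = shlex.split(value)
        if not argv:
            continue
        path = argv[0].lstrip(EXEC_PREFIXES)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(path)
    return missing
```

For example, passing the text of /lib/systemd/system/pddf-platform-init.service on the system above should list both PDDF scripts.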

ITJamie commented 2 years ago

@jostar-yang / @wally-wang-accton, do you have any idea when the proposed fixes will be merged? The most recent commit to PR #8305 was on March 8th, but this has been going on for quite some time.

jostar-yang commented 2 years ago

There are some problems on the 202106 branch if you use it. PDDF support has not been completely merged there, which causes PDDF init to fail. PDDF works on the master code base, but manually merging master code into the 202106 branch is not a good idea: the kernel is different, and master uses the SFP refactor, which PDDF supports only in the master code. That code can't work on the 202106 branch. So you need to take code from the 202012 branch to get api2 working, and merge https://github.com/Azure/sonic-buildimage/pull/8305. I will list a more detailed procedure.

jostar-yang commented 2 years ago

When using the 202106 branch, apply the following to your code base:

rm sonic-buildimage/device/accton/x86_64-accton_as7726_32x-r0/pddf_support
rm sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/debian/sonic-platform-accton-as7726-32x.postinst
rm sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/debian/sonic-platform-accton-as7726-32x.install
rm sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/service/pddf-platform-init.service
rm sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/service/as7726-32x-pddf-platform-monitor.service
rm sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/sonic_platform-1.0-py3-none-any.whl
rm -rf sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/sonic_platform.egg-info
rm -rf sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/build
rm -rf sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/sonic_platform_setup.py

Take the following files from the 202012 branch:

sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/debian/sonic-platform-accton-as7726-32x.install
sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7726-32x/sonic_platform_setup.py
sonic-buildimage/device/accton/x86_64-accton_as7726_32x-r0/sonic_platform

Then get the code from https://github.com/Azure/sonic-buildimage/pull/8305 and merge it into your code base.

After completing this procedure, run make clean and then rebuild.
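The removal step of the 202106 procedure can be scripted. This is a minimal sketch: remove_paths is a hypothetical helper, PDDF_PATHS is copied from the rm list in the procedure (relative to the sonic-buildimage checkout), and it defaults to a dry run so you can review what would be deleted before passing dry_run=False. It does not cover taking files from the 202012 branch or merging PR #8305.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

_ACCTON = "platform/broadcom/sonic-platform-modules-accton"

# Paths from the rm list in the procedure, relative to the checkout root.
PDDF_PATHS = [
    "device/accton/x86_64-accton_as7726_32x-r0/pddf_support",
    f"{_ACCTON}/debian/sonic-platform-accton-as7726-32x.postinst",
    f"{_ACCTON}/debian/sonic-platform-accton-as7726-32x.install",
    f"{_ACCTON}/as7726-32x/service/pddf-platform-init.service",
    f"{_ACCTON}/as7726-32x/service/as7726-32x-pddf-platform-monitor.service",
    f"{_ACCTON}/as7726-32x/sonic_platform-1.0-py3-none-any.whl",
    f"{_ACCTON}/as7726-32x/sonic_platform.egg-info",
    f"{_ACCTON}/as7726-32x/build",
    f"{_ACCTON}/as7726-32x/sonic_platform_setup.py",
]


def remove_paths(repo_root: str, rel_paths: Iterable[str], dry_run: bool = True) -> List[str]:
    """Delete each existing path under repo_root; return what was (or would be) removed."""
    root = Path(repo_root)
    removed = []
    for rel in rel_paths:
        target = root / rel
        if not (target.exists() or target.is_symlink()):
            continue  # like rm -f: silently skip paths that are already gone
        if not dry_run:
            if target.is_dir() and not target.is_symlink():
                shutil.rmtree(target)  # directories, as with rm -rf
            else:
                target.unlink()
        removed.append(rel)
    return removed
```

Running remove_paths("sonic-buildimage", PDDF_PATHS) first and checking the returned list before the real deletion keeps the step reversible until you are sure.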

jostar-yang commented 2 years ago

I suggest using the latest master code base instead, because PDDF works well there and the show platform commands work well. You don't need to do the manual merge from the 202012 branch and PR #8305.

YanChii commented 2 years ago

Thanks @jostar-yang

I can confirm that the platform info and interface status are working in the master 20220404.3 build.

admin@sonic100sk:~$ show platform syseeprom
TlvInfo Header:
   Id String:    TlvInfo
   Version:      1
   Total Length: 172
TLV Name             Code      Len  Value
-------------------  ------  -----  ---------------------------
Product Name         0x21       15  7726-32X-O-AC-F
Part Number          0x22       13  FP3ZZ7632074A
Serial Number        0x23       14  772632X2134032
Base MAC Address     0x24        6  F8:8E:A1:F5:3A:A0
Manufacture Date     0x25       19  08/23/2021 12:17:50
Label Revision       0x27        4  R01E
Platform Name        0x28       27  x86_64-accton_as7726_32x-r0
ONIE Version         0x29       13  2017.11.00.05
MAC Addresses        0x2A        2  256
Manufacturer         0x2B        6  Accton
Manufacture Country  0x2C        2  TW
Vendor Name          0x2D        8  Edgecore
Diag Version         0x2E       11  01.01.01.02
CRC-32               0xFE        4  0x8A4B264C

(checksum valid)
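The Total Length of 172 in the header can be checked against the table: each TLV costs 2 bytes (type and length) plus its value, and the 14 TLVs listed have value lengths summing to 144, so 144 + 14 × 2 = 172. The sketch below encodes that arithmetic together with a small TlvInfo encoder/decoder, based on our understanding of the ONIE TlvInfo layout (8-byte "TlvInfo\0" ID, 1-byte version, 2-byte big-endian total length, then type/length/value records ending with a CRC-32 TLV of type 0xFE). All function names are hypothetical; this is not the SONiC decoder.

```python
import struct
import zlib
from typing import Dict, List, Tuple

HEADER_ID = b"TlvInfo\x00"
CRC_TYPE = 0xFE


def total_length(value_lengths: List[int]) -> int:
    """Header 'Total Length': each TLV is 2 header bytes plus its value."""
    return sum(2 + n for n in value_lengths)


def encode_tlvinfo(tlvs: List[Tuple[int, bytes]], version: int = 1) -> bytes:
    """Build an 11-byte header, the TLVs, and a trailing CRC-32 TLV."""
    body = b"".join(bytes([code, len(value)]) + value for code, value in tlvs)
    total_len = len(body) + 2 + 4  # payload TLVs plus the 6-byte CRC TLV
    header = HEADER_ID + struct.pack(">BH", version, total_len)
    crc_input = header + body + bytes([CRC_TYPE, 4])  # CRC covers its own type/len
    crc = zlib.crc32(crc_input) & 0xFFFFFFFF
    return crc_input + struct.pack(">I", crc)


def decode_tlvinfo(blob: bytes) -> Tuple[Dict[int, bytes], bool]:
    """Return ({type code: value}, checksum_valid); repeated codes overwrite."""
    if blob[:8] != HEADER_ID:
        raise ValueError("missing TlvInfo header")
    _version, total_len = struct.unpack(">BH", blob[8:11])
    end = min(11 + total_len, len(blob))
    fields: Dict[int, bytes] = {}
    valid = False
    pos = 11
    while pos + 2 <= end:
        code, length = blob[pos], blob[pos + 1]
        value = blob[pos + 2 : pos + 2 + length]
        fields[code] = value
        if code == CRC_TYPE:
            expected = zlib.crc32(blob[: pos + 2]) & 0xFFFFFFFF
            valid = len(value) == 4 and struct.unpack(">I", value)[0] == expected
            break
        pos += 2 + length
    return fields, valid
```

Flipping any byte before the CRC TLV makes decode_tlvinfo report the checksum as invalid, which is the condition "(checksum valid)" rules out above.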
admin@sonic100sk:~$ show version

SONiC Software Version: SONiC.master.87144-13aa2332e
Distribution: Debian 11.3
Kernel: 5.10.0-8-2-amd64
Build commit: 13aa2332e
Build date: Mon Apr  4 19:53:32 UTC 2022
Built by: AzDevOps@sonic-build-workers-001CAY

Platform: x86_64-accton_as7726_32x-r0
HwSKU: Accton-AS7726-32X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: FP3ZZ7632074A
Hardware Revision: N/A
Uptime: 13:15:01 up 11 min,  1 user,  load average: 1.21, 1.00, 0.65
Date: Tue 05 Apr 2022 13:15:01

Docker images:
REPOSITORY                    TAG                      IMAGE ID       SIZE
docker-syncd-brcm             latest                   7e446a9d8ee9   784MB
docker-syncd-brcm             master.87144-13aa2332e   7e446a9d8ee9   784MB
docker-gbsyncd-credo          latest                   e72ce0d3f62a   422MB
docker-gbsyncd-credo          master.87144-13aa2332e   e72ce0d3f62a   422MB
docker-sonic-telemetry        latest                   f2cea2fde8f9   506MB
docker-sonic-telemetry        master.87144-13aa2332e   f2cea2fde8f9   506MB
docker-dhcp-relay             latest                   22708e497004   427MB
docker-database               latest                   3e80d35ee2e7   417MB
docker-database               master.87144-13aa2332e   3e80d35ee2e7   417MB
docker-router-advertiser      latest                   9be3e21647c7   417MB
docker-router-advertiser      master.87144-13aa2332e   9be3e21647c7   417MB
docker-orchagent              latest                   5c9aa99d72d6   437MB
docker-orchagent              master.87144-13aa2332e   5c9aa99d72d6   437MB
docker-sflow                  latest                   b3cf608b6505   421MB
docker-sflow                  master.87144-13aa2332e   b3cf608b6505   421MB
docker-fpm-frr                latest                   dfaf657df27f   438MB
docker-fpm-frr                master.87144-13aa2332e   dfaf657df27f   438MB
docker-nat                    latest                   c5f25085e783   423MB
docker-nat                    master.87144-13aa2332e   c5f25085e783   423MB
docker-macsec                 latest                   c4e3bb861352   423MB
docker-macsec                 master.87144-13aa2332e   c4e3bb861352   423MB
docker-teamd                  latest                   e5f6c3546dae   420MB
docker-teamd                  master.87144-13aa2332e   e5f6c3546dae   420MB
docker-platform-monitor       latest                   507483e4fdbc   519MB
docker-platform-monitor       master.87144-13aa2332e   507483e4fdbc   519MB
docker-sonic-mgmt-framework   latest                   862c7d4ab1f3   549MB
docker-sonic-mgmt-framework   master.87144-13aa2332e   862c7d4ab1f3   549MB
docker-snmp                   latest                   4d247fff9b4d   449MB
docker-snmp                   master.87144-13aa2332e   4d247fff9b4d   449MB
docker-lldp                   latest                   102ddc67973f   445MB
docker-lldp                   master.87144-13aa2332e   102ddc67973f   445MB
docker-mux                    latest                   5166f4c9a8df   458MB
docker-mux                    master.87144-13aa2332e   5166f4c9a8df   458MB
admin@sonic100sk:~$ show platform psustatus
PSU    Model       Serial            HW Rev      Voltage (V)    Current (A)    Power (W)  Status    LED
-----  ----------  ----------------  --------  -------------  -------------  -----------  --------  -----
PSU 1  FSF019-611  FSF0192114004066  N/A               12.12          12.00       150.00  OK        green
PSU 2  FSF019-611  FSF0192114004044  N/A                0.00           0.00         0.00  NOT OK    off
admin@sonic100sk:~$ show interfaces status
  Interface            Lanes    Speed    MTU    FEC          Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -------------  ------  ------  -------  ---------------  ----------
  Ethernet0          1,2,3,4     100G   9100    N/A   hundredGigE1  routed    down       up              N/A         N/A
  Ethernet4          5,6,7,8     100G   9100    N/A   hundredGigE2  routed    down       up  QSFP28 or later         N/A
  Ethernet8       9,10,11,12     100G   9100    N/A   hundredGigE3  routed    down       up              N/A         N/A
 Ethernet12      13,14,15,16     100G   9100    N/A   hundredGigE4  routed    down       up              N/A         N/A
 Ethernet16      17,18,19,20     100G   9100    N/A   hundredGigE5  routed    down       up              N/A         N/A
...

However:

admin@sonic100sk:~$ sonic-cli
sonic100sk# show interface status
/tmp/klish.fifo.63.dqx1TH: 2: /tmp/klish.fifo.63.dqx1TH: python: not found

Should I start a separate thread about the above error?
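The message "python: not found" comes from the shell that runs the klish action: line 2 of the generated script invokes a bare python, and that command name apparently does not resolve in the environment where klish runs (our assumption, drawn only from the error, is that only python3 is available there). The sketch below reports what each command name resolves to; resolve_commands is a hypothetical helper, and it should be run in whichever environment actually runs klish, which we have not confirmed is the host rather than a container.

```python
import shutil
from typing import Dict, Iterable, Optional


def resolve_commands(names: Iterable[str], path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each command name to the executable found on PATH (or path), else None."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in resolve_commands(["python", "python3"]).items():
        print(f"{name:8} -> {location or 'not found'}")
```

If python maps to None while python3 resolves, that matches the error above.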

Thanks for the work guys.

Jan

ITJamie commented 2 years ago

@jostar-yang I've noticed a few issues with show platform on master 20220404.3.

Under show version, there is no serial number or hardware revision.

show platform summary
Platform: x86_64-accton_as7326_56x-r0
HwSKU: Accton-AS7326-56X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: FP4ZZ7656005A
Hardware Revision: N/A

No firmware status is listed:

show platform firmware status
Chassis    Module    Component    Version    Description
---------  --------  -----------  ---------  -------------

Otherwise, it looks good on master.

jostar-yang commented 2 years ago

For "show platform firmware status", you need to manually merge the following PR: https://github.com/Azure/sonic-buildimage/pull/8315

For "show platform summary", we are currently checking the PDDF code.
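Manually merging a pull request such as #8315 (or #8305, mentioned earlier) can be done by fetching GitHub's pull-request ref into a local branch. The sketch below only builds the git command lines; pr_merge_commands and the pr-<number> branch name are hypothetical, and remote is assumed to point at the GitHub sonic-buildimage repository.

```python
from typing import List


def pr_merge_commands(pr_number: int, remote: str = "origin") -> List[List[str]]:
    """Return git argv lists that fetch a GitHub PR head into a local branch and merge it."""
    if pr_number <= 0:
        raise ValueError("PR numbers are positive")
    branch = f"pr-{pr_number}"
    return [
        # GitHub exposes each pull request's head commit as refs/pull/<n>/head.
        ["git", "fetch", remote, f"pull/{pr_number}/head:{branch}"],
        # --no-ff keeps a merge commit so the manual merge is easy to spot or revert.
        ["git", "merge", "--no-ff", branch],
    ]
```

Each list can be passed to subprocess.run(cmd, check=True, cwd=<your checkout>); expect to resolve conflicts by hand if the PR no longer applies cleanly to your branch.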

feanis commented 2 years ago

Hi Guys,

Did you also verify traffic on its interfaces?

Thanks & Regards, Feanis