sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

psud dies immediately on Dell S5248F-ON #7644

Open aledwmorris opened 3 years ago

aledwmorris commented 3 years ago

Description

psud dies immediately after start, taking down pmon

Steps to reproduce the issue:

  1. Install the latest downloadable SONiC build
  2. Boot
  3. Notice all ports down, all LEDs lit solid, no pmon

Describe the results you received:

Can't configure ports; the system is unusable

Describe the results you expected:

Some level of usability

Output of show version:

SONiC Software Version: SONiC.master.646-990b1127
Distribution: Debian 10.9
Kernel: 4.19.0-12-2-amd64
Build commit: 990b1127
Build date: Sat Apr 24 09:20:49 UTC 2021
Built by: johnar@jenkins-worker-2

Platform: x86_64-dellemc_s5248f_c3538-r0
HwSKU: DellEMC-S5248f-P-25G
ASIC: broadcom
ASIC Count: 1
Traceback (most recent call last):
  File "/usr/local/bin/decode-syseeprom", line 18, in <module>
    import sonic_platform
ModuleNotFoundError: No module named 'sonic_platform'
Serial Number:
Uptime: 15:52:25 up  4:13,  1 user,  load average: 0.15, 0.19, 0.21

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-brcm             latest                ce7e46843b99        692MB
docker-syncd-brcm             master.646-990b1127   ce7e46843b99        692MB
docker-snmp                   latest                2a1358e2cb4e        441MB
docker-snmp                   master.646-990b1127   2a1358e2cb4e        441MB
docker-teamd                  latest                4cb68900abcd        411MB
docker-teamd                  master.646-990b1127   4cb68900abcd        411MB
docker-nat                    latest                621f8ddfdb3c        414MB
docker-nat                    master.646-990b1127   621f8ddfdb3c        414MB
docker-router-advertiser      latest                1789e293a171        400MB
docker-router-advertiser      master.646-990b1127   1789e293a171        400MB
docker-platform-monitor       latest                b90defd0b839        609MB
docker-platform-monitor       master.646-990b1127   b90defd0b839        609MB
docker-lldp                   latest                0769c7f9ab9f        440MB
docker-lldp                   master.646-990b1127   0769c7f9ab9f        440MB
docker-dhcp-relay             latest                7a31225ba0c4        407MB
docker-dhcp-relay             master.646-990b1127   7a31225ba0c4        407MB
docker-database               latest                59cd6564eee9        400MB
docker-database               master.646-990b1127   59cd6564eee9        400MB
docker-orchagent              latest                0c728469c002        429MB
docker-orchagent              master.646-990b1127   0c728469c002        429MB
docker-macsec                 latest                5dee32d3f3bd        414MB
docker-macsec                 master.646-990b1127   5dee32d3f3bd        414MB
docker-sonic-telemetry        latest                8f2654ec4760        490MB
docker-sonic-telemetry        master.646-990b1127   8f2654ec4760        490MB
docker-sonic-mgmt-framework   latest                e13c2b82ea84        619MB
docker-sonic-mgmt-framework   master.646-990b1127   e13c2b82ea84        619MB
docker-fpm-frr                latest                9020bcab68e5        429MB
docker-fpm-frr                master.646-990b1127   9020bcab68e5        429MB
docker-sflow                  latest                728e8f83520f        412MB
docker-sflow                  master.646-990b1127   728e8f83520f        412MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

output from sudo zgrep -ai "psud" /var/log/syslog*

(repeated over and over)

/var/log/syslog.2.gz:Apr 28 10:55:16.123766 sonic INFO pmon#supervisord 2021-04-28 10:55:16,122 INFO spawned: 'psud' with pid 33
/var/log/syslog.2.gz:Apr 28 10:55:16.124047 sonic INFO pmon#supervisord 2021-04-28 10:55:16,123 INFO success: psud entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
/var/log/syslog.2.gz:Apr 28 10:55:16.155619 sonic INFO pmon#/supervisord: psud Traceback (most recent call last):
/var/log/syslog.2.gz:Apr 28 10:55:16.155690 sonic INFO pmon#/supervisord: psud File "/usr/local/bin/psud", line 17, in <module>
/var/log/syslog.2.gz:Apr 28 10:55:16.155724 sonic INFO pmon#/supervisord: psud from sonic_platform.psu import Psu
/var/log/syslog.2.gz:Apr 28 10:55:16.155757 sonic INFO pmon#/supervisord: psud ImportError: No module named sonic_platform.psu
/var/log/syslog.2.gz:Apr 28 10:55:16.159231 sonic INFO pmon#supervisord 2021-04-28 10:55:16,158 INFO exited: psud (exit status 1; not expected)
/var/log/syslog.2.gz:Apr 28 10:55:16.170507 sonic INFO pmon#/supervisor-proc-exit-listener: Process 'psud' exited unexpectedly. Terminating supervisor 'pmon'
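
Both tracebacks in this report fail on importing `sonic_platform`. A minimal sketch for checking this directly is below. It only asks the import system whether the module names from the logs can be located. The helper name `missing_modules` is made up for illustration, and the check is only meaningful when run with the same interpreter and in the same environment (host or pmon container) as the failing script.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec imports the parent package for dotted names and
            # raises ModuleNotFoundError when that parent is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the tracebacks in this issue.
    for name in missing_modules(["sonic_platform", "sonic_platform.psu"]):
        print(f"cannot import: {name}")
```

If both names are reported, the platform package is probably not installed for that interpreter at all, rather than only the `psu` submodule being broken.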

jeff-yin commented 3 years ago

Please assign to @arunlk-dell

aledwmorris commented 3 years ago

Hi @arunlk-dell, if there is anything I can do to help (troubleshooting, testing new releases, etc.), please let me know. I'm very keen to get my switches working.

aledwmorris commented 3 years ago

Hi @jeff-yin, is there anything I can do to expedite this issue? I've been sitting on a stack of S5248F switches for a long time now with no working SONiC distribution.

arunlk-dell commented 3 years ago

Hi @aledwmorris

The issue is not only with psud; we can see that many dockers are exiting. We are trying to narrow down the actual root cause. If possible, can you let us know the last stable image with which the S5248F-ON booted up?

admin@sonic:~$ docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED       STATUS       PORTS   NAMES
f67b72af7abb   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   2 weeks ago   Up 2 weeks           telemetry
1e8e2a74d79e   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   2 weeks ago   Up 2 weeks           mgmt-framework
1ef0977f2fb7   docker-lldp:latest                   "/usr/bin/docker-lld…"   2 weeks ago   Up 2 weeks           lldp
51f6679ea840   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   2 weeks ago   Up 2 weeks           bgp
9d613eeae7df   docker-database:latest               "/usr/local/bin/dock…"   2 weeks ago   Up 2 weeks           database

admin@sonic:~$ docker ps -a
CONTAINER ID   IMAGE                                COMMAND                  CREATED       STATUS                          PORTS   NAMES
4a2608516051   docker-snmp:latest                   "/usr/local/bin/supe…"   2 weeks ago   Exited (137) 48 seconds ago             snmp
f67b72af7abb   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   2 weeks ago   Up 2 weeks                              telemetry
1e8e2a74d79e   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   2 weeks ago   Up 2 weeks                              mgmt-framework
e1ed5e66c2c4   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   2 weeks ago   Exited (0) About a minute ago           radv
1ef0977f2fb7   docker-lldp:latest                   "/usr/bin/docker-lld…"   2 weeks ago   Up 2 weeks                              lldp
ad4e74b06f78   docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   2 weeks ago   Exited (0) About a minute ago           dhcp_relay
9578f2e73e35   docker-syncd-brcm:latest             "/usr/local/bin/supe…"   2 weeks ago   Exited (0) About a minute ago           syncd
7bcce7c9b0bc   docker-teamd:latest                  "/usr/local/bin/supe…"   2 weeks ago   Exited (0) About a minute ago           teamd
e475c5cb1dbe   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   2 weeks ago   Exited (0) 2 weeks ago                  pmon
5074ff5a231a   docker-orchagent:latest              "/usr/bin/docker-ini…"   2 weeks ago   Exited (0) About a minute ago           swss
51f6679ea840   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   2 weeks ago   Up 2 weeks                              bgp
9d613eeae7df   docker-database:latest               "/usr/local/bin/dock…"   2 weeks ago   Up 2 weeks                              database
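
For repeated checks, the `docker ps -a` comparison above can be reduced to a list of containers that are not up. The following is only a sketch: the function names are made up for illustration, and it assumes the `docker` CLI is on the path and that its `--format` option is given the `{{.Names}}` and `{{.Status}}` placeholders separated by a tab.

```python
import subprocess


def exited_containers(lines):
    """Map container name to status for every 'name<TAB>status' line not 'Up'."""
    result = {}
    for line in lines:
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if not status.startswith("Up"):
            result[name] = status
    return result


def docker_ps_lines():
    """Run `docker ps -a` and return one 'name<TAB>status' line per container."""
    completed = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        check=True, capture_output=True, text=True,
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    # Sample rows copied from the output in this comment; on a switch,
    # pass docker_ps_lines() instead.
    sample = [
        "snmp\tExited (137) 48 seconds ago",
        "telemetry\tUp 2 weeks",
        "pmon\tExited (0) 2 weeks ago",
    ]
    for name, status in exited_containers(sample).items():
        print(f"{name}: {status}")
```

Keeping the parsing separate from the `docker` call means the filter can be exercised on pasted output like the rows above, without access to the device.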

aledwmorris commented 3 years ago

I've never had a working release, though I only started trying SONiC at the beginning of this year.

aledwmorris commented 3 years ago

Hi @arunlk-dell, have you made any progress on this yet?

VR-Architect commented 3 years ago

Same issue on a new Dell S6010-ON with a fresh install of the latest SONiC. Also, there is no config file. Additional crash output on reload is below.

  File "/usr/local/bin/decode-syseeprom", line 18, in <module>
    import sonic_platform
ModuleNotFoundError: No module named 'sonic_platform'

root@sonic:~# show platform summary
Warning: failed to retrieve PORT table from ConfigDB!
Warning: failed to retrieve PORT table from ConfigDB!
Platform: x86_64-dell_s6010_c2538-r0
HwSKU: None
ASIC: broadcom
ASIC Count: 1

admin@sonic:~$ sudo config reload -y
Warning: failed to retrieve PORT table from ConfigDB!
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 431, in <module>
    main()
  File "/usr/local/bin/sonic-cfggen", line 326, in main
    _process_json(args, data)
  File "/usr/local/bin/sonic-cfggen", line 237, in _process_json
    deep_update(data, FormatConverter.to_deserialized(json.load(stream)))
  File "/usr/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
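
The final `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` is what Python's `json` module reports when its input is empty or does not begin with a JSON value, which would fit the missing config file mentioned above. A rough sketch for checking the two files named in the `sonic-cfggen` command follows. The helper `check_json_file` is hypothetical and not part of SONiC.

```python
import json
import os


def check_json_file(path):
    """Return a short diagnosis for a JSON file: missing, empty, invalid or ok."""
    if not os.path.exists(path):
        return "missing"
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    if not text.strip():
        return "empty"
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON: {err}"
    return "ok"


if __name__ == "__main__":
    # Paths taken from the sonic-cfggen command line in the output above.
    for path in ("/etc/sonic/init_cfg.json", "/etc/sonic/config_db.json"):
        print(path, "->", check_json_file(path))
```

An "empty" or "missing" result for `config_db.json` would point at configuration generation rather than at `sonic-cfggen` itself.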

arunlk-dell commented 3 years ago

Hi @VR-Architect, Dell S6010-ON is currently not on SONiC's list of supported devices.

VR-Architect commented 3 years ago

> Hi @VR-Architect, Dell S6010-ON is currently not on SONiC's list of supported devices.

The Dell S6000-ON is in the list and has a download for it. That is the same class as the S6010-ON. Shouldn't it work?

arunlk-dell commented 3 years ago

Dell S6000-ON and Dell S6100-ON are supported devices; the S6010 is different and not supported.

aledwmorris commented 3 years ago

Hi @arunlk-dell, are you making any progress on diagnosing the problem with the containers? I can help with testing.

arunlk-dell commented 3 years ago

Hi @aledwmorris, with the latest changes I am able to bring all the dockers up. The ports are up and the SFP details are shown. However, show commands related to EEPROM and PSU are still failing; I am working on it.

CONTAINER ID   IMAGE                                COMMAND                  CREATED       STATUS          PORTS   NAMES
6ef576eea2a6   docker-snmp:latest                   "/usr/local/bin/supe…"   5 hours ago   Up 9 minutes            snmp
583f6c7202b2   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   5 hours ago   Up 9 minutes            telemetry
e0c77f14144b   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   5 hours ago   Up 9 minutes            mgmt-framework
c221f81f083a   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   5 hours ago   Up 11 minutes           radv
45d9acb8cdf1   docker-lldp:latest                   "/usr/bin/docker-lld…"   5 hours ago   Up 13 minutes           lldp
bcc8c1c56e23   docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   5 hours ago   Up 11 minutes           dhcp_relay
705f79ca127f   docker-syncd-brcm:latest             "/usr/local/bin/supe…"   5 hours ago   Up 11 minutes           syncd
1b653ec26588   docker-teamd:latest                  "/usr/local/bin/supe…"   5 hours ago   Up 11 minutes           teamd
1ad4208d76db   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   5 hours ago   Up 13 minutes           pmon
e0255a4c0caa   docker-orchagent:latest              "/usr/bin/docker-ini…"   5 hours ago   Up 11 minutes           swss
1e2dcbcf3605   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   5 hours ago   Up 13 minutes           bgp
ac3508c97e98   docker-database:latest               "/usr/local/bin/dock…"   5 hours ago   Up 13 minutes           database

aledwmorris commented 3 years ago

That's great news, @arunlk-dell. Please keep me informed of progress. Thank you for your efforts!

arunlk-dell commented 3 years ago

@aledwmorris, the changes are merged through https://github.com/Azure/sonic-buildimage/pull/7930. Can you verify the behavior with the latest build?

aledwmorris commented 3 years ago

Great. Can you re-enable the Jenkins pipeline, and I'll take a snapshot from there:

https://sonic-jenkins.westus2.cloudapp.azure.com/job/broadcom/job/buildimage-brcm-all/

unless you can point me to a snapshot I can download directly?

arunlk-dell commented 3 years ago

https://sonic-build.azurewebsites.net/ui/sonic/pipelines/138/builds?branchName=master