sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Celestica Seastone DX010-C32: show system-health detail fails with 'Chassis' object has no attribute 'initizalize_system_led' #11322

Open FunForNOS opened 2 years ago

FunForNOS commented 2 years ago

Description

The command sudo show system-health detail fails with AttributeError: 'Chassis' object has no attribute 'initizalize_system_led'.

Steps to reproduce the issue:

  1. Install SONiC 202111 or 202205
  2. Log in as admin
  3. Issue the command sudo show system-health detail

Describe the results you received:

Output:

admin@sonic:~$ sudo show system-health detail 
Traceback (most recent call last):
  File "/usr/local/bin/show", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 110, in detail
    chassis.initizalize_system_led()
AttributeError: 'Chassis' object has no attribute 'initizalize_system_led'

Describe the results you expected:

The command completes without error and shows details of the system's health.

Output of show version:

admin@sonic:~$ show version

SONiC Software Version: SONiC.202111.114441-e8daeacd3
Distribution: Debian 11.3
Kernel: 5.10.0-8-2-amd64
Build commit: e8daeacd3
Build date: Sat Jun 25 20:00:32 UTC 2022
Built by: AzDevOps@sonic-build-workers-001OKA

Platform: x86_64-cel_seastone-r0
HwSKU: Seastone-DX010
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 08:39:01 up 30 min,  4 users,  load average: 0.83, 0.75, 0.63

Docker images:
REPOSITORY                    TAG                       IMAGE ID       SIZE
docker-syncd-brcm             202111.114441-e8daeacd3   4e27432ac89d   798MB
docker-syncd-brcm             latest                    4e27432ac89d   798MB
docker-gbsyncd-credo          202111.114441-e8daeacd3   c5b0c92c4048   474MB
docker-gbsyncd-credo          latest                    c5b0c92c4048   474MB
docker-dhcp-relay             latest                    cce38ba00022   433MB
docker-orchagent              202111.114441-e8daeacd3   796c0264eafe   452MB
docker-orchagent              latest                    796c0264eafe   452MB
docker-nat                    202111.114441-e8daeacd3   34369d8e429f   437MB
docker-nat                    latest                    34369d8e429f   437MB
docker-fpm-frr                202111.114441-e8daeacd3   eb901261db6f   452MB
docker-fpm-frr                latest                    eb901261db6f   452MB
docker-macsec                 202111.114441-e8daeacd3   d281120d993c   437MB
docker-macsec                 latest                    d281120d993c   437MB
docker-teamd                  202111.114441-e8daeacd3   9b811857fff6   434MB
docker-teamd                  latest                    9b811857fff6   434MB
docker-sonic-telemetry        202111.114441-e8daeacd3   e52bf76682eb   508MB
docker-sonic-telemetry        latest                    e52bf76682eb   508MB
docker-snmp                   202111.114441-e8daeacd3   39ca86f575d8   463MB
docker-snmp                   latest                    39ca86f575d8   463MB
docker-platform-monitor       202111.114441-e8daeacd3   523fd8aacfdb   684MB
docker-platform-monitor       latest                    523fd8aacfdb   684MB
docker-sonic-mgmt-framework   202111.114441-e8daeacd3   896fb274c5a8   574MB
docker-sonic-mgmt-framework   latest                    896fb274c5a8   574MB
docker-sflow                  202111.114441-e8daeacd3   f31496b8f1ea   435MB
docker-sflow                  latest                    f31496b8f1ea   435MB
docker-router-advertiser      202111.114441-e8daeacd3   179f73576642   420MB
docker-router-advertiser      latest                    179f73576642   420MB
docker-lldp                   202111.114441-e8daeacd3   2ce7f21a190c   460MB
docker-lldp                   latest                    2ce7f21a190c   460MB
docker-mux                    202111.114441-e8daeacd3   e4a843de630d   472MB
docker-mux                    latest                    e4a843de630d   472MB
docker-database               202111.114441-e8daeacd3   bbfb5876af97   420MB
docker-database               latest                    bbfb5876af97   420MB

Output of show techsupport:

[sonic_dump_sonic_20220702_083918.tar.gz](https://github.com/Azure/sonic-buildimage/files/9032791/sonic_dump_sonic_20220702_083918.tar.gz)

qnos commented 1 year ago

This looks like an issue in the system-health CLI module. Neither initizalize_system_led nor initialize_system_led is defined in sonic_platform_base, and the function name invoked in the system-health module is obviously a typo. Even if the system LED does need to be initialized, that logic belongs in platform code, not in this high-level system-health CLI module.

Adding initizalize_system_led to the sonic_platform chassis module is just a workaround, not a good solution for this issue. My suggestion is to remove the chassis.initizalize_system_led() call from the system_health module.
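
For illustration only, here is a minimal sketch of why the call fails and what a defensive variant could look like. It is not the fix that was merged, and it is a weaker option than removing the call as suggested above. The Chassis and ChassisWithHook classes and the init_system_led_if_supported helper are hypothetical stand-ins, not part of sonic_platform or sonic-utilities.

```python
class Chassis:
    """Hypothetical stand-in for a platform chassis that lacks the LED hook."""


class ChassisWithHook:
    """Hypothetical stand-in for a platform that does define the hook."""

    def __init__(self):
        self.led_initialized = False

    def initizalize_system_led(self):  # misspelled name, as in the CLI module
        self.led_initialized = True


def init_system_led_if_supported(chassis):
    """Call the optional LED hook only if the platform object defines it.

    Returns True if the hook was called, False otherwise.
    """
    hook = getattr(chassis, "initizalize_system_led", None)
    if callable(hook):
        hook()
        return True
    return False
```

With a guard like this, a platform without the method is skipped instead of raising AttributeError, but the misspelled, undocumented API would still remain in place.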

qnos commented 1 year ago

@qiluo-msft I would be glad to hear your thoughts on this issue, in particular whether there are reasons I have not considered for keeping chassis.initizalize_system_led() (whose name is also a typo) in sonic_utilities/show/system_health.py.

qnos commented 1 year ago

@shlomibitton Can you share your opinion on whether chassis.initizalize_system_led() needs to stay in the system_health CLI module?

FunForNOS commented 1 year ago

@qiluo-msft thanks for all the efforts you put into fixing the issues.

Using the SONiC build from the PR branch (https://dev.azure.com/mssonic/build/_build/results?buildId=203616&view=artifacts&pathAsName=false&type=publishedArtifacts) on one of our Seastones, we now get a different error:

admin@sonic:~$ sudo show system-health detail
Traceback (most recent call last):
  File "/usr/local/bin/show", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 120, in detail
    manager, chassis, stat = get_system_health_status()
  File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 10, in get_system_health_status
    if os.environ["UTILITIES_UNIT_TESTING"] == "1":
  File "/usr/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'UTILITIES_UNIT_TESTING'

Should the build be testable on real hardware, or is this 'UTILITIES_UNIT_TESTING' error due to using a PR build?

Kind regards, FunForNOS

PS: Or should I have commented on the PR instead?
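
The KeyError in the traceback above comes from indexing os.environ directly, which raises when the variable is not set, as is presumably the case on a real switch. A minimal sketch of a lookup that tolerates the missing variable follows. The helper name is_unit_testing is hypothetical, and this is not necessarily how the PR was later changed.

```python
import os


def is_unit_testing(environ=os.environ):
    """Return True only when UTILITIES_UNIT_TESTING is set to "1".

    Mapping.get() returns None for a missing key instead of raising
    KeyError, so an unset variable simply means "not unit testing".
    """
    return environ.get("UTILITIES_UNIT_TESTING") == "1"
```

Passing the environment as a parameter is only a convenience for testing the helper with a plain dict; calling is_unit_testing() with no arguments reads the real process environment.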

FunForNOS commented 1 year ago

I can confirm this bug is fixed. Tested with build id 207726.

Issue #9530 seems to be a duplicate.

spilkey-cisco commented 1 year ago

Context seems to be missing: is there a reason to keep chassis.initizalize_system_led at all? Most platforms seem to make this a no-op method, the method name is still a typo, and it is still not included in chassis_base.py.

If chassis.initizalize_system_led is actually needed, this API needs documentation to explain exactly what purpose it serves; if kept, it also should likely come before the manager.check call here, not after: https://github.com/sonic-net/sonic-utilities/blob/master/show/system_health.py#L30

healthd does not call chassis.initizalize_system_led at all (https://github.com/sonic-net/sonic-buildimage/blob/master/src/system-health/scripts/healthd#L76), implying any system LED initialization should be performed as part of the chassis object initialization.

I would request chassis.initizalize_system_led be removed from system_health.py as this causes conflicts between healthd and show system-health.

I am also surprised to see that show system-health (system_health.py) does not consume the database details populated by healthd and instead just reruns the health checker locally for each invocation of show system-health. Is this intentional? Other CLIs (such as fan, temperature, psu, etc.) do not work this way, and consume the database details populated by the corresponding services (thermalctld, psud, etc.).