sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[system-health] User-defined health checkers fail to start #12701

Open antonptashnik opened 1 year ago

antonptashnik commented 1 year ago

Description

Per the HLD, a user can define their own checkers to be executed during health checks, in the form shown below, in the file /usr/share/sonic/device/<platform>/system_health_monitoring_config.json

{
...
  "user_defined_checkers": ["program_name -option1 value1 -option2 value2"],
...
}

Adding any script this way shows that the checker does not start.

Steps to reproduce the issue:

  1. Create a sample file at "~/checker_output.txt" with the content below, which follows the output format a user-defined checker should produce:
    ExternalCategory
    ExternalService:Service is not working
    ExternalDevice:Device is broken
  2. Add a new checker that just prints the created text file, by adding the command cat ~/checker_output.txt to user_defined_checkers in /usr/share/sonic/device/<platform>/system_health_monitoring_config.json. For instance:
    {
    ...
    "user_defined_checkers": ["cat ~/checker_output.txt"],
    ...
    }
  3. Wait for a minute (the default health check interval) and check the results:
sudo show system-health detail

Describe the results you received:

No parsed check results are shown; an error is logged instead:

...
Reasons: Failed to get output of command "cat ~/checker_output.txt"
...

Describe the results you expected:

Parsed results are shown as:

ExternalService      Not OK    UserDefine
ExternalDevice       Not OK    UserDefine
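
For illustration, the mapping from the sample file in step 1 to the rows above can be sketched roughly as follows. This is not the system-health implementation: the function names parse_checker_output and summarize are hypothetical, and treating a status of exactly "OK" as healthy (anything else as "Not OK") is an assumption. The "UserDefine" label is copied from the expected output above.

```python
def parse_checker_output(output):
    """Split checker output into (category, {item_name: status_text}).

    Assumed format: first non-empty line is the category, every following
    line is "<name>:<status text>".
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return None, {}
    category = lines[0]
    items = {}
    for line in lines[1:]:
        name, sep, status = line.partition(":")
        if not sep:
            continue  # skip lines without a "name:status" separator
        items[name.strip()] = status.strip()
    return category, items


def summarize(items):
    """Build (name, status, type) rows like those shown by the CLI.

    Assumption: only the literal status "OK" counts as healthy.
    """
    return [
        (name, "OK" if status == "OK" else "Not OK", "UserDefine")
        for name, status in items.items()
    ]


sample = (
    "ExternalCategory\n"
    "ExternalService:Service is not working\n"
    "ExternalDevice:Device is broken\n"
)
category, items = parse_checker_output(sample)
for row in summarize(items):
    print("{:<20} {:<9} {}".format(*row))
```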

Output of show version:

SONiC Software Version: SONiC.master.172539-dirty-20221110.095210
Distribution: Debian 11.5
Kernel: 5.10.0-12-2-amd64
Build commit: 7c746e67d
Build date: Thu Nov 10 16:18:12 UTC 2022
Built by: AzDevOps@sonic-build-workers-002D7H

Platform: x86_64-arista_7170_64c
HwSKU: Arista-7170-64C

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

Investigation showed that the issue is at https://github.com/sonic-net/sonic-buildimage/blob/master/src/system-health/health_checker/utils.py#L11 . When called without shell=True, subprocess.Popen expects a list holding the command name followed by its arguments, but a single string is passed instead, so on POSIX systems the whole string is treated as the program name.
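
A possible direction for a fix, as a minimal sketch only: split the configured string into an argument list with shlex.split before handing it to subprocess.Popen. The function name run_command_sketch is hypothetical and this is not the actual code in utils.py; returning None on failure is also an assumption about how the caller would want errors reported.

```python
import shlex
import subprocess


def run_command_sketch(command):
    """Run a checker command given as one string; return stdout or None."""
    try:
        # "prog -opt val" -> ["prog", "-opt", "val"]
        args = shlex.split(command)
    except ValueError:
        # e.g. unbalanced quotes in the configured command
        return None
    if not args:
        return None
    try:
        process = subprocess.Popen(
            args,
            universal_newlines=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        stdout, _ = process.communicate()
    except OSError:
        # program not found or not executable
        return None
    return stdout if process.returncode == 0 else None
```

Note that without a shell, "~" is not expanded, so the reproduction command cat ~/checker_output.txt would still fail after splitting because cat receives the literal path "~/checker_output.txt". An absolute path in the config (or an explicit os.path.expanduser on each argument) would be needed for that particular example.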

azure-pipelines-wrapper[bot] commented 1 year ago

Thanks for opening this issue!