sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

"ERR healthd: system_service" is seen during reboot #16596

Open dgsudharsan opened 1 year ago

dgsudharsan commented 1 year ago

Description

Occasionally, the following error message is seen during reboot:

Sep 16 04:41:15.299387 r-anaconda-51 ERR healthd: system_service
Sep 16 04:41:15.502782 r-anaconda-51 NOTICE healthd: Caught SIGTERM - exiting...

The error is logged while the system is going down and sysmonitor is still handling various events; we also see healthd exiting on SIGTERM. Although the error has no functional impact, it is better to avoid error messages in syslog during reboot.
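To illustrate the kind of behavior being asked for, here is a minimal sketch of a daemon that records a pending shutdown on SIGTERM and downgrades later service-failure messages from ERROR to INFO. It is not the actual healthd or sysmonitor code; every name in it (`shutting_down`, `handle_sigterm`, `install_handler`, `report_service_problem`) is hypothetical. Note also that in the log above the ERR line is stamped about 0.2 s before the SIGTERM notice, so a flag set only on SIGTERM may not cover that ordering by itself.

```python
import logging
import signal
import threading

# Hypothetical names throughout; this is NOT the real healthd/sysmonitor code.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Record that shutdown has begun so later failures are not logged as errors."""
    shutting_down.set()


def install_handler():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def report_service_problem(logger, message):
    """Log at ERROR normally, but at INFO once SIGTERM has been received.

    Returns the level that was used, which makes the behavior easy to check.
    """
    level = logging.INFO if shutting_down.is_set() else logging.ERROR
    logger.log(level, message)
    return level
```

The design choice here is to keep the message but lower its severity during shutdown, so the information is still available for debugging without appearing as an error in syslog.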

Steps to reproduce the issue:

  1. Perform reboot

Describe the results you received:

Error in syslog

Describe the results you expected:

No errors in syslog
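A possible way to check the expected result after a reboot is to scan syslog lines for ERR entries from healthd. The sketch below assumes the line layout seen in the two sample lines in the description (timestamp, host, severity, process, message); the pattern and the `find_errors` helper are illustrative and would need adjusting for other syslog formats, for example lines that carry a PID after the process name.

```python
import re

# Pattern inferred from the two sample lines in this report (an assumption);
# other syslog layouts would need a different expression.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:.]+)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<process>[^:\s]+):\s*"
    r"(?P<message>.*)$"
)


def find_errors(lines, process="healthd"):
    """Return (timestamp, message) pairs for ERR lines logged by `process`."""
    matches = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("severity") == "ERR" and m.group("process") == process:
            matches.append((m.group("timestamp"), m.group("message")))
    return matches
```

Feeding it the two lines from the description yields a single hit for the `system_service` message; an empty list would correspond to the expected "no errors" outcome.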

Output of show version:


SONiC Software Version: SONiC.202205_3_rc.5-0cd56cda1_Internal
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: 0cd56cda1
Build date: Mon Sep  4 17:23:26 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn3700-r0
HwSKU: ACS-MSN3700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1949X06182
Model Number: MSN3700-VS2F
Hardware Revision: A3
Uptime: 04:47:49 up 6 min,  1 user,  load average: 0.53, 0.49, 0.28
Date: Sat 16 Sep 2023 04:47:49

Docker images:
REPOSITORY                                         TAG                                IMAGE ID       SIZE
docker-macsec                                      latest                             b949af156ace   332MB
docker-dhcp-relay                                  latest                             e350fd90a2d9   321MB
docker-syncd-mlnx                                  202205_3_rc.5-0cd56cda1_Internal   793bf19c90f2   905MB
docker-syncd-mlnx                                  latest                             793bf19c90f2   905MB
docker-sonic-telemetry                             202205_3_rc.5-0cd56cda1_Internal   6ae9dc6fa870   394MB
docker-sonic-telemetry                             latest                             6ae9dc6fa870   394MB
docker-teamd                                       202205_3_rc.5-0cd56cda1_Internal   4016c0b7065b   330MB
docker-teamd                                       latest                             4016c0b7065b   330MB
docker-snmp                                        202205_3_rc.5-0cd56cda1_Internal   4cfea9652133   364MB
docker-snmp                                        latest                             4cfea9652133   364MB
docker-router-advertiser                           202205_3_rc.5-0cd56cda1_Internal   2591f4f97335   314MB
docker-router-advertiser                           latest                             2591f4f97335   314MB
docker-platform-monitor                            202205_3_rc.5-0cd56cda1_Internal   1b307a3cf3c8   750MB
docker-platform-monitor                            latest                             1b307a3cf3c8   750MB
docker-orchagent                                   202205_3_rc.5-0cd56cda1_Internal   bacc58208db7   347MB
docker-orchagent                                   latest                             bacc58208db7   347MB
docker-mux                                         202205_3_rc.5-0cd56cda1_Internal   ad53db10c906   362MB
docker-mux                                         latest                             ad53db10c906   362MB
docker-lldp                                        202205_3_rc.5-0cd56cda1_Internal   d493a34ab3a1   356MB
docker-lldp                                        latest                             d493a34ab3a1   356MB
docker-fpm-frr                                     202205_3_rc.5-0cd56cda1_Internal   d3c8380dfb68   359MB
docker-fpm-frr                                     latest                             d3c8380dfb68   359MB
docker-database                                    202205_3_rc.5-0cd56cda1_Internal   9afae57d44af   314MB
docker-database                                    latest                             9afae57d44af   314MB
docker-sonic-mgmt-framework                        202205_3_rc.5-0cd56cda1_Internal   ce6f2b2f3843   433MB
docker-sonic-mgmt-framework                        latest                             ce6f2b2f3843   433MB
docker-sflow                                       202205_3_rc.5-0cd56cda1_Internal   b186e61e3057   303MB
docker-sflow                                       latest                             b186e61e3057   303MB
docker-nat                                         202205_3_rc.5-0cd56cda1_Internal   74672dd53a4b   305MB
docker-nat                                         latest                             74672dd53a4b   305MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.3.5-202205-13                    83e467d24196   310MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

sysdump_test_core_functionality_with_reboot (1).tar.gz

dgsudharsan commented 1 year ago

@sg893052 @adyeung Is there a way to avoid this error message when the services are going down?

judyjoseph commented 1 year ago

@adyeung to take a look, thanks

adyeung commented 1 year ago

@sg893052 please help check and handle the SIGTERM on reboot. According to the submitter, this is a day-one issue.

dgsudharsan commented 1 year ago

@sg893052 @adyeung Do we have an update?

sg893052 commented 1 year ago

ETA: 11/3

liat-grozovik commented 1 year ago

Any update on this issue?

sg893052 commented 1 year ago

I prioritized the delivery of the TPCM feature over the bug, and now I'm focusing on addressing the bug.

sg893052 commented 12 months ago

Root cause found. Will push the fix by 12/12.

sg893052 commented 12 months ago

Fix PR: https://github.com/sonic-net/sonic-buildimage/pull/17459

This shall be backported to prior releases.