sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[ZTP] When ztp stops during the initialization of the system, errors are observed in syslog due to conflicting systemctl calls #15900

Open dgsudharsan opened 1 year ago

dgsudharsan commented 1 year ago

Description

When ZTP is disabled, it stops the services. However, if this happens while hostcfgd is still initializing, hostcfgd logs errors like the ones below because of conflicting systemctl commands:

Jul 12 19:29:14.620782 sonic WARNING sonic-ztp[6886]: Received terminate signal. Shutting down.
Jul 12 19:29:14.621182 sonic INFO systemd[1]: Stopping SONiC Zero Touch Provisioning service...
Jul 12 19:29:15.623786 sonic INFO sonic-ztp[6886]: Process pid 8364 returned with status 15.
Jul 12 19:29:15.648648 sonic INFO systemd[1]: ztp.service: Succeeded.
Jul 12 19:29:15.649713 sonic INFO systemd[1]: Stopped SONiC Zero Touch Provisioning service.
Jul 12 19:31:01.864980 r-tigris-13 ERR monit[18308]: Unix socket /var/run/monit.sock connection error -- No such file or directory
Jul 12 19:31:01.885149 r-tigris-13 INFO hostcfgd[18272]: Job for lldp.service canceled.
Jul 12 19:31:01.887399 r-tigris-13 ERR hostcfgd: ['sudo', 'systemctl', 'start', 'lldp.service'] - failed: return code - 1, output:#012None
Jul 12 19:31:01.887509 r-tigris-13 ERR hostcfgd: Feature 'lldp.service' failed to be enabled and started
Jul 12 19:31:01.894629 r-tigris-13 ERR hostcfgd: Failed to get status of mgmt-framework.service: rc=-15 stderr=b''
Jul 12 19:31:01.895072 r-tigris-13 INFO systemd[1]: Stopping Host config enforcer daemon...
Jul 12 19:31:03.017152 r-tigris-13 NOTICE systemd[1]: Requested transaction contradicts existing jobs: Transaction for mgmt-framework.service/start is destructive (hostcfgd.timer has 'stop' job queued, but 'start' is included in transaction).
Jul 12 19:31:03.017634 r-tigris-13 INFO hostcfgd[18523]: Failed to start mgmt-framework.service: Transaction for mgmt-framework.service/start is destructive (hostcfgd.timer has 'stop' job queued, but 'start' is included in transaction).
Jul 12 19:31:03.017710 r-tigris-13 INFO hostcfgd[18523]: See system logs and 'systemctl status mgmt-framework.service' for details.
Jul 12 19:31:03.018793 r-tigris-13 ERR hostcfgd: ['sudo', 'systemctl', 'start', 'mgmt-framework.service'] - failed: return code - 4, output:#012None
Jul 12 19:31:03.018859 r-tigris-13 ERR hostcfgd: Feature 'mgmt-framework.service' failed to be enabled and started
Jul 12 19:31:06.435905 r-tigris-13 INFO pmon#supervisord 2023-07-12 19:31:06,435 INFO stopped: psud (exit status 143)
Jul 12 19:31:06.542949 r-tigris-13 INFO bgp#supervisord 2023-07-12 19:31:06,542 INFO waiting for supervisor-proc-exit-listener, rsyslogd, staticd, zebra, bgpd, bgpcfgd to die
Jul 12 19:31:06.573064 r-tigris-13 WARNING systemd[1]: hostcfgd.service: Main process exited, code=killed, status=6/ABRT
Jul 12 19:31:06.573308 r-tigris-13 WARNING systemd[1]: hostcfgd.service: Failed with result 'signal'.
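
To summarize which units hostcfgd failed to start in a log like the one above, a small parser can help. This is a minimal sketch and not part of SONiC: the function name `failed_systemctl_calls` and the regular expression are assumptions based only on the message format visible in this excerpt.

```python
import ast
import re

# Matches lines such as:
#   ERR hostcfgd: ['sudo', 'systemctl', 'start', 'lldp.service'] - failed: return code - 1, output:...
# The " - " before the number is a separator in the message, not a minus sign.
FAILED_CMD = re.compile(r"ERR hostcfgd: (\[.*?\]) - failed: return code - (-?\d+)")


def failed_systemctl_calls(lines):
    """Return (unit, return_code) pairs for hostcfgd commands that failed.

    Hypothetical helper: assumes the unit name is the last element of the
    logged command list, as it is in the excerpt above.
    """
    results = []
    for line in lines:
        match = FAILED_CMD.search(line)
        if match is None:
            continue
        command = ast.literal_eval(match.group(1))  # logged as a Python list literal
        results.append((command[-1], int(match.group(2))))
    return results
```

Feeding it the lines of a syslog file (for example `failed_systemctl_calls(open(path))`, where `path` is a placeholder) lists each failed unit together with the return code reported by hostcfgd.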

Steps to reproduce the issue:

  1. Perform ONIE installation of the system
  2. Stop ZTP before system initializes
  3. Observe errors in syslog

Describe the results you received:

Errors are observed in syslog

Describe the results you expected:

No errors in syslog

Output of show version:

SONiC Software Version: SONiC.202211_1_RC2.25-23bbcd5d9_Internal
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: 23bbcd5d9
Build date: Wed Jul 12 11:08:25 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-mlnx_msn3800-r0
HwSKU: Mellanox-SN3800-D112C8
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1937X00527
Model Number: MSN3800-CS2FO
Hardware Revision: A2
Uptime: 22:45:40 up 8 min,  2 users,  load average: 0.79, 3.12, 2.52
Date: Wed 12 Jul 2023 22:45:40

Docker images:
REPOSITORY                                         TAG                                  IMAGE ID       SIZE
docker-syncd-mlnx                                  202211_1_RC2.25-23bbcd5d9_Internal   18c6dda81b71   964MB
docker-syncd-mlnx                                  latest                               18c6dda81b71   964MB
docker-platform-monitor                            202211_1_RC2.25-23bbcd5d9_Internal   42dacc498c20   963MB
docker-platform-monitor                            latest                               42dacc498c20   963MB
docker-dhcp-relay                                  latest                               3f95b0989ff9   452MB
docker-macsec                                      latest                               b5ad5444d8ca   461MB
docker-eventd                                      202211_1_RC2.25-23bbcd5d9_Internal   3b752987a3e2   443MB
docker-eventd                                      latest                               3b752987a3e2   443MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.5.3-202211-13                      14978ef516bc   432MB
docker-orchagent                                   202211_1_RC2.25-23bbcd5d9_Internal   391a1fb0f42d   475MB
docker-orchagent                                   latest                               391a1fb0f42d   475MB
docker-teamd                                       202211_1_RC2.25-23bbcd5d9_Internal   9444552cf357   456MB
docker-teamd                                       latest                               9444552cf357   456MB
docker-snmp                                        202211_1_RC2.25-23bbcd5d9_Internal   a57c03b18b12   484MB
docker-snmp                                        latest                               a57c03b18b12   484MB
docker-fpm-frr                                     202211_1_RC2.25-23bbcd5d9_Internal   3d7ac3d0e139   485MB
docker-fpm-frr                                     latest                               3d7ac3d0e139   485MB
docker-sonic-telemetry                             202211_1_RC2.25-23bbcd5d9_Internal   1436f5da47ae   737MB
docker-sonic-telemetry                             latest                               1436f5da47ae   737MB
docker-sonic-p4rt                                  202211_1_RC2.25-23bbcd5d9_Internal   91fba50cc4c1   521MB
docker-sonic-p4rt                                  latest                               91fba50cc4c1   521MB
docker-router-advertiser                           202211_1_RC2.25-23bbcd5d9_Internal   2e67e4d49612   439MB
docker-router-advertiser                           latest                               2e67e4d49612   439MB
docker-lldp                                        202211_1_RC2.25-23bbcd5d9_Internal   fa3b42549e01   481MB
docker-lldp                                        latest                               fa3b42549e01   481MB
docker-mux                                         202211_1_RC2.25-23bbcd5d9_Internal   b4533bd21c95   488MB
docker-mux                                         latest                               b4533bd21c95   488MB
docker-database                                    202211_1_RC2.25-23bbcd5d9_Internal   72a70834b3ea   439MB
docker-database                                    latest                               72a70834b3ea   439MB
docker-sonic-mgmt-framework                        202211_1_RC2.25-23bbcd5d9_Internal   43dce34a9550   552MB
docker-sonic-mgmt-framework                        latest                               43dce34a9550   552MB
docker-sflow                                       202211_1_RC2.25-23bbcd5d9_Internal   1c6b54e53b51   422MB
docker-sflow                                       latest                               1c6b54e53b51   422MB
docker-nat                                         202211_1_RC2.25-23bbcd5d9_Internal   bafaa375d923   424MB
docker-nat                                         latest                               bafaa375d923   424MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/doai        1.0.0-master-internal-25             475e4a384e19   201MB

Output of show techsupport:

sysdump_test_check_errors_in_log_during_deploy_sonic_image.tar.gz

Additional information you deem important (e.g. issue happens only occasionally):

rajendra-dendukuri commented 1 year ago

@dgsudharsan Can you check whether this can be fixed in the "config reload" command? The ztp disable command creates the default configuration and issues "config reload -y -f", so the fix has to be made in "config reload".
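
One possible direction, sketched here only as an illustration, is to wait until systemd has finished booting before services are stopped and restarted, so that the restart does not collide with start jobs queued during initialization. The helpers `systemd_state` and `wait_until_booted` are hypothetical names and are not taken from the SONiC code base; the only external command used is `systemctl is-system-running`. Whether and how "config reload" should apply such a check, particularly when `-f` is given, is left open.

```python
import subprocess
import time


def systemd_state():
    """Return the state word printed by `systemctl is-system-running`."""
    # The command exits non-zero for any state other than "running",
    # so the exit code is ignored and only stdout is read.
    proc = subprocess.run(
        ["systemctl", "is-system-running"], capture_output=True, text=True
    )
    return proc.stdout.strip()


def wait_until_booted(get_state=systemd_state, timeout=300.0, interval=5.0,
                      sleep=time.sleep):
    """Poll until systemd leaves the 'initializing'/'starting' states.

    Returns True once boot has finished (in any final state) and False if
    the timeout elapses first. `get_state` and `sleep` are injectable so
    the loop can be exercised without a running systemd.
    """
    waited = 0.0
    while True:
        if get_state() not in ("initializing", "starting"):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A caller could skip or delay the service restart when `wait_until_booted()` returns False. The timeout and interval values are placeholders and not tuned recommendations.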