sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[ZTP] configdb-json plugin fails to apply config_db.json. Reproduced intermittently. #5378

Open MaxYaremchuk opened 3 years ago

MaxYaremchuk commented 3 years ago

Description

Execution of the built-in configdb-json plugin intermittently fails with exit code (1). The failure does not depend on whether the ZTP process runs immediately after a cold reboot. A subsequent ZTP run may finish successfully even though no changes were made to config_db.json or ztp_config.json. The issue reproduces approximately one time out of ten.

Steps to reproduce the issue

  1. Boot SONiC with the ZTP feature enabled.
  2. Announce DHCP option 67 pointing to ztp_config.json (see at the bottom).
  3. Put all plugins and configuration files used in the ZTP process on the web server.
  4. Start the ZTP process:
    ztp erase
    ztp run
  5. Wait until the ZTP process has finished.
  6. Verify the status of the ZTP execution:
    sudo show ztp status --verbose
  7. Check /var/log/ztp.log.

Describe the results you received

Example of an unsuccessful execution of the configdb-json plugin:

Sep  2 12:44:40.370798 sonic-step-04 ERR sonic-ztp[22682]: configdb-json: Command '/usr/bin/config reload -y /tmp/config_dl.json' failed with exit code(1).
Sep  2 12:44:40.417397 sonic-step-04 DEBUG sonic-ztp[9525]: Plugin /usr/lib/ztp/plugins/configdb-json /var/lib/ztp/sections/04-configdb-json/input.json exit code = 1.
Sep  2 12:44:40.418084 sonic-step-04 INFO sonic-ztp[9525]: Processed Configuration section 04-configdb-json with result FAILED, exit code (1) at 2020-09-02 12:42:56 UTC.

The next execution attempt (no changes were made to config_db.json or ztp_config.json between executions) was successful:

Sep  2 13:14:40.290149 sonic-step-04 DEBUG sonic-ztp[3039]: Plugin /usr/lib/ztp/plugins/configdb-json /var/lib/ztp/sections/04-configdb-json/input.json exit code = 0.
Sep  2 13:14:40.291848 sonic-step-04 INFO sonic-ztp[3039]: Processed Configuration section 04-configdb-json with result SUCCESS, exit code (0) at 2020-09-02 13:12:56 UTC.
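
To put a number on "approximately one time out of ten", the per-section result lines in /var/log/ztp.log can be counted. The following is a minimal sketch, assuming the log lines keep the "Processed Configuration section ... with result ..., exit code (...)" shape shown above; the function names `section_results` and `failure_counts` are made up for illustration and are not part of sonic-ztp.

```python
import re

# Pattern derived from the "Processed Configuration section ..." lines
# quoted in this issue; other ZTP versions may log a different format.
SECTION_RESULT = re.compile(
    r"Processed Configuration section (?P<section>\S+) "
    r"with result (?P<result>\w+), exit code \((?P<code>-?\d+)\)"
)


def section_results(lines):
    """Yield (section, result, exit_code) for every matching log line."""
    for line in lines:
        match = SECTION_RESULT.search(line)
        if match:
            yield match["section"], match["result"], int(match["code"])


def failure_counts(lines, section):
    """Return (failed_runs, total_runs) for one configuration section."""
    outcomes = [
        result
        for name, result, _code in section_results(lines)
        if name == section
    ]
    failed = sum(1 for result in outcomes if result == "FAILED")
    return failed, len(outcomes)
```

Passing the lines of an accumulated ztp.log and the section name `04-configdb-json` would give the failed and total run counts for that section.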

Describe the results you expected

Output of show version

SONiC Software Version: SONiC.HEAD.0-fffee7e3
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: fffee7e3
Build date: Sun Jun 21 09:33:14 UTC 2020
Built by: ezrada@r-build-sonic02

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
Serial Number: MT1646X05282            
Uptime: 13:36:03 up 12 min,  3 users,  load average: 0.41, 0.49, 0.50 

sonic_dump_DUT-1_20200902_133602.tar.gz

files.zip

rajendra-dendukuri commented 3 years ago

The issue is caused by the sflow service failing to start as part of the "config reload".

Sep  2 12:44:40.228634 sonic-step-04 INFO sonic-ztp[9524]: Executing restart of service sflow...
Sep  2 12:44:40.243347 sonic-step-04 WARNING systemd[1]: sflow.service: Start request repeated too quickly.
Sep  2 12:44:40.244186 sonic-step-04 ERR systemd[1]: Failed to start sFlow container.
Sep  2 12:44:40.244876 sonic-step-04 NOTICE systemd[1]: sflow.service: Unit entered failed state.
Sep  2 12:44:40.245806 sonic-step-04 WARNING systemd[1]: sflow.service: Failed with result 'start-limit-hit'.
Sep  2 12:44:40.247692 sonic-step-04 INFO sonic-ztp[9524]: Job for sflow.service failed.
Sep  2 12:44:40.248548 sonic-step-04 INFO sonic-ztp[9524]: See "systemctl status sflow.service" and "journalctl -xe" for details.
Sep  2 12:44:40.249335 sonic-step-04 ERR config: Failed to execute restart of service sflow with error 1
Sep  2 12:44:40.370798 sonic-step-04 ERR sonic-ztp[22682]: configdb-json: Command '/usr/bin/config reload -y /tmp/config_dl.json' failed with exit code(1).
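
The systemd line reporting 'start-limit-hit' is the one that identifies the failing unit in the excerpt above. As a rough sketch, a syslog excerpt can be scanned for such lines as follows; the regular expression is modelled only on the systemd message quoted here, and `failed_units` is a hypothetical helper name.

```python
import re

# Modelled on: "systemd[1]: sflow.service: Failed with result 'start-limit-hit'."
# Assumption: other systemd versions may word this message differently.
UNIT_FAILURE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): "
    r"Failed with result '(?P<result>[^']+)'"
)


def failed_units(lines):
    """Map each failed unit name to the result string systemd reported."""
    failures = {}
    for line in lines:
        match = UNIT_FAILURE.search(line)
        if match:
            failures[match["unit"]] = match["result"]
    return failures
```

Run over the log lines above, this would report `sflow.service` with the result `start-limit-hit`.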

I propose adding an option to the configdb-json plugin to ignore errors seen during "config reload". The option would be disabled by default, so that errors seen during "config reload" are still caught.
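
A minimal sketch of how such an option could behave is shown below. It is not the plugin's actual code: `run_step` and the `ignore_errors` parameter are hypothetical names chosen for illustration, and the real plugin may structure its command execution differently.

```python
import subprocess
import sys


def run_step(command, ignore_errors=False):
    """Run a command and return the exit code the plugin should report.

    `ignore_errors` is a hypothetical option; it defaults to False so that
    failures are still reported unless the user opts out.
    """
    completed = subprocess.run(command, check=False)
    if completed.returncode != 0:
        # The failure is always logged, even when it is ignored.
        print(
            "Command '%s' failed with exit code(%d)."
            % (" ".join(command), completed.returncode),
            file=sys.stderr,
        )
        if ignore_errors:
            return 0
    return completed.returncode
```

With the default, `run_step(["/usr/bin/config", "reload", "-y", "/tmp/config_dl.json"])` would propagate a non-zero exit code as it does today; passing `ignore_errors=True` would log the failure but report success.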

Please raise a separate issue for the sflow service failure seen here.

vivekrnv commented 3 years ago

config reload failed again recently when ZTP tried to apply the configdb-json plugin, though this time not because of sflow.

sonic_dump_DUT-1_20210419_130556.tar.gz

Apr 19 12:53:54.947222 sonic INFO sonic-ztp[15343]: Stopping SONiC target ...
..........
..........
Apr 19 12:53:55.153250 sonic INFO sonic-ztp[15374]: Job for sonic.target canceled.
Apr 19 12:53:55.729680 sonic ERR sonic-ztp[13308]: configdb-json: Command 'config reload -y /tmp/config_dl.json' failed with exit code(1).