Open anamehra opened 8 months ago
Hi @abdosi , @JibinBao , @stepanblyschak , for your viz. Thanks
Hi @anamehra, I am looking at the test code, not sure I understand why test expects swss.service to depend on interfaces-config.service, also, why "config reload" should throw an error?
Hi @stepanblyschak , the test code validates a negative scenario where config reload should fail when swss is not up yet. It relies on the timing to run the command right at the moment when redis is accessible but swss has not started yet and I think interfaces-config check was introduced just to achieve ive that. Given that swss has no dependency on interfaces-config now, the test case needs to change. I think instead of validating this negative scenario during boot, it should just explicitly stop swss and validate that config reload fails.
@anamehra Thanks, so as I understand "config reload" should fail when swss is down, any reason why? "config reload" cold restarts all services, so not sure such check in "config reload" is needed or are there any issues restarting services when swss is down?
@dgsudharsan, Can you help check the above issue? If there is no direct dependency between interfaces-config and swss, shoud we revert the fix https://github.com/sonic-net/sonic-mgmt/pull/7289? And then how to stabilize the test_reload_configuration_checks?
@anamehra This check was introduced to overcome a race condition where config reload gets executed while interface-services.config is running and the loopback ip is not accessible giving traceback for the command the resulting in the test failure. Below is example
Jan 4 06:29:19.434440 r-bulldog-02 INFO systemd[1]: Starting Update interfaces configuration...
Jan 4 06:29:21.659292 r-bulldog-02 INFO python[11575]: ansible-command Invoked with executable=/bin/bash _uses_shell=True _raw_params=sudo config reload -y warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None creates=None removes=None stdin=None
Jan 4 06:29:25.786037 r-bulldog-02 INFO systemd[1]: Finished Update interfaces configuration.
You will get
File "/usr/local/...: Unable to connect to redis: Cannot assign requested address
However this issue was fixed by @saiarcot895 https://github.com/sonic-net/sonic-buildimage/pull/15080. Given we have this fix the change in sonic-mgmt can be reverted.
@JibinBao please revert the change to check interface-services.config.
@anamehra : Please check if below PR fixes this issue.
Fixed: https://github.com/sonic-net/sonic-mgmt/pull/11674 https://github.com/sonic-net/sonic-mgmt/pull/11675
The original issue is still not fixed. interfcae-config.service dependency is removed but now we see the original "redis connection failure" error if the config reload is done quickly after database check passes.
The config reload check after database up and before swss start is totally timing based here. If we want to validate config reload to fail for swss down, we can have a test case to stop swss and check the config reload. We are relying on swss state, so the test case should be designed to make sure we have the setup in condition which we want to test.
@bmridul , @vperumal , for your viz
Log for the above..
10/05/2024 20:43:12 utilities.wait_until L0146 DEBUG | check_database_status is True, exit early with True
10/05/2024 20:43:12 test_reload_config.test_reload_configura L0131 INFO | Reload configuration check
10/05/2024 20:43:12 base._run L0067 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#128: [croc4-aaa14-dut] AnsibleModule::shell, args=["sudo config reload -y"], kwargs={"executable": "/bin/bash", "module_ignore_errors": true}
10/05/2024 20:43:14 base._run L0104 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#128: [croc4-aaa14-dut] AnsibleModule::shell Result => {"failed": true, "msg": "non-zero return code", "cmd": "sudo config reload -y", "stdout": "", "stderr": "Traceback (most recent call last):\n File \"/usr/local/bin/config\", line 8, in <module>\n sys.exit(config())\n File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 764, in __call__\n return self.main(*args, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 717, in main\n rv = self.invoke(ctx)\n File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 1134, in invoke\n Command.invoke(self, ctx)\n File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 956, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 555, in invoke\n return callback(*args, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/click/decorators.py\", line 17, in new_func\n return f(get_current_context(), *args, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/config/main.py\", line 1186, in config\n ctx.obj = Db()\n File \"/usr/local/lib/python3.9/dist-packages/utilities_common/db.py\", line 12, in __init__\n self.cfgdb.connect()\n File \"/usr/lib/python3/dist-packages/swsscommon/swsscommon.py\", line 2301, in connect\n return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)\nRuntimeError: Unable to connect to redis: Cannot assign requested address", "rc": 1, "start": "2024-05-10 20:46:50.911295", "end": "2024-05-10 20:46:52.270540", "delta": "0:00:01.359245", "changed": true, "invocation": {"module_args": {"executable": "/bin/bash", "_raw_params": "sudo config reload -y", "_uses_shell": true, "warn": true, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "creates": null, "removes": null, "stdin": null}}, "warnings": ["Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], "stdout_lines": [], "stderr_lines": ["Traceback (most recent call last):", " File \"/usr/local/bin/config\", line 8, in <module>", " sys.exit(config())", " File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 764, in __call__", " return self.main(*args, **kwargs)", " File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 717, in main", " rv = self.invoke(ctx)", " File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 1134, in invoke", " Command.invoke(self, ctx)", " File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 956, in invoke", " return ctx.invoke(self.callback, **ctx.params)", " File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 555, in invoke", " return callback(*args, **kwargs)", " File \"/usr/local/lib/python3.9/dist-packages/click/decorators.py\", line 17, in new_func", " return f(get_current_context(), *args, **kwargs)", " File \"/usr/local/lib/python3.9/dist-packages/config/main.py\", line 1186, in config", " ctx.obj = Db()", " File \"/usr/local/lib/python3.9/dist-packages/utilities_common/db.py\", line 12, in __init__", " self.cfgdb.connect()", " File \"/usr/lib/python3/dist-packages/swsscommon/swsscommon.py\", line 2301, in connect", " return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)", "RuntimeError: Unable to connect to redis: Cannot assign requested address"], "_ansible_no_log": false}
10/05/2024 20:43:14 __init__.pytest_runtest_call L0040 ERROR | Traceback (most recent call last):
@anamehra This check was introduced to overcome a race condition where config reload gets executed while interface-services.config is running and the loopback ip is not accessible giving traceback for the command the resulting in the test failure. Below is example
Jan 4 06:29:19.434440 r-bulldog-02 INFO systemd[1]: Starting Update interfaces configuration... Jan 4 06:29:21.659292 r-bulldog-02 INFO python[11575]: ansible-command Invoked with executable=/bin/bash _uses_shell=True _raw_params=sudo config reload -y warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None creates=None removes=None stdin=None Jan 4 06:29:25.786037 r-bulldog-02 INFO systemd[1]: Finished Update interfaces configuration.
You will get
File "/usr/local/...: Unable to connect to redis: Cannot assign requested address
However this issue was fixed by @saiarcot895 sonic-net/sonic-buildimage#15080. Given we have this fix the change in sonic-mgmt can be reverted.
@JibinBao please revert the change to check interface-services.config.
HI @dgsudharsan , looks like the issue is not completely fixed. We still observe the error if config reload is done with few secs of database service active. The test_reload_config.py is still failing intermittently due to this. "RuntimeError: Sonic database config file doesn't exist at /var/run/redis/sonic-db/database_config.json"
Description test_reload_config test case has a dependency on interfaces-config.service. In a test to validate config reload failure when swss is not up, the test checks for interfaces-config.service state (https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/test_reload_config.py#L129) to be up to test config reload, expecting swss is not up at that time. This change was introduced by https://github.com/sonic-net/sonic-mgmt/pull/7289 This code assumes that the swss service has a dependency on interface-config.service and starts after the interface-config, which was the case earlier. This dependency was removed a year ago via https://github.com/sonic-net/sonic-buildimage/pull/13084
There is no direct dependency between interfaces-config and swss, so systemd won't guarantee swss service starts only after interfaces-config.
Steps to reproduce the issue: 1. 2. 3.
Describe the results you received:
Describe the results you expected:
Additional information you deem important: