sonic-net / sonic-mgmt

Configuration management examples for SONiC
Other
196 stars 716 forks source link

test_reload_config.py test failure due to interface-config.service dependency #11654

Open anamehra opened 8 months ago

anamehra commented 8 months ago

Description test_reload_config test case has a dependency on interfaces-config.service. In a test to validate config reload failure when swss is not up, the test checks for interfaces-config.service state (https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/test_reload_config.py#L129) to be up to test config reload, expecting swss is not up at that time. This change was introduced by https://github.com/sonic-net/sonic-mgmt/pull/7289 This code assumes that the swss service has a dependency on interface-config.service and starts after the interface-config, which was the case earlier. This dependency was removed a year ago via https://github.com/sonic-net/sonic-buildimage/pull/13084

There is no direct dependency between interfaces-config and swss, so systemd won't guarantee swss service starts only after interfaces-config.

Steps to reproduce the issue: 1. 2. 3.

Describe the results you received:

Describe the results you expected:

Additional information you deem important:

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
anamehra commented 8 months ago

Hi @abdosi , @JibinBao , @stepanblyschak , for your viz. Thanks

stepanblyschak commented 8 months ago

Hi @anamehra, I am looking at the test code, not sure I understand why test expects swss.service to depend on interfaces-config.service, also, why "config reload" should throw an error?

anamehra commented 8 months ago

Hi @stepanblyschak , the test code validates a negative scenario where config reload should fail when swss is not up yet. It relies on the timing to run the command right at the moment when redis is accessible but swss has not started yet and I think interfaces-config check was introduced just to achieve ive that. Given that swss has no dependency on interfaces-config now, the test case needs to change. I think instead of validating this negative scenario during boot, it should just explicitly stop swss and validate that config reload fails.

stepanblyschak commented 8 months ago

@anamehra Thanks, so as I understand "config reload" should fail when swss is down, any reason why? "config reload" cold restarts all services, so not sure such check in "config reload" is needed or are there any issues restarting services when swss is down?

JibinBao commented 8 months ago

@dgsudharsan, Can you help check the above issue? If there is no direct dependency between interfaces-config and swss, shoud we revert the fix https://github.com/sonic-net/sonic-mgmt/pull/7289? And then how to stabilize the test_reload_configuration_checks?

dgsudharsan commented 8 months ago

@anamehra This check was introduced to overcome a race condition where config reload gets executed while interface-services.config is running and the loopback ip is not accessible giving traceback for the command the resulting in the test failure. Below is example

Jan  4 06:29:19.434440 r-bulldog-02 INFO systemd[1]: Starting Update interfaces configuration...
Jan  4 06:29:21.659292 r-bulldog-02 INFO python[11575]: ansible-command Invoked with executable=/bin/bash _uses_shell=True _raw_params=sudo config reload -y warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None creates=None removes=None stdin=None
Jan  4 06:29:25.786037 r-bulldog-02 INFO systemd[1]: Finished Update interfaces configuration.

You will get

File "/usr/local/...: Unable to connect to redis: Cannot assign requested address

However this issue was fixed by @saiarcot895 https://github.com/sonic-net/sonic-buildimage/pull/15080. Given we have this fix the change in sonic-mgmt can be reverted.

@JibinBao please revert the change to check interface-services.config.

JibinBao commented 8 months ago

Fixed: https://github.com/sonic-net/sonic-mgmt/pull/11674 https://github.com/sonic-net/sonic-mgmt/pull/11675

abdosi commented 6 months ago

@anamehra : Please check if below PR fixes this issue.

Fixed: https://github.com/sonic-net/sonic-mgmt/pull/11674 https://github.com/sonic-net/sonic-mgmt/pull/11675

anamehra commented 5 months ago

The original issue is still not fixed. interfcae-config.service dependency is removed but now we see the original "redis connection failure" error if the config reload is done quickly after database check passes.

The config reload check after database up and before swss start is totally timing based here. If we want to validate config reload to fail for swss down, we can have a test case to stop swss and check the config reload. We are relying on swss state, so the test case should be designed to make sure we have the setup in condition which we want to test.

@bmridul , @vperumal , for your viz

bmridul commented 5 months ago

Log for the above..

10/05/2024 20:43:12 utilities.wait_until                     L0146 DEBUG  | check_database_status is True, exit early with True
10/05/2024 20:43:12 test_reload_config.test_reload_configura L0131 INFO   | Reload configuration check
10/05/2024 20:43:12 base._run                                L0067 DEBUG  | /data/tests/common/devices/multi_asic.py::_run_on_asics#128: [croc4-aaa14-dut] AnsibleModule::shell, args=["sudo config reload -y"], kwargs={"executable": "/bin/bash", "module_ignore_errors": true}
10/05/2024 20:43:14 base._run                                L0104 DEBUG  | /data/tests/common/devices/multi_asic.py::_run_on_asics#128: [croc4-aaa14-dut] AnsibleModule::shell Result => {"failed": true, "msg": "non-zero return code", "cmd": "sudo config reload -y", "stdout": "", "stderr": "Traceback (most recent call last):\n  File \"/usr/local/bin/config\", line 8, in <module>\n    sys.exit(config())\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 764, in __call__\n    return self.main(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 717, in main\n    rv = self.invoke(ctx)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 1134, in invoke\n    Command.invoke(self, ctx)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 956, in invoke\n    return ctx.invoke(self.callback, **ctx.params)\n  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 555, in invoke\n    return callback(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/dist-packages/click/decorators.py\", line 17, in new_func\n    return f(get_current_context(), *args, **kwargs)\n  File \"/usr/local/lib/python3.9/dist-packages/config/main.py\", line 1186, in config\n    ctx.obj = Db()\n  File \"/usr/local/lib/python3.9/dist-packages/utilities_common/db.py\", line 12, in __init__\n    self.cfgdb.connect()\n  File \"/usr/lib/python3/dist-packages/swsscommon/swsscommon.py\", line 2301, in connect\n    return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)\nRuntimeError: Unable to connect to redis: Cannot assign requested address", "rc": 1, "start": "2024-05-10 20:46:50.911295", "end": "2024-05-10 20:46:52.270540", "delta": "0:00:01.359245", "changed": true, "invocation": {"module_args": {"executable": "/bin/bash", "_raw_params": "sudo config reload -y", "_uses_shell": true, "warn": true, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "creates": null, "removes": null, "stdin": null}}, "warnings": ["Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], "stdout_lines": [], "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/local/bin/config\", line 8, in <module>", "    sys.exit(config())", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 764, in __call__", "    return self.main(*args, **kwargs)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 717, in main", "    rv = self.invoke(ctx)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 1134, in invoke", "    Command.invoke(self, ctx)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 956, in invoke", "    return ctx.invoke(self.callback, **ctx.params)", "  File \"/usr/local/lib/python3.9/dist-packages/click/core.py\", line 555, in invoke", "    return callback(*args, **kwargs)", "  File \"/usr/local/lib/python3.9/dist-packages/click/decorators.py\", line 17, in new_func", "    return f(get_current_context(), *args, **kwargs)", "  File \"/usr/local/lib/python3.9/dist-packages/config/main.py\", line 1186, in config", "    ctx.obj = Db()", "  File \"/usr/local/lib/python3.9/dist-packages/utilities_common/db.py\", line 12, in __init__", "    self.cfgdb.connect()", "  File \"/usr/lib/python3/dist-packages/swsscommon/swsscommon.py\", line 2301, in connect", "    return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)", "RuntimeError: Unable to connect to redis: Cannot assign requested address"], "_ansible_no_log": false}
10/05/2024 20:43:14 __init__.pytest_runtest_call             L0040 ERROR  | Traceback (most recent call last):
anamehra commented 4 months ago

@anamehra This check was introduced to overcome a race condition where config reload gets executed while interface-services.config is running and the loopback ip is not accessible giving traceback for the command the resulting in the test failure. Below is example

Jan  4 06:29:19.434440 r-bulldog-02 INFO systemd[1]: Starting Update interfaces configuration...
Jan  4 06:29:21.659292 r-bulldog-02 INFO python[11575]: ansible-command Invoked with executable=/bin/bash _uses_shell=True _raw_params=sudo config reload -y warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None creates=None removes=None stdin=None
Jan  4 06:29:25.786037 r-bulldog-02 INFO systemd[1]: Finished Update interfaces configuration.

You will get

File "/usr/local/...: Unable to connect to redis: Cannot assign requested address

However this issue was fixed by @saiarcot895 sonic-net/sonic-buildimage#15080. Given we have this fix the change in sonic-mgmt can be reverted.

@JibinBao please revert the change to check interface-services.config.

HI @dgsudharsan , looks like the issue is not completely fixed. We still observe the error if config reload is done with few secs of database service active. The test_reload_config.py is still failing intermittently due to this. "RuntimeError: Sonic database config file doesn't exist at /var/run/redis/sonic-db/database_config.json"