sdn-sense / siterm

Apache License 2.0

Need to add retry policy for activate errors (ansible timeouts increase) #548

Closed: juztas closed this issue 4 months ago

juztas commented 4 months ago

When multiple applies run in parallel, connecting to the device can fail, and the apply fails with it. We need to add retry logic. At a minimum, we should add timeouts and increase them (up to 2 minutes).
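
A minimal sketch of the retry idea described above, not the code that was eventually merged. The names `apply_with_retries` and `ActivateError`, the starting timeout of 30 seconds, and the pause lengths are all assumptions made for illustration; only the 2-minute ceiling comes from this issue. The `apply` callable stands in for whatever actually runs the playbook against the device.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ActivateError(Exception):
    """Hypothetical stand-in for a failed apply (e.g. an SSH connection error)."""


def apply_with_retries(
    apply: Callable[[int], T],
    retries: int = 3,
    first_timeout: int = 30,   # assumed starting timeout, in seconds
    max_timeout: int = 120,    # the 2-minute ceiling mentioned in the issue
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call apply(timeout) up to `retries` times, doubling the timeout each time."""
    timeout = first_timeout
    for attempt in range(1, retries + 1):
        try:
            return apply(timeout)
        except ActivateError:
            if attempt == retries:
                raise  # out of attempts: surface the last error to the caller
            # Short, growing pause so parallel applies are less likely to collide again.
            sleep(min(2 ** attempt, 30))
            timeout = min(timeout * 2, max_timeout)
    raise ValueError("retries must be at least 1")
```

Passing the timeout into the callable keeps the helper independent of how the playbook is launched, and the injectable `sleep` makes it possible to exercise the helper without real delays.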

Fri, 07 Jun 2024 11:42:31.063 - ProvisioningService - INFO - {"uuid": "f47a5301-5fc1-4825-b344-08de77f4630e", "counter": 14, "stdout": "An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.module_utils.connection.ConnectionError: ssh connection failed: ssh connect failed: Socket error: disconnected\r\nfatal: [dellos9_s0]: FAILED! => {\"msg\": \"Unexpected failure during module execution: ssh connection failed: ssh connect failed: Socket error: disconnected\", \"stdout\": \"\"}", "start_line": 10, "end_line": 12, "runner_ident": "ce48b0ad-416b-4645-bf9d-be15d1f9599b", "event": "runner_on_failed", "pid": 115182, "created": "2024-06-07T11:42:14.830248+00:00", "parent_uuid": "ea4ba553-4c49-ccaf-35a3-000000000006", "event_data": {"playbook": "applyconfig.yaml", "playbook_uuid": "19fc116e-aaac-4977-ab04-d48b03eb8878", "play": "Apply Vlan Configuration templates", "play_uuid": "ea4ba553-4c49-ccaf-35a3-000000000003", "play_pattern": "all", "task": "Push Dell OS 9 Config", "task_uuid": "ea4ba553-4c49-ccaf-35a3-000000000006", "task_action": "sense.dellos9.dellos9_config", "resolved_action": "sense.dellos9.dellos9", "task_args": "", "task_path": "/opt/siterm/config/ansible/sense/project/applyconfig.yaml:13", "host": "dellos9_s0", "remote_addr": "dellos9_s0", "res": {"msg": "Unexpected failure during module execution: ssh connection failed: ssh connect failed: Socket error: disconnected", "exception": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.9/site-packages/ansible/executor/task_executor.py\", line 165, in run\n    res = self._execute()\n  File \"/usr/local/lib/python3.9/site-packages/ansible/executor/task_executor.py\", line 656, in _execute\n    result = self._handler.run(task_vars=vars_copy)\n  File \"/root/.ansible/collections/ansible_collections/sense/dellos9/plugins/module_utils/runwrapper.py\", line 37, in wrapper\n    result = func(*args, **kwargs)\n  File 
\"/root/.ansible/collections/ansible_collections/sense/dellos9/plugins/action/dellos9.py\", line 88, in run\n    out = conn.get_prompt()\n  File \"/usr/local/lib/python3.9/site-packages/ansible/module_utils/connection.py\", line 200, in __rpc__\n    raise ConnectionError(to_text(msg, errors='surrogate_then_replace'), code=code)\nansible.module_utils.connection.ConnectionError: ssh connection failed: ssh connect failed: Socket error: disconnected\n", "stdout": "", "_ansible_no_log": false}, "start": "2024-06-07T11:42:14.228892+00:00", "end": "2024-06-07T11:42:14.830117+00:00", "duration": 0.601225, "ignore_errors": null, "event_loop": null, "uuid": "f47a5301-5fc1-4825-b344-08de77f4630e"}}
juztas commented 4 months ago

Added retries here: https://github.com/sdn-sense/siterm/pull/549
Docker build increases: https://github.com/sdn-sense/siterm-startup/commit/470801d51a66a8cfb495d4b30378d97e2f7229bd