pro-akim opened this issue 1 month ago
Running the reported workflow file, the workflow failed with this error:
```
[2024-04-11 11:34:22] [ERROR] [57744] [ThreadPoolExecutor-0_0] [workflow_engine]: [run-agent-linux-ubuntu-18.04-amd64-tests] Task failed with error: Error executing process task
Traceback (most recent call last):
  File "/home/marcelo/wazuh/wazuh-qa/deployability/modules/testing/main.py", line 30, in <module>
    Tester.run(InputPayload(**vars(parse_arguments())))
  File "/home/marcelo/wazuh/wazuh-qa/deployability/modules/testing/testing.py", line 53, in run
    extra_vars['current_user'] = os.getlogin()
OSError: [Errno 6] No such device or address
```
I reproduced the problem in a Python shell:

```python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.getlogin())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 6] No such device or address
>>> import getpass
>>> getpass.getuser()
'marcelo'
```
`os.getlogin()` returns the name of the user logged in on the controlling terminal of the process. Processes in the user's session (tty, X session) typically have a controlling terminal, but processes spawned by the workflow do not. The recommended way to obtain the current user is `getpass.getuser()`.

After replacing `os.getlogin` with `getpass.getuser`, I reran the workflow file. This time the workflow did not raise the exception, but it got stuck executing the test for the ubuntu-22.04 agent.
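The replacement can be sketched as follows (a minimal illustration of the standard-library behavior, not the actual patch; the `current_user` helper name is mine):

```python
import getpass
import os

def current_user() -> str:
    """Return the invoking user's name.

    os.getlogin() queries the controlling terminal, so it raises
    OSError (errno 6, ENXIO) in processes spawned without one, such
    as workflow tasks. getpass.getuser() checks the LOGNAME, USER,
    LNAME and USERNAME environment variables and then falls back to
    the password database, so it works in both situations.
    """
    try:
        return os.getlogin()
    except OSError:
        return getpass.getuser()
```

In the testing module, the fix amounts to assigning `extra_vars['current_user'] = getpass.getuser()` instead of calling `os.getlogin()`.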
The workflow log file shows an authentication error. The virtual machine was hanging, but the workflow did not throw an exception. After pressing CTRL-C, the workflow aborted the task and continued with the next one.

```
[2024-04-11 12:35:14,322] [ERROR] [Testing]: Authentication error. Check SSH credentials in ubuntu-22.04
[2024-04-11 13:50:44,198] [ERROR] [81388] [MainThread] [workflow_engine]: User interrupt detected. End process...
```
I modified the workflow file, changing the infrastructure provider from Vagrant to AWS, and could not reproduce the issue reported by @pro-akim. Note that I didn't modify the provision module, nor did I add a delay at the start of the `Provision.run` method.
I've modified the original vagrant test, keeping only two agents in the agent list. I've also turned off the cleanup section to keep the VMs running after finishing the workflow execution.
```
variables:
  agent-os:
    - linux-ubuntu-18.04-amd64
    - linux-ubuntu-20.04-amd64
  manager-os: linux-ubuntu-22.04-amd64
  infra-provider: vagrant
  working-dir: /tmp/dtt1-poc
```
I've reproduced the error reported by @pro-akim. In the workflow.log file, this message shows the provisioning error:
```
[2024-04-12 12:52:51,761] [INFO] [Testing]: Getting status of ubuntu-22.04
[2024-04-12 12:52:52,127] [ERROR] [Testing]: agent-linux-ubuntu-2204-amd64 is not present in agent_control information
[2024-04-12 12:52:52,680] [DEBUG] ANSIBLE: Playbook [{'hosts': 'localhost', 'become': True, 'become_user': 'marcelo', 'tasks': [{'name': 'Test restart for agent', 'command': "python3 -m pytest modules/testing/tests/test_agent/test_restart.py -v --wazuh_version=4.7.3 --wazuh_revision=40714 --component=agent --dependencies='{}' --targets='{wazuh-1: /tmp/dtt1-poc/manager-linux-ubuntu-22.04-amd64/inventory.yaml, agent: /tmp/dtt1-poc/agent-linux-
```
These entries in the Wazuh manager's ossec.log file show the problem:
```
2024/04/12 17:04:16 wazuh-authd: ERROR: Invalid agent name ubuntu-jammy (same as manager)
2024/04/12 17:05:16 wazuh-authd: INFO: New connection from 192.168.57.4
2024/04/12 17:05:16 wazuh-authd: INFO: Received request for a new agent (ubuntu-jammy) from: 192.168.57.4
2024/04/12 17:05:16 wazuh-authd: ERROR: Invalid agent name ubuntu-jammy (same as manager)
2024/04/12 17:06:16 wazuh-authd: INFO: New connection from 192.168.57.4
2024/04/12 17:06:16 wazuh-authd: INFO: Received request for a new agent (ubuntu-jammy) from: 192.168.57.4
2024/04/12 17:06:16 wazuh-authd: ERROR: Invalid agent name ubuntu-jammy (same as manager)
2024/04/12 17:07:16 wazuh-authd: INFO: New connection from 192.168.57.4
2024/04/12 17:07:16 wazuh-authd: INFO: Received request for a new agent (ubuntu-jammy) from: 192.168.57.4
2024/04/12 17:07:16 wazuh-authd: ERROR: Invalid agent name ubuntu-jammy (same as manager)
```
The provisioning fails because the manager and the agent have the same hostname. The hostname assigned by the allocator is the default hostname of the VM image, so allocating two VMs from the same image duplicates hostnames; this must be avoided.
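One way to avoid the collision (a hypothetical sketch, not the allocator's actual code) is to derive a unique hostname for each allocated VM instead of keeping the image default:

```python
import uuid

def unique_hostname(composite_name: str) -> str:
    """Build a hostname from the composite name plus a short random
    suffix, so two VMs created from the same image never collide.
    Dots are replaced because they act as domain separators, and
    the base is truncated to keep the label within 63 characters."""
    base = composite_name.replace('.', '-')[:54].rstrip('-')
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

Any scheme works as long as each allocation gets a distinct name; the important point is that wazuh-authd rejects an agent whose name equals the manager's.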
It would be useful to know the criteria the @wazuh/devel-devops team will use for naming the agents, so the testing module can apply the same nomenclature.
@mhamra please change the status to blocked until https://github.com/wazuh/wazuh-qa/issues/5214 is completed
@fcaffieri After talking with @davidjiglesias, we will move this issue from High impact bug to DTT2 as a bug (as it depends on the DevOps issue)
Having the same OS on the agent and the manager causes instability in agent provisioning: sometimes the agent is not installed at all, and sometimes it is installed but does not connect to the manager.
Running the restart test (https://github.com/wazuh/wazuh-qa/issues/5125#issuecomment-2042784252) with this YAML file:
```
version: 0.1
description: This workflow is used to test agents deployment for DDT1 PoC
variables:
  agent-os:
    - linux-ubuntu-18.04-amd64
    - linux-ubuntu-20.04-amd64
    - linux-ubuntu-22.04-amd64
    - linux-debian-10-amd64
    - linux-debian-11-amd64
    - linux-debian-12-amd64
    - linux-oracle-9-amd64
  manager-os: linux-ubuntu-22.04-amd64
  infra-provider: vagrant
  working-dir: /tmp/dtt1-poc

tasks:
  # Unique manager allocate task
  - task: "allocate-manager-{manager-os}"
    description: "Allocate resources for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{manager-os}"
          - inventory-output: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"

  # Unique agent allocate task
  - task: "allocate-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: small
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
    foreach:
      - variable: agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique manager provision task
  - task: "provision-manager-{manager-os}"
    description: "Provision the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/provision/main.py
          - inventory: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - install:
              - component: wazuh-manager
                type: assistant
                version: 4.7.3
                live: True
    depends-on:
      - "allocate-manager-{manager-os}"

  # Generic agent provision task
  - task: "provision-install-{agent}"
    description: "Provision resources for the {agent} agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/provision/main.py
          - inventory: "{working-dir}/agent-{agent}/inventory.yaml"
          - dependencies:
              - manager: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - install:
              - component: wazuh-agent
                type: package
                version: 4.7.3
                live: True
    depends-on:
      - "allocate-agent-{agent}"
      - "provision-manager-{manager-os}"
    foreach:
      - variable: agent-os
        as: agent

  # Generic agent test task
  - task: "run-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
              - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
              - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "restart"
          - component: "agent"
          - wazuh-version: "4.7.3"
          - wazuh-revision: "40714"
          - live: "True"
    foreach:
      - variable: agent-os
        as: agent
    depends-on:
      - "provision-install-{agent}"
```

The following error was found:
This error happens because, when the agent is not installed (the client.keys file is absent), the test module takes the OS name, removes the dots, and uses the result as the agent's name in the validation.
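The fallback naming described above can be illustrated like this (a sketch reconstructed from the log messages; `expected_agent_name` is a hypothetical helper, not the module's actual code):

```python
def expected_agent_name(os_name: str) -> str:
    """When no registered agent is found, derive the agent name from
    the OS name with the dots removed, matching the workflow log
    entry 'agent-linux-ubuntu-2204-amd64 is not present in
    agent_control information'."""
    return "agent-" + os_name.replace(".", "")
```

For `linux-ubuntu-22.04-amd64` this yields `agent-linux-ubuntu-2204-amd64`, which never matches any name wazuh-authd accepted, so the validation fails.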
On the other hand, running the uninstall test (https://github.com/wazuh/wazuh-qa/issues/5125#issuecomment-2042784252) with this YAML file:
```
version: 0.1
description: This workflow is used to test agents deployment for DDT1 PoC
variables:
  agent-os:
    - linux-ubuntu-18.04-amd64
    - linux-ubuntu-20.04-amd64
    - linux-ubuntu-22.04-amd64
    - linux-debian-10-amd64
    - linux-debian-11-amd64
    - linux-debian-12-amd64
    - linux-oracle-9-amd64
  manager-os: linux-ubuntu-22.04-amd64
  infra-provider: vagrant
  working-dir: /tmp/dtt1-poc

tasks:
  # Unique manager allocate task
  - task: "allocate-manager-{manager-os}"
    description: "Allocate resources for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{manager-os}"
          - inventory-output: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"

  # Unique agent allocate task
  - task: "allocate-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: small
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
    foreach:
      - variable: agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique manager provision task
  - task: "provision-manager-{manager-os}"
    description: "Provision the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/provision/main.py
          - inventory: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - install:
              - component: wazuh-manager
                type: assistant
                version: 4.7.3
                live: True
    depends-on:
      - "allocate-manager-{manager-os}"

  # Generic agent provision task
  - task: "provision-install-{agent}"
    description: "Provision resources for the {agent} agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/provision/main.py
          - inventory: "{working-dir}/agent-{agent}/inventory.yaml"
          - dependencies:
              - manager: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - install:
              - component: wazuh-agent
                type: package
                version: 4.7.3
                live: True
    depends-on:
      - "allocate-agent-{agent}"
      - "provision-manager-{manager-os}"
    foreach:
      - variable: agent-os
        as: agent

  # Generic agent test task
  - task: "run-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
              - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
              - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "uninstall"
          - component: "agent"
          - wazuh-version: "4.7.3"
          - wazuh-revision: "40714"
          - live: "True"
    foreach:
      - variable: agent-os
        as: agent
    depends-on:
      - "provision-install-{agent}"
```

it was possible to find the following:
The agent was installed, but it did not connect to the manager. This instability can arise from naming conflicts in the workflow/provision modules or in Wazuh itself when two hosts share the same hostname. Further research should be done.