wazuh / wazuh-qa

Wazuh - Quality Assurance
GNU General Public License v2.0

DTT1 - Iteration 3 - Allocation module - Handling SSH connection errors #5347

Closed: c-bordon closed this issue 2 weeks ago.

c-bordon commented 3 weeks ago

Improve error handling for SSH connection problems that occur while executing remote commands during macStadium deployments.
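A minimal sketch of bounded retries around a remote call, assuming the transient failures surface as the standard-library exceptions seen in these logs (`TimeoutError`, `ConnectionResetError`, `BrokenPipeError`), plus `ConnectionRefusedError`. The name `run_with_retries`, the default of 5 attempts and the 30-second delay are illustrative assumptions, not the allocator's actual code. A real version would probably also catch `paramiko.SSHException`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Standard-library exceptions treated as transient network failures.
# A real implementation would likely add paramiko.SSHException here too.
TRANSIENT_ERRORS: Tuple[Type[BaseException], ...] = (
    TimeoutError,
    ConnectionResetError,
    ConnectionRefusedError,
    BrokenPipeError,
)


def run_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    delay_seconds: float = 30.0,
    log: Callable[[str], None] = print,
) -> T:
    """Call `operation`, retrying transient errors up to `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TRANSIENT_ERRORS as exc:
            if attempt == max_attempts:
                raise  # out of attempts: propagate the last error unchanged
            log(f"SSH connection error: {exc}. Retrying in {delay_seconds:g} seconds...")
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")  # keeps type checkers happy
```

In the allocator, the operation could wrap the existing call, e.g. `run_with_retries(lambda: VagrantUtils.remote_command(cmd, self.remote_host_parameters))`, so that only the final failure reaches the caller.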

Some errors:


python3 deployability/modules/allocation/main.py --action create --provider vagrant --size large --composite-name macos-highsierra-10.13.6-amd64 --working-dir /tmp/allocatorvm --track-output /tmp/allocatorvm/track.yml --inventory-output /tmp/allocatorvm/inventory.yml --instance-name gha_8941573697_build
[2024-05-03 15:52:35] [DEBUG] SPNEGO._GSS: Python gssapi not available, cannot use any GSSAPIProxy protocols: No module named 'gssapi'
[2024-05-03 15:52:35] [DEBUG] SPNEGO._GSS: Python gssapi IOV extension not available: No module named 'gssapi'
[2024-05-03 15:52:35] [INFO] ALLOCATOR: Creating instance at /tmp/allocatorvm
[2024-05-03 15:52:35] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-03 15:52:41] [INFO] ALLOCATOR: Using the macStadium Intel server to deploy.
[2024-05-03 15:52:43] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-03 15:52:43] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-03 15:52:53] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-03 15:53:23] [ERROR] ALLOCATOR: Command failed: Connection reset by 10.10.0.249 port 22

[2024-05-03 15:53:23] [INFO] ALLOCATOR: Instance gha_8941573697_build created.
[2024-05-03 15:55:37] [ERROR] ALLOCATOR: Command failed: ssh: connect to host 10.10.0.249 port 22: Connection timed out

[2024-05-03 15:55:37] [INFO] ALLOCATOR: Instance gha_8941573697_build started.
[2024-05-03 15:55:39] [ERROR] ALLOCATOR: Command failed: sudo: /Users/jenkins/testing/gha_8941573697_build/vagrant_script.sh: command not found

Traceback (most recent call last):
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 39, in <module>
    main()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 35, in main
    Allocator.run(InputPayload(**vars(parse_arguments())))
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 37, in run
    return cls.__create(payload)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 63, in __create
    inventory = cls.__generate_inventory(instance, payload.inventory_output)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 130, in __generate_inventory
    ssh_config = instance.ssh_connection_info()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 142, in ssh_connection_info
    if not 'running' in self.status():
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 121, in status
    return self.__parse_vagrant_status(output)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 238, in __parse_vagrant_status
    lines = message.split('\n')
AttributeError: 'NoneType' object has no attribute 'split'
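The traceback above shows `status()` handing `None` to `__parse_vagrant_status` after the remote command failed, which turns an SSH problem into an unrelated `AttributeError`. The issue does not show the real parser, so the following is a hypothetical standalone `parse_vagrant_status`, assuming the usual `vagrant status` layout (a `Current machine states:` header followed by lines such as `default    running (virtualbox)`). It fails with an explicit message when there is no usable output.

```python
from typing import Optional

HEADER = "Current machine states:"


def parse_vagrant_status(message: Optional[str]) -> str:
    """Return the state of the first machine in `vagrant status` output."""
    if message is None:
        # The remote command produced no output, e.g. because SSH failed.
        raise RuntimeError("No output from 'vagrant status'; the remote command likely failed")
    lines = [line.strip() for line in message.splitlines()]
    if HEADER not in lines:
        raise RuntimeError(f"Unexpected 'vagrant status' output: {message!r}")
    for line in lines[lines.index(HEADER) + 1:]:
        if not line:
            continue  # skip the blank line that follows the header
        # Expected shape: "<name>   <state> (<provider>)", where <state> may
        # contain spaces, as in "not created".
        _name, _, rest = line.partition(" ")
        state, sep, _provider = rest.strip().rpartition(" (")
        return state if sep else rest.strip()
    raise RuntimeError("'vagrant status' listed no machines")
```

With a guard like this, a failed SSH call would be reported as such instead of surfacing as `'NoneType' object has no attribute 'split'`.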
c-bordon commented 3 weeks ago

Test

To test this, the VPN was forcibly disconnected to produce a broken pipe. Once the VPN reconnected, the script recovered and the deployment continued:

cbordon@cbordon-MS-7C88:~/Documents/wazuh/repositorios/wazuh-qa$ python3 deployability/modules/allocation/main.py --provider vagrant --size small --instance-name cbordon-test-ssh --composite-name macos-sonoma-14.0-arm64
[2024-05-07 16:15:49] [INFO] ALLOCATOR: Creating instance at /tmp/wazuh-qa
[2024-05-07 16:15:55] [INFO] ALLOCATOR: macStadium server has less than 2 VMs running, deploying in this host.
[2024-05-07 16:15:55] [DEBUG] ALLOCATOR: Checking if instance directory exists on remote host
[2024-05-07 16:15:58] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-07 16:16:02] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-07 16:16:02] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-07 16:16:05] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-07 16:16:10] [INFO] ALLOCATOR: Instance cbordon-test-ssh-5543 created.

[2024-05-07 16:18:12] [WARNING] ALLOCATOR: SSH connection error: client_loop: send disconnect: Broken pipe
. Retrying...

[2024-05-07 16:18:47] [INFO] ALLOCATOR: Instance cbordon-test-ssh-5543 started.
[2024-05-07 16:19:06] [INFO] ALLOCATOR: Inventory file generated at /tmp/wazuh-qa/cbordon-test-ssh-5543/inventory.yml
[2024-05-07 16:19:08] [INFO] ALLOCATOR: SSH connection successful.
[2024-05-07 16:19:18] [INFO] ALLOCATOR: Track file generated at /tmp/wazuh-qa/cbordon-test-ssh-5543/track.yml
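In the first report, transient network errors (`Connection reset`, `Connection timed out`) and a permanent one (`vagrant_script.sh: command not found`) were all logged and execution simply moved on. A rough sketch of telling them apart by the SSH client's stderr; `is_transient_ssh_error` is a hypothetical helper and the pattern list is an assumption based only on the messages in this issue:

```python
import re

# Fragments of OpenSSH client errors seen in this issue that usually point to a
# temporary network problem. This list is an assumption, not exhaustive.
TRANSIENT_SSH_ERRORS = re.compile(
    r"broken pipe|connection reset|connection timed out",
    re.IGNORECASE,
)


def is_transient_ssh_error(stderr: str) -> bool:
    """Return True if the SSH client's stderr looks worth retrying."""
    return TRANSIENT_SSH_ERRORS.search(stderr) is not None
```

Transient messages could trigger a retry, while anything else (such as the missing script) would abort the deployment immediately instead of reporting the instance as created.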
c-bordon commented 3 weeks ago

Update report

A couple of changes were made to improve error handling for the SSH connection. Here is a list of the improvements this branch includes:

cbordon@cbordon-MS-7C88:~/Documents/wazuh/repositorios/wazuh-qa$ python3 deployability/modules/allocation/main.py --provider vagrant --size small --instance-name cbordon-test-ssh --composite-name macos-sonoma-14.0-arm64
[2024-05-08 16:35:24] [INFO] ALLOCATOR: Creating instance at /tmp/wazuh-qa
[2024-05-08 16:35:30] [INFO] ALLOCATOR: macStadium ARM server has less than 2 VMs running, deploying in this host.
[2024-05-08 16:35:30] [DEBUG] ALLOCATOR: Checking if instance directory exists on remote host
[2024-05-08 16:35:33] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-08 16:35:35] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-08 16:35:35] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-08 16:35:39] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-08 16:35:50] [INFO] ALLOCATOR: Instance cbordon-test-ssh-1018 created.
[2024-05-08 16:41:18] [WARNING] ALLOCATOR: SSH connection error: . Retrying in 30 seconds...
[2024-05-08 16:44:03] [WARNING] ALLOCATOR: SSH connection error: [Errno 110] Connection timed out. Retrying in 30 seconds...
Traceback (most recent call last):
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/utils.py", line 45, in remote_command
    ssh.connect(**ssh_parameters)
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 349, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/usr/lib/python3/dist-packages/paramiko/util.py", line 279, in retry_on_signal
    return function()
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 349, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/main.py", line 39, in <module>
    main()
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/main.py", line 35, in main
    Allocator.run(InputPayload(**vars(parse_arguments())))
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/allocation.py", line 37, in run
    return cls.__create(payload)
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/allocation.py", line 60, in __create
    instance.start()
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/instance.py", line 69, in start
    self.__run_vagrant_command('up')
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/instance.py", line 221, in __run_vagrant_command
    output = VagrantUtils.remote_command(cmd, self.remote_host_parameters)
  File "/home/cbordon/Documents/wazuh/repositorios/wazuh-qa/deployability/modules/allocation/vagrant/utils.py", line 58, in remote_command
    raise ValueError(f"Remote command execution failed: {str(e)}")
ValueError: Remote command execution failed: [Errno 110] Connection timed out
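The traceback above reads "During handling of the above exception, another exception occurred", which is what Python prints when a new exception is raised inside an `except` block without `from`. Raising with `from exc` instead marks the timeout as the direct cause ("The above exception was the direct cause of the following exception"). A small sketch, where `RemoteCommandError` and `remote_command_sketch` are hypothetical names (the real `remote_command` in `vagrant/utils.py` raises `ValueError`):

```python
from typing import Callable


class RemoteCommandError(Exception):
    """Hypothetical error type for remote command failures."""


def remote_command_sketch(connect: Callable[[], str]) -> str:
    """Run `connect` and translate low-level network errors."""
    try:
        return connect()
    except OSError as exc:
        # TimeoutError, ConnectionResetError and BrokenPipeError are all
        # OSError subclasses; `from exc` records the original as __cause__.
        raise RemoteCommandError(f"Remote command execution failed: {exc}") from exc
```

A dedicated exception type also lets callers such as `__run_vagrant_command` catch SSH failures specifically, rather than any `ValueError`.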
cbordon@cbordon-MS-7C88:~/Documents/wazuh/repositorios/wazuh-qa$ python3 deployability/modules/allocation/main.py --provider vagrant --size small --instance-name cbordon-test-ssh --composite-name macos-sonoma-14.0-arm64
[2024-05-08 16:26:46] [INFO] ALLOCATOR: Creating instance at /tmp/wazuh-qa
[2024-05-08 16:26:51] [INFO] ALLOCATOR: macStadium ARM server has less than 2 VMs running, deploying in this host.
[2024-05-08 16:26:51] [DEBUG] ALLOCATOR: Checking if instance directory exists on remote host
[2024-05-08 16:26:54] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-08 16:26:57] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-08 16:26:57] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-08 16:27:00] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-08 16:27:11] [INFO] ALLOCATOR: Instance cbordon-test-ssh-7111 created.
[2024-05-08 16:32:16] [WARNING] ALLOCATOR: SSH connection error: . Retrying in 30 seconds...
[2024-05-08 16:32:49] [ERROR] PARAMIKO.TRANSPORT: Socket exception: Connection reset by peer (104)
[2024-05-08 16:32:51] [INFO] ALLOCATOR: Instance cbordon-test-ssh-7111 started.
[2024-05-08 16:33:05] [INFO] ALLOCATOR: Inventory file generated at /tmp/wazuh-qa/cbordon-test-ssh-7111/inventory.yml
[2024-05-08 16:33:07] [INFO] ALLOCATOR: SSH connection successful.
[2024-05-08 16:33:16] [INFO] ALLOCATOR: Track file generated at /tmp/wazuh-qa/cbordon-test-ssh-7111/track.yml