wazuh / wazuh-qa


Unstable connection problems between GHA runners and macStadium #5346

Closed: jotacarma90 closed this issue 2 weeks ago.

jotacarma90 commented 3 weeks ago

Description

Hello team, we are working on the package migration:

Specifically on the issue:

We are having problems running the macOS package generation workflows because of connection errors in the allocator module and timeouts through the VPN.

- Example 1:

[2024-05-06 13:16:18] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
[2024-05-06 13:16:26] [INFO] ALLOCATOR: Instance gha_8969706488_build created.
Error: [2024-05-06 13:17:26] [ERROR] ALLOCATOR: Command failed: Vagrant cannot forward the specified ports on this VM, since they would collide with some other application that is already listening on these ports. The forwarded port to 43220 is already in use on the host machine.

To fix this, modify your current project's Vagrantfile to use another port. Example, where '1234' would be replaced by a unique host port:

config.vm.network :forwarded_port, guest: 22, host: 1234

Sometimes, Vagrant will attempt to auto-correct this for you. In this case, Vagrant was unable to. This is usually because the guest machine is in a state which doesn't allow modifying port forwarding. You could try 'vagrant reload' (equivalent of running a halt followed by an up) so vagrant can attempt to auto-correct this upon booting. Be warned that any unsaved work might be lost.

[2024-05-06 13:17:26] [INFO] ALLOCATOR: Instance gha_8969706488_build started.
[2024-05-06 13:17:31] [DEBUG] ALLOCATOR: Instance gha_8969706488_build is not running. Starting it.
Error: [2024-05-06 13:17:36] [ERROR] ALLOCATOR: Command failed: Vagrant cannot forward the specified ports on this VM, since they would collide with some other application that is already listening on these ports. The forwarded port to 43220 is already in use on the host machine.

To fix this, modify your current project's Vagrantfile to use another port. Example, where '1234' would be replaced by a unique host port:

config.vm.network :forwarded_port, guest: 22, host: 1234

Sometimes, Vagrant will attempt to auto-correct this for you. In this case, Vagrant was unable to. This is usually because the guest machine is in a state which doesn't allow modifying port forwarding. You could try 'vagrant reload' (equivalent of running a halt followed by an up) so vagrant can attempt to auto-correct this upon booting. Be warned that any unsaved work might be lost.

Error: [2024-05-06 13:17:40] [ERROR] ALLOCATOR: Command failed: The provider for this Vagrant-managed machine is reporting that it is not yet ready for SSH. Depending on your provider this can carry different meanings. Make sure your machine is created and running and try again. Additionally, check the output of vagrant status to verify that the machine is in the state that you expect. If you continue to get this error message, please view the documentation for the provider you're using.

Traceback (most recent call last):
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 39, in <module>
    main()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 35, in main
    Allocator.run(InputPayload(**vars(parse_arguments())))
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 37, in run
    return cls.create(payload)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 63, in create
    inventory = cls.generate_inventory(instance, payload.inventory_output)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 130, in generate_inventory
    ssh_config = instance.ssh_connection_info()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 153, in ssh_connection_info
    match = re.search(pattern, output)
  File "/usr/lib/python3.10/re.py", line 200, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
Error: Process completed with exit code 1.
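Example 1 ends in a `TypeError` because `re.search` inside `ssh_connection_info` received `None` instead of text, apparently because the preceding Vagrant commands had failed. A minimal sketch of a defensive parser that fails with a readable message instead, assuming the text being parsed is the output of `vagrant ssh-config` (its `HostName`, `User`, `Port` and `IdentityFile` lines). `parse_ssh_config` and `SSHConfigError` are hypothetical names, not the allocator's actual code:

```python
import re
from typing import Optional


class SSHConfigError(RuntimeError):
    """Hypothetical error raised when SSH details cannot be extracted."""


# Fields printed by `vagrant ssh-config` that an inventory typically needs.
_KEYS = ("HostName", "User", "Port", "IdentityFile")


def parse_ssh_config(output: Optional[str]) -> dict:
    """Hypothetical helper: extract SSH fields, failing loudly on missing output."""
    if not output:
        # Raise a clear error instead of letting re.search raise TypeError on None.
        raise SSHConfigError("No output from 'vagrant ssh-config'; is the VM running?")
    info = {}
    for key in _KEYS:
        # One "Key value" pair per line; \s+ after the key avoids matching
        # longer keys such as "UserKnownHostsFile" when looking for "User".
        match = re.search(rf"^\s*{key}\s+(\S+)\s*$", output, re.MULTILINE)
        if match is None:
            raise SSHConfigError(f"'{key}' not found in ssh-config output")
        info[key] = match.group(1)
    return info


sample = """Host default
  HostName 127.0.0.1
  User vagrant
  Port 2222
  IdentityFile /tmp/key
"""
print(parse_ssh_config(sample))
```

Raising early keeps the real cause (the VM never became reachable) visible in the workflow log rather than surfacing as an unrelated `TypeError` in `re.py`.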

- Example 2:

python3 deployability/modules/allocation/main.py --action create --provider vagrant --size large --composite-name macos-ventura-sign-arm64 --working-dir /tmp/allocatorvm --track-output /tmp/allocatorvm/track.yml --inventory-output /tmp/allocatorvm/inventory.yml --instance-name gha_8967086155_build
[2024-05-06 09:39:32] [DEBUG] SPNEGO._GSS: Python gssapi not available, cannot use any GSSAPIProxy protocols: No module named 'gssapi'
[2024-05-06 09:39:32] [DEBUG] SPNEGO._GSS: Python gssapi IOV extension not available: No module named 'gssapi'
[2024-05-06 09:39:32] [INFO] ALLOCATOR: Creating instance at /tmp/allocatorvm
[2024-05-06 09:39:32] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-06 09:39:35] [INFO] ALLOCATOR: macStadium ARM server has less than 2 VMs running, deploying in this host.
[2024-05-06 09:39:37] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-06 09:39:37] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-06 09:39:41] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
Traceback (most recent call last):
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 39, in <module>
    main()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 35, in main
    Allocator.run(InputPayload(**vars(parse_arguments())))
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 37, in run
    return cls.create(payload)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 56, in create
    instance = provider.create_instance(
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/generic/provider.py", line 70, in create_instance
    return cls._create_instance(base_dir, params, config, ssh_key)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/provider.py", line 95, in _create_instance
    VagrantUtils.remote_copy(Path(file).parent.parent / 'vagrant' / 'helpers' / 'vagrant_script.sh', host_instance_dir, remote_host_parameters)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/utils.py", line 73, in remote_copy
    raise ValueError(f"Command failed: {stderr.decode('utf-8')}")
ValueError: Command failed: kex_exchange_identification: read: Connection reset by peer
Connection reset by 10.10.0.250 port 22
lost connection
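The `kex_exchange_identification: read: Connection reset by peer` failure in Example 2 may be transient, given the unstable connection described in this issue. One possible mitigation is retrying remote operations with exponential backoff. A minimal sketch, assuming failures surface as `ValueError` (as `VagrantUtils.remote_copy` raises in the traceback); `retry_with_backoff` and the `TRANSIENT_MARKERS` list are hypothetical:

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical list of substrings treated as transient network failures.
TRANSIENT_MARKERS = (
    "Connection reset by peer",
    "Connection timed out",
    "lost connection",
)


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    markers: Sequence[str] = TRANSIENT_MARKERS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `action`, retrying on transient ValueErrors.

    Delays double each time (base_delay, 2 * base_delay, ...). The final
    attempt runs outside the loop so its error always propagates.
    """
    for attempt in range(1, attempts):
        try:
            return action()
        except ValueError as exc:
            if not any(marker in str(exc) for marker in markers):
                raise  # Not a known transient error: fail immediately.
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed: {exc}. Retrying in {delay:g}s")
            sleep(delay)
    return action()


# Usage: a fake copy that fails twice with the Example 2 error, then succeeds.
calls = {"n": 0}


def flaky_copy() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("Command failed: kex_exchange_identification: read: Connection reset by peer")
    return "copied"


print(retry_with_backoff(flaky_copy, sleep=lambda seconds: None))
```

Retrying only on known transient messages keeps genuine failures (wrong keys, missing files) fast, and the injectable `sleep` makes the helper testable without real delays.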

- Example 3:

python3 deployability/modules/allocation/main.py --action create --provider vagrant --size large --composite-name macos-highsierra-10.13.6-amd64 --working-dir /tmp/allocatorvm --track-output /tmp/allocatorvm/track.yml --inventory-output /tmp/allocatorvm/inventory.yml --instance-name gha_8941573697_build
[2024-05-03 15:52:35] [DEBUG] SPNEGO._GSS: Python gssapi not available, cannot use any GSSAPIProxy protocols: No module named 'gssapi'
[2024-05-03 15:52:35] [DEBUG] SPNEGO._GSS: Python gssapi IOV extension not available: No module named 'gssapi'
[2024-05-03 15:52:35] [INFO] ALLOCATOR: Creating instance at /tmp/allocatorvm
[2024-05-03 15:52:35] [DEBUG] ALLOCATOR: Creating instance directory on remote host
[2024-05-03 15:52:41] [INFO] ALLOCATOR: Using the macStadium Intel server to deploy.
[2024-05-03 15:52:43] [DEBUG] ALLOCATOR: No config provided. Generating from payload
[2024-05-03 15:52:43] [DEBUG] ALLOCATOR: Generating new key pair
[2024-05-03 15:52:53] [DEBUG] ALLOCATOR: Vagrantfile created. Creating instance.
Error: [2024-05-03 15:53:23] [ERROR] ALLOCATOR: Command failed: Connection reset by 10.10.0.249 port 22

[2024-05-03 15:53:23] [INFO] ALLOCATOR: Instance gha_8941573697_build created.
Error: [2024-05-03 15:55:37] [ERROR] ALLOCATOR: Command failed: ssh: connect to host 10.10.0.249 port 22: Connection timed out

[2024-05-03 15:55:37] [INFO] ALLOCATOR: Instance gha_8941573697_build started.
Error: [2024-05-03 15:55:39] [ERROR] ALLOCATOR: Command failed: sudo: /Users/jenkins/testing/gha_8941573697_build/vagrant_script.sh: command not found

Traceback (most recent call last):
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 39, in <module>
    main()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/main.py", line 35, in main
    Allocator.run(InputPayload(**vars(parse_arguments())))
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 37, in run
    return cls.create(payload)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 63, in create
    inventory = cls.generate_inventory(instance, payload.inventory_output)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/allocation.py", line 130, in generate_inventory
    ssh_config = instance.ssh_connection_info()
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 142, in ssh_connection_info
    if not 'running' in self.status():
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 121, in status
    return self.parse_vagrant_status(output)
  File "/home/runner/work/wazuh-agent-packages/wazuh-agent-packages/deployability/modules/allocation/vagrant/instance.py", line 238, in parse_vagrant_status
    lines = message.split('\n')
AttributeError: 'NoneType' object has no attribute 'split'
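In Example 3, each `Command failed` error is logged but the run continues ("Instance ... created", "Instance ... started"), and `parse_vagrant_status` later receives `None` instead of command output. A minimal sketch of a fail-fast alternative, assuming commands are executed through `subprocess`; `run_checked` and `CommandError` are hypothetical, and the real allocator may run its remote commands differently:

```python
import subprocess
import sys
from typing import Sequence


class CommandError(RuntimeError):
    """Hypothetical error raised when a command exits with a non-zero status."""

    def __init__(self, cmd: Sequence[str], returncode: int, stderr: str) -> None:
        super().__init__(f"{' '.join(cmd)} exited with {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr


def run_checked(cmd: Sequence[str]) -> str:
    """Hypothetical helper: return stdout, or raise instead of returning None."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Stop here with the real stderr rather than handing None to a parser.
        raise CommandError(cmd, result.returncode, result.stderr)
    return result.stdout


# Usage: one command that succeeds and one that fails (sys.exit with a
# string prints it to stderr and exits with status 1).
print(run_checked([sys.executable, "-c", "print('ok')"]), end="")
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit('boom')"])
except CommandError as exc:
    print(exc.returncode, exc.stderr.strip())
```

With this shape, a caller such as `status()` either gets real `vagrant status` text or stops with the underlying SSH error, so the job log points at the connection problem instead of an `AttributeError`.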



- Example 4, manually canceled after being blocked for a long time:
![image](https://github.com/wazuh/internal-devel-requests/assets/60003131/ac2852df-0211-4b1a-8586-09ff79f2843d)
![image](https://github.com/wazuh/internal-devel-requests/assets/60003131/0e3022ee-90b0-448f-8ef5-1393f4dcbf91)
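Example 4 stayed blocked until it was canceled by hand. Besides a job- or step-level `timeout-minutes` in the GitHub Actions workflow, waits inside the allocator could be bounded by polling a readiness check against a deadline. A minimal sketch; `wait_until` is hypothetical, and the check it polls is a placeholder for whatever the allocator uses to decide a VM is ready:

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical helper: poll `check` until True, else raise TimeoutError.

    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting; time.monotonic is unaffected by wall-clock changes.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout:g}s")
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

For example, `wait_until(vm_is_running, timeout=600, interval=10)`, where `vm_is_running` is a hypothetical check, would fail the step after ten minutes with an explicit error instead of hanging until someone cancels the run.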

DRI name: @MarcelKemp 
teddytpc1 commented 2 weeks ago

This issue was resolved by: https://github.com/wazuh/wazuh-agent-packages/issues/15.