wazuh / wazuh-qa

Wazuh - Quality Assurance
GNU General Public License v2.0

Deployability testing tier 1 #4495

Open davidjiglesias opened 8 months ago

davidjiglesias commented 8 months ago

Description

The objective of this issue is to thoroughly test the deployment of Wazuh packages on tier 1 operating systems and architectures. This includes fully automated tests integrated into Wazuh's CI processes.

Testing should be reliable, lightweight, and fast. From now on, we refer to deployability testing tier 1 as DTT1.

Functional requirements

DTT1 includes the following combination of operating systems, versions, and architectures:

| Operating System | Version | Component | Architectures |
|------------------|---------|-----------|---------------|
| RedHat | 7 | agents, central components | x86_64, aarch64 |
| RedHat | 8 | agents, central components | x86_64, aarch64 |
| RedHat | 9 | agents, central components | x86_64, aarch64 |
| CentOS | 7 | agents, central components | x86_64, aarch64 |
| CentOS | 8 | agents, central components | x86_64, aarch64 |
| Debian | 10 | agents, central components | x86_64, aarch64 |
| Debian | 11 | agents, central components | x86_64, aarch64 |
| Debian | 12 | agents, central components | x86_64, aarch64 |
| Ubuntu | 18 | agents | x86_64, aarch64 |
| Ubuntu | 20 | agents, central components | x86_64, aarch64 |
| Ubuntu | 22 | agents, central components | x86_64, aarch64 |
| Oracle Linux | 9 | agents, central components | x86_64, aarch64 |
| Amazon Linux | 2 | agents, central components | x86_64, aarch64 |
| Amazon Linux | 2023 | agents, central components | x86_64, aarch64 |
| openSUSE | 15 | agents, ~central components~ | x86_64, aarch64 |
| ~SUSE~ | ~15~ | ~agents, central components~ | ~x86_64, aarch64~ |
| ~Fedora~ | ~38~ | ~agents~ | ~x86_64, aarch64~ |
| Windows | 10 | agents | x86_64 ~, aarch64~ |
| Windows | ~11~ | ~agents~ | ~x86_64, aarch64~ |
| Windows | Server 2012 | agents | x86_64 ~, aarch64~ |
| Windows | Server 2012 R2 | agents | x86_64 ~, aarch64~ |
| Windows | Server 2016 | agents | x86_64 ~, aarch64~ |
| Windows | Server 2019 | agents | x86_64 ~, aarch64~ |
| Windows | Server 2022 | agents | x86_64 ~, aarch64~ |
| macOS | Ventura | agents | x86_64, aarch64 |
| macOS | Sonoma | agents | x86_64, aarch64 |
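The matrix above can be expanded into individual test targets, one per component and architecture. The following is a minimal sketch that covers only three rows of the table; the names `Target`, `MATRIX` and `expand` are illustrative assumptions and are not part of wazuh-qa.

```python
from itertools import product
from typing import NamedTuple


class Target(NamedTuple):
    """One deployment to test (hypothetical structure)."""
    os: str
    version: str
    component: str
    arch: str


# Three rows copied from the table above; the tuple layout is an assumption.
MATRIX = [
    ("Ubuntu", "18", ["agent"], ["x86_64", "aarch64"]),
    ("Ubuntu", "22", ["agent", "central components"], ["x86_64", "aarch64"]),
    ("Windows", "Server 2022", ["agent"], ["x86_64"]),
]


def expand(matrix):
    """Yield one Target per (component, architecture) pair of every row."""
    for os_name, version, components, archs in matrix:
        for component, arch in product(components, archs):
            yield Target(os_name, version, component, arch)


targets = list(expand(MATRIX))
```

Keeping the matrix as data makes it straightforward to move a row to tier 2 without touching the test logic.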

The operating systems from Fedora onwards are included in tier 2 because their development in the Allocation module has not been completed.

Agents

High-level phases

Agents - DTT1 includes the following high-level phases:

- Install
- Registration
- Connection
- Basic info (OS, arch, version)
- Uninstall
- Restart

| Phase | Requirement |
|-------|-------------|
| Install | Install using [Wazuh dashboard's `Deploy new agent` wizard section](https://documentation.wazuh.com/current/installation-guide/wazuh-agent/index.html) |
| Install | Ensure files have appropriate permissions (Checkfiles close-world) |
| Install | Start using `wazuh-control` binary |
| Registration | Enroll using `ossec.conf` targeting a specific manager |
| Connection | Establish a connection with a single manager via TCP |
| Basic info | Ensure the OS is accurately reported |
| Basic info | Ensure the architecture (arch) is accurately reported |
| Basic info | Ensure the version is accurately reported |
| ~Upgrade~ | ~Ensure file permissions are maintained post-upgrade (Checkfiles close-world)~ |
| ~Upgrade~ | ~Ensure configuration is maintained post-upgrade (ossec.conf, agent.conf, local_internal_options.conf)~ |
| Restart | Restart using `wazuh-control` binary |
| Restart | Ensure successful reconnection post-restart |
| Stop | Confirm no remnants post-stop (e.g., processes, services, ports) |
| Stop | Ensure agent properly disconnects |
| Uninstall | Confirm no remnants post-uninstallation (e.g., processes, services, ports) |
| Uninstall | Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf) |
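The file permission requirement can be illustrated with a small check. This sketch assumes that "close-world" means every file under the installation directory must appear in an expected list, so unlisted files count as failures; that reading, the function name `check_files`, and the file names and modes used in the demo are assumptions made for illustration only.

```python
import os
import stat
import tempfile


def check_files(root, expected):
    """Compare files under `root` with `expected` ({relative path: mode}).

    Closed-world reading (an assumption): unlisted files are reported too.
    """
    actual = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            actual[os.path.relpath(full, root)] = stat.S_IMODE(os.stat(full).st_mode)
    common = expected.keys() & actual.keys()
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "wrong_mode": sorted(p for p in common if expected[p] != actual[p]),
    }


# Demo on a temporary directory; names and modes are illustrative only.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "ossec.conf")
    open(path, "w").close()
    os.chmod(path, 0o660)
    report = check_files(root, {"ossec.conf": 0o660, "client.keys": 0o640})
```

The same report could be produced before and after an upgrade to verify that permissions are maintained.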

Central components

High-level phases

Central components - DTT1 includes the following high-level phases:

- Install
- Connection
- Uninstall
- Restart

| Phase | Requirement |
|-------|-------------|
| Install | Install via [Quickstart](https://documentation.wazuh.com/current/quickstart.html) |
| Install | Ensure files have appropriate permissions (Checkfiles close-world) |
| Install | Start using service |
| Connection | Ensure the component under test successfully connects with the other central components |
| ~Upgrade~ | ~Confirm the new version is accurately reported~ |
| Restart | Restart using service |
| Restart | Ensure successful reconnection post-restart with the other central components |
| Stop | Confirm no remnants post-stop (e.g., processes, services, ports) |
| Stop | Ensure agent properly disconnects |
| Uninstall | Confirm no remnants post-uninstallation (e.g., processes, services, ports, files) |
| Uninstall | Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf) |
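The "no remnants post-stop" requirement includes checking that no ports are left open. A minimal sketch of such a check, using only the Python standard library; the helper names `port_is_open` and `open_ports` are hypothetical and the real test utilities may work differently.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def open_ports(host, ports):
    """Return the subset of `ports` that still accept connections."""
    return [port for port in ports if port_is_open(host, port)]
```

After stopping a manager, a call such as `open_ports("127.0.0.1", [1514, 1515, 55000])` would be expected to return an empty list, assuming the default manager ports are in use.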

Non-functional requirements

Hardware

Agent

- Hardware:
  - CPU: 1
  - RAM: 500 MB
- Upgrade:
  - From the previous patch
  - From the previous minor

Central components

- Hardware:
  - CPU: 4
  - RAM: 8 GB
- Upgrade:
  - From the previous patch
  - From the previous minor

Implementation restrictions

Plan


First iteration

Objective:

The objective of this iteration is to generate the skeleton of the modules and begin to detect problems that may arise from the new architecture. For this, a PoC described in the issues will be carried out.

Results:

The PoC was carried out. The modules were generated. During the development the following problems were encountered:


Second iteration:

Objective:

For this iteration, it is necessary to resolve the problems found in the previous one. After the weekly meeting (https://github.com/wazuh/wazuh-qa/issues/4495#issuecomment-1853040846), it was decided to investigate tools that use the DAG methodology and to adopt one as the orchestrator. The modules will also be refined according to what was proposed.

Results:

All the problems and topics found in iteration 1 were resolved. In addition, some points of improvement were found as the new functionalities were developed and implemented:

General

  1. Document the usage of each module (TaskFlow, Allocation, Provision, Test and Observability)
  2. Generate class or flow diagrams for each module
  3. Improve validations and error handling, since when a module fails the reason for the failure is not clear.
     3.1. TaskFlow
     3.2. Allocation
     3.3. Provision
     3.4. Test
  4. Define and implement a Logger
     4.1. Define centralized log
     4.2. Format
     4.3. Levels
     4.4. Output file per module (level debug) + Jenkins log (level info)
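Point 4 above (a debug-level file per module plus an info-level log for Jenkins) maps naturally onto two handlers of Python's standard `logging` module. The sketch below is only one possible shape: the function name `build_logger`, the `dtt1.` logger prefix and the message format are assumptions, and the Jenkins log is assumed to be whatever the job captures from stdout.

```python
import logging
import sys


def build_logger(module_name, log_file):
    """Return a logger that writes DEBUG+ to `log_file` and INFO+ to stdout."""
    logger = logging.getLogger(f"dtt1.{module_name}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    if logger.handlers:  # avoid adding duplicate handlers on repeated calls
        return logger

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)  # full detail for troubleshooting

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)  # concise output for the CI console

    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Each module would call `build_logger("allocation", "allocation.log")` or similar once and reuse the returned logger.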

TaskFlow

  1. Delete the schema validator parameter and use it internally

Allocation

  1. Move the Inventory model to module generics so every module uses the same Inventory model
  2. Add more sizes and OS for Vagrant providers
  3. Validate the working OS in Vagrant
  4. Add more sizes and OS for AWS
  5. Validate the working OS in AWS
  6. Special VMs
  7. Enable custom VM configuration for both the Vagrant and AWS providers
  8. Improve or remove the function to load an existing Credential for a VM (currently not working; Vagrant only)
  9. Add name and type labels to AWS instances so that costs can be calculated and the instances kept under control
  10. Unify size types for Vagrant and AWS

Provision

  1. Add the uninstaller action by parameter to uninstall the desired component
  2. Allow installing any version of Wazuh with Package (currently only allowed with AIO)
  3. Get ansible_os_family to render templates with Jinja2. This makes it easier to reuse templates
  4. Validate dependency tree
     4.1. Validate the working OS in Vagrant
     4.2. Validate the working OS in AWS
     4.3. Adapt the dependencies installed for the tests so that they work on other systems such as CentOS 8
  5. Special VMs
  6. Improve or remove the function to load an existing Credential for a VM (currently not working; Vagrant only)

Testing

  1. Add Utils to test using the Wazuh API
  2. Add Utils to check all file permissions and ownership
  3. Add test for manager
  4. Test uninstall
  5. Remove the usage of the Playbook class to use just Ansible

Observability

  1. Define the usage of the pytest-influxdb plugin for the tests
     1.1. If we decide to use it, carry out the implementation
  2. Define the new dashboards to be implemented according to the new definitions of the modules. Requires analysis and definition of the dashboards
  3. Obtain new logs from the modules to view them on a dashboard. Depends on General 4
  4. Investigate generating a dashboard that shows the DAG generated by TaskFlow

Jenkins

  1. Adapt the Jenkins pipeline to execute TaskFlow with dry-run to generate the DAG
  2. Adapt the Jenkins pipeline so that it can stop a running TaskFlow process

Iteration 3:

Objective:

After iteration 2, the following points emerged that will be the goal of the last iteration of the project.

Tasks:

General

Workflow engine

Provision

Allocation

Tests

Add Copyright

Release

Results:


Issue to include in DTT Tier 2

Develop automated unit tests

Objective:

The objective of this stage is to generate automated unit tests for each module. This work is expected to continue in DTT2. It is incorporated into DTT1 in order to begin defining the test cases and the way they are implemented automatically.

Implement best practices

Jenkins implementation

Observability


Post-development:


Branch

Approved by

DRI name: @davidjiglesias
CTO: @havidarou
Objective: Bulletproof deployability tier1

rauldpm commented 8 months ago

The order of execution of the tests must be modified. An upgrade implies that the previous version is installed first, which is not possible with the current proposal because it starts by installing the version under test:

  1. Install
  2. Registration
  3. Connection
  4. Basic info
  5. Restart
  6. Stop
  7. Uninstall
  8. Upgrade
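The ordering constraint described in this comment can be sketched as follows. This is one possible reading and not an agreed design: the function `build_flow` and the placeholder version labels `previous` and `current` are hypothetical.

```python
def build_flow(test_upgrade, previous="previous", current="current"):
    """Return an ordered list of (phase, version) steps for one agent run."""
    checks = ["registration", "connection", "basic info", "restart", "stop", "uninstall"]
    if test_upgrade:
        # An upgrade can only be exercised if the previous version is installed first.
        steps = [("install", previous), ("upgrade", current)]
    else:
        steps = [("install", current)]
    return steps + [(phase, current) for phase in checks]


upgrade_flow = build_flow(test_upgrade=True)
fresh_flow = build_flow(test_upgrade=False)
```

In this sketch the upgrade path is a separate flow that starts from the previous version, instead of a final step appended to a fresh installation of the version under test.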
jnasselle commented 7 months ago

Requirements review

OS and architecture unavailability

Agent's hardware requirements do not meet OS minimum requirements

The following OSes have higher hardware requirements

Known problems

Managers on those OSes that only support Agents

Test order

Wazuh Manager and Wazuh Agent test interleaving

Wazuh Agent tests need some validation from the manager side (registration, connection), but at the same time the Wazuh Manager has its own tests. The idea is to define the optimal, decoupled test flow that meets the requirements in the least time possible.

davidjiglesias commented 5 months ago

Requirements review

QU3B1M commented 5 months ago

Draw a high-level diagram of the modules workflow test

fcaffieri commented 5 months ago

Weekly Minutes DTT1

Participants: Kevin, Victor, Raul, Nico, Fede and David.

Conclusions: After the weekly DTT meeting, the team defined the need to adopt a DAG methodology in order to have an execution orchestrator that defines the test cases to be carried out in a simple, user-friendly way. It must be flexible enough to execute any use case in parallel, and its output must be the YAML consumed by the already defined modules (Allocation, Provision and Test). An analysis of the proposed tools, with the advantages and disadvantages of each, is required so that the team can jointly choose the tool that natively fits our needs. Its use must be simple, intuitive and scalable. Issue https://github.com/wazuh/wazuh-qa/issues/4766 was created to track this work.
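To illustrate what a DAG-based orchestrator provides, the sketch below groups tasks into batches that can run in parallel, using `graphlib.TopologicalSorter` from the Python standard library (3.9+). It does not represent the tool the team evaluated or chose; the task names and the `execution_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: task name -> set of tasks it depends on.
WORKFLOW = {
    "allocate-manager": set(),
    "allocate-agent": set(),
    "provision-manager": {"allocate-manager"},
    "provision-agent": {"allocate-agent", "provision-manager"},
    "test-agent": {"provision-agent"},
}


def execution_batches(graph):
    """Group tasks into batches; tasks within a batch have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(WORKFLOW)
```

Here both allocations land in the first batch and could run in parallel, while provisioning and testing follow in dependency order. Printing the batches without executing anything is roughly what a dry-run of such a workflow would show.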

rauldpm commented 1 month ago

Moved ETA from 2024/03/29 to 2024/04/05 due to:

rauldpm commented 1 month ago

Moved ETA to 2024/04/09 due to:

rauldpm commented 1 month ago

This comment tracks all bug issues opened from DTT1 that should be handled as bug issues rather than DTT-related issues; that is, they will be worked on after this issue is closed.

rauldpm commented 1 month ago

We need the changes from issue https://github.com/wazuh/wazuh-qa/issues/5198 before this development can be merged.

rauldpm commented 1 month ago

Moved ETA to 16/04/2024 due to https://github.com/wazuh/wazuh-qa/issues/5198 (based on issue ETA)

A new issue has been opened because we need to adapt the test module to use a single manager: #5202 (same ETA).

Desirable, but not a blocker: https://github.com/wazuh/wazuh-qa/issues/5203

fcaffieri commented 1 month ago

The automation section has been removed because it will be worked on in DTT2.

Automation

rauldpm commented 1 month ago

Moved ETA to 29/04/2024 as we have to work on the following issues

We need the following issue from the DevOps team

As 4.9.0 is targeted for 2/05/2024, we plan to use 30/04 and 2/05 to test and retrieve metrics.

rauldpm commented 3 weeks ago

Moved the ETA to 3/5/2024 as 1/5/2024 is a holiday and we need some time to test the changes in the main branch (https://github.com/wazuh/wazuh-qa/issues/5191). This has been discussed and approved with @davidjiglesias

rauldpm commented 2 weeks ago

Based on all DTT1 pending issues by each team and ETAs:

| Team | Issue | Actual ETA |
|------|-------|------------|
| @wazuh/devel-devops | https://github.com/wazuh/wazuh-qa/issues/5295 | 7/5/2024 |
| @wazuh/devel-devops | https://github.com/wazuh/wazuh-qa/issues/5311 | 10/5/2024 |
| @wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5240 | 2/5/2024 |
| @wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5230 | 3/5/2024 |
| @wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5218 | 3/5/2024 |
| @wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5219 | 6/5/2024 |
| @wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5191 | 15/5/2024 |
| @wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5323 | 3/5/2024 |

The ETA of this issue is moved to 15/5/2024 so that we can test all changes (issue #5191).

rauldpm commented 1 week ago

Removed Windows ARM from OS list as there is no Windows ARM available yet