wazuh / wazuh-qa

Wazuh - Quality Assurance
GNU General Public License v2.0

DTT1 - Design and develop PoC #4524

Closed rauldpm closed 4 months ago

rauldpm commented 8 months ago

EPIC: https://github.com/wazuh/wazuh-qa/issues/4495

Description

This issue aims to design and create an initial Proof of Concept based on the analysis carried out in the issue https://github.com/wazuh/wazuh-qa/issues/4519

In this way, the PoC will show the following functionalities on a single system:

This PoC will have the following bases:

The composition of the tests will be as follows:

rauldpm commented 8 months ago

Update report - Test


all:
  hosts:
    Agent:
      ansible_host: 192.168.56.34
      ansible_port: 22
    Manager:
      ansible_host: 192.168.56.35
      ansible_port: 22
  vars:
    ansible_user: 'vagrant'
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
    ansible_ssh_private_key_file: './utils/key'

ansible-playbook playbooks/provision_test.yml -i ./inventory.yaml --limit Agent
ansible-playbook playbooks/test.yml -i ./inventory.yaml --limit Agent

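The two commands above can also be assembled from a script. The following is a minimal sketch, not the PoC's actual code: the helper name `build_playbook_command` is hypothetical, and only `ansible-playbook` flags that already appear in the commands above are used.

```python
import shlex


def build_playbook_command(playbook, inventory="./inventory.yaml", limit=None):
    """Return the ansible-playbook argument list for one playbook run.

    Hypothetical helper for illustration; the PoC may build its commands differently.
    """
    command = ["ansible-playbook", playbook, "-i", inventory]
    if limit:
        # --limit restricts the run to the named host or group.
        command += ["--limit", limit]
    return command


# Provision first, then test, both restricted to the Agent host.
commands = [
    build_playbook_command("playbooks/provision_test.yml", limit="Agent"),
    build_playbook_command("playbooks/test.yml", limit="Agent"),
]

for command in commands:
    # Print the command line; to execute it, pass the list to
    # subprocess.run(command, check=True) instead.
    print(shlex.join(command))
```

Keeping the command as a list avoids shell quoting problems if a playbook path or host pattern ever contains spaces.
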
rauldpm commented 8 months ago

Update report - Test

Playbook execution

```
╰─➤ python3 test.py

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [Agent1]
ok: [Manager]

TASK [Install GPG key] *********************************************************
changed: [Manager]
changed: [Agent1]

TASK [Add Wazuh repository] ****************************************************
changed: [Agent1]
changed: [Manager]

TASK [Update package information] **********************************************
changed: [Manager]
changed: [Agent1]

PLAY [Manager*] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [Manager]

TASK [Install the Wazuh manager] ***********************************************
changed: [Manager]

TASK [Enable and start Wazuh service] ******************************************
changed: [Manager] => (item=systemctl daemon-reload)
changed: [Manager] => (item=systemctl enable wazuh-manager)
changed: [Manager] => (item=systemctl start wazuh-manager)

PLAY [Agent*] ******************************************************************

TASK [Gathering Facts] *********************************************************
ok: [Agent1]

TASK [Install the Wazuh agent with environment variables] **********************
changed: [Agent1]

TASK [Enable and start Wazuh service] ******************************************
changed: [Agent1] => (item=systemctl daemon-reload)
changed: [Agent1] => (item=systemctl enable wazuh-agent)
changed: [Agent1] => (item=systemctl start wazuh-agent)

PLAY RECAP *********************************************************************
Agent1 : ok=7 changed=5 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Manager : ok=7 changed=5 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
```

![image](https://github.com/wazuh/wazuh-qa/assets/14913942/e3fe5b45-8412-49b0-aad8-df8438cc5099)

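A script that drives the playbooks could check the `PLAY RECAP` section to decide whether the run succeeded. The sketch below is an assumption about how that might be done, not part of the PoC: `parse_recap` and `run_succeeded` are hypothetical names, and the parser only handles recap lines shaped like the ones in the output above.

```python
import re

# One recap line: "<host> : ok=7 changed=5 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(output):
    """Return {host: {counter: value}} for the lines after PLAY RECAP."""
    results = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (pair.split("=") for pair in match.group("counters").split())
            results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


def run_succeeded(recap):
    """True when at least one host is present and none failed or was unreachable."""
    return bool(recap) and all(
        counters.get("failed", 0) == 0 and counters.get("unreachable", 0) == 0
        for counters in recap.values()
    )


sample = """PLAY RECAP ****
Agent1 : ok=7 changed=5 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Manager : ok=7 changed=5 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
"""
recap = parse_recap(sample)
print(recap["Agent1"]["changed"], run_succeeded(recap))
```

Parsing text output is fragile; it is shown here only because the thread reports results in this form.
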
rauldpm commented 8 months ago

Update report

jnasselle commented 8 months ago

Update report

rauldpm commented 7 months ago

Update report

rauldpm commented 7 months ago

Still on hold due to https://github.com/wazuh/wazuh/issues/19166

rauldpm commented 7 months ago

Still on hold due to https://github.com/wazuh/wazuh/issues/19300 and https://github.com/wazuh/wazuh/issues/19166

rauldpm commented 7 months ago

Update report

rauldpm commented 7 months ago
rauldpm commented 7 months ago
fcaffieri commented 7 months ago

Observability module

Proposed architecture

image

Working on the proposed architecture, limited to the part that will be included in DTT1.

fcaffieri commented 6 months ago

Update Observability

Jenkins: image

Loki metrics: image

Grafana dashboard:

image image image

Jenkins log: image

Grafana Loki datasource image

fcaffieri commented 6 months ago

Update

I am investigating Jenkins, Prometheus and Grafana integration. The objective is to obtain a graph like the following:

image

This could give us a lot of real-time information that we do not have today.

fcaffieri commented 6 months ago

Update

Configured Prometheus with Jenkins and Grafana:

image

This can give us a lot of information about Jenkins metrics, such as:

image

fcaffieri commented 6 months ago

Update

After analyzing Prometheus and Grafana, I was able to configure the following dashboards, which provide information about:

image image image image

QU3B1M commented 5 months ago

Update report

jnasselle commented 5 months ago

Update

Based on PoC feedback (functional and non-functional) and weekly design meetings, the next tasks should be addressed

jnasselle commented 5 months ago

Update

@fcaffieri take a look at https://github.com/PrefectHQ/prefect and https://github.com/dagster-io/dagster

rauldpm commented 5 months ago

Review notes


Proposed plan

We must ensure the functionality of the main modules (Allocator/Provision/Test) before readapting the Observability module. Before working on each module, the previous one must be completed and validated. Each module must take into account the previous one.

Modular

  1. Ensure the functionality of the Allocator module
    • Deployments are correct
    • Input/Output artifacts are correct
    • Parallelized deployment
    • Local/cloud deployment
  2. Ensure the functionality of the Provision module
    • Input artifacts are correct
    • Parallelized/sequential provisioning to each instance according to the provisioning case
  3. Ensure the functionality of the Testing module
    • Input/Output artifacts are correct
    • Tests are executed on each instance

Modular integration

  1. Ensure the operation of the Allocator/Provision/Testing modules in tandem
    • Jenkinsfile for cloud -> How is this executed locally? -> PoC environment deployment documentation
    • Modules are correctly managed sequentially/in parallel
    • Correct inventories/artifacts (input/outputs) management
    • Will the Jenkinsfile use multithreading? How are we going to support provisioning an instance after its allocation if other instances are still being allocated?
      • Instead of calling the Python script once with a composed inventory, the script must be called once for each inventory target, and it will wait only if the provision depends on other instances that are not yet provisioned. For example, an instance that will be provisioned with nano will not wait for other allocations, but an instance that will be provisioned with an agent will wait for the manager instance to be allocated.
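The per-target scheduling described in the last bullet can be sketched with `asyncio`. This is only one possible approach and makes several assumptions: the instance names, the sleep times standing in for allocation, and the choice to wait for a dependency to be allocated (as in the agent/manager example) are all hypothetical.

```python
import asyncio


async def handle_instance(name, allocation_time, depends_on, allocated, log):
    """Allocate one instance, then provision it as soon as its dependencies allow."""
    await asyncio.sleep(allocation_time)  # stand-in for the real allocation
    allocated[name].set()
    log.append(f"allocated {name}")
    # Wait only for the instances this provision depends on, not for all of them.
    for dependency in depends_on:
        await allocated[dependency].wait()
    log.append(f"provisioned {name}")


async def main():
    # name -> (simulated allocation time in seconds, dependencies)
    targets = {
        "nano": (0.05, []),            # no dependencies: provisioned right away
        "agent": (0.10, ["manager"]),  # must wait for the manager
        "manager": (0.30, []),
    }
    allocated = {name: asyncio.Event() for name in targets}
    log = []
    await asyncio.gather(
        *(
            handle_instance(name, delay, deps, allocated, log)
            for name, (delay, deps) in targets.items()
        )
    )
    return log


log = asyncio.run(main())
print("\n".join(log))
```

In this run the nano instance is provisioned while the manager is still being allocated, and the agent is provisioned only after the manager's allocation has finished.
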

Observability module

  1. Ensure the operation of the observability module
    • What dependencies currently exist between the observability module and the rest?
    • With the current PoC, can the modules be executed independently or together, both locally and in the cloud?
    • Should the modules know about the tools required by the observability module, or should that module be the one that adapts to the rest? Example: the testing module uses a database (InfluxDB) that is read by Grafana

Before continuing with the final development of the modules, we should consider the changes and proposals discussed in the last meeting to evaluate the impact and carry out a second iteration of the PoC so that we validate these annotations.

fcaffieri commented 4 months ago

Review Notes

Referring to the comments made

The PoC is not clear, we must establish a series of actions to validate it before continuing with the next iteration

The documentation will be generated in the last iteration of the development because, as progress is made, problems are found and resolved. For more details, see issue https://github.com/wazuh/wazuh-qa/issues/4495.

Regarding the modules and their ease of execution, this is one of the problems found after the PoC, and we are working on it in iteration 2.

Regarding the observability module, as built it is neither invasive nor intrusive in the modules: it only collects information from the nodes where those modules run and stores it in Loki. No configuration or development is required within each module for it to work.
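To illustrate why an external collector needs nothing from the modules themselves, the sketch below builds the JSON body that Loki's push endpoint (`/loki/api/v1/push`) accepts from plain log lines. The thread does not say which collector the PoC uses, so the function name, the labels and the sample lines are all assumptions made for illustration.

```python
import json
import time


def build_loki_payload(labels, lines, timestamp_ns=None):
    """Build a Loki push body: one stream, identified by its labels.

    Loki expects each value as [<unix time in nanoseconds, as a string>, <log line>].
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return {
        "streams": [
            {
                "stream": dict(labels),
                "values": [[str(timestamp_ns), line] for line in lines],
            }
        ]
    }


# Hypothetical labels and log lines read from a node, outside of any module.
payload = build_loki_payload(
    {"job": "dtt1", "module": "provision"},
    ["TASK [Install the Wazuh manager]", "changed: [Manager]"],
    timestamp_ns=1700000000000000000,
)
print(json.dumps(payload, indent=2))
```

A collector running on the node would POST this body to the Loki instance; the modules only need to write their usual logs.
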

We must ensure the functionality of the main modules (Allocator/Provision/Test) before readapting the Observability module. Before working on each module, the previous one must be completed and validated. Each module must take into account the previous one.

Exactly, this is what was proposed in iteration 2 to be developed, in conjunction with the analysis and implementation of the orchestrator of said modules.


Referring to the aforementioned modular approach

Indeed, iteration 2 is where everything mentioned in @raul's comment will be taken into account, in summary:


Referring to the observability module:

Conclusions

The PoC is considered finalized, since its proposed objective was completed in the branch https://github.com/wazuh/wazuh-qa/tree/enhancement/4495-deployability-tier-1 and presented to the stakeholders, who accepted it. Another PoC will be scheduled at the end of iteration 2 of the DTT to address the points raised and the problems found in iteration 1.