wazuh / wazuh-qa

Wazuh - Quality Assurance
GNU General Public License v2.0

Performance tests: cluster of N nodes scenario #1139

Closed BraulioV closed 3 years ago

BraulioV commented 3 years ago

This scenario tests how the cluster performs when a large number of agents are connected to it. The objective is to measure synchronization times, discover bottlenecks, and establish a baseline for testing the performance of the distributed API and its endpoints.

Objectives

jmv74211 commented 3 years ago

In this second iteration, we are going to extend the test cluster pipeline so that it allows the deployment, configuration ... of a cluster of N nodes instead of a single-manager scenario.

To do this, I estimate the following tasks:

jmv74211 commented 3 years ago

Task 1

I have added the following parameters to the Jenkins UI:

MANAGER_WORKERS

It is used to specify the number of worker nodes to be deployed.

LOCAL_INTERNAL_OPTIONS_CONFIG

Used to specify the custom settings to be applied in the local_internal_options.conf file. This will be applied to all cluster nodes.

Here you can see the changes involved in the pipeline for this task https://github.com/wazuh/wazuh-jenkins/commit/5a5233044728066f169a7297ab074462e0d20054

jmv74211 commented 3 years ago

Task 2

The total number of managers will be 1 + manager_workers_num.

Since I had already prepared the parallel deployment of N instances for the managers, the only change required for this task was as follows https://github.com/wazuh/wazuh-jenkins/commit/c68c1e4b1465bf0eb8e36974aefcdb487e8ab4b4
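As a rough illustration of the sizing rule above (one master plus `MANAGER_WORKERS` workers), the node list could be derived as in the following Python sketch. The function name and the `manager_<n>` naming are assumptions made for this example; the pipeline itself is written in Groovy and may build the list differently.

```python
from typing import List, Tuple


def cluster_nodes(manager_workers: int) -> List[Tuple[str, str]]:
    """Return (node_name, node_type) pairs for one master plus N workers.

    Hypothetical helper: the names are illustrative, not the pipeline's own.
    """
    if manager_workers < 0:
        raise ValueError("manager_workers must be >= 0")
    # The master is always deployed, so the total is 1 + manager_workers.
    nodes = [("master", "master")]
    nodes += [(f"manager_{i}", "worker") for i in range(1, manager_workers + 1)]
    return nodes


if __name__ == "__main__":
    for name, node_type in cluster_nodes(2):
        print(name, node_type)
```

With `manager_workers = 0` the list contains only the master, which corresponds to the single-manager scenario of the first iteration.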

jmv74211 commented 3 years ago

Task 3

Provisioning is already done in parallel because all instances belong to the same host group in the Ansible inventory. By default the following is done for all nodes:

jmv74211 commented 3 years ago

Task 4

The cluster configuration has been done by adding a new <ossec_config> block at the end of the ossec.conf file with the specific configuration of each cluster node (https://github.com/wazuh/wazuh-jenkins/blob/87efef121fc9adc64ee4d3de7437081b6609e46a/jenkins-files%2Ftests%2Fperformance%2Ftest_cluster.groovy#L496-L510).

In this environment, there is a master node and the rest will be worker nodes.

The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/87efef121fc9adc64ee4d3de7437081b6609e46a
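To make the idea of a per-node `<ossec_config>` block more concrete, here is a minimal Python sketch that renders one. It is not the Groovy code from the linked commit: the function name, the cluster name `wazuh`, the port, the bind address and the placeholder key are assumptions based on the usual layout of the Wazuh `<cluster>` section.

```python
from xml.sax.saxutils import escape

# Assumed layout of the <cluster> section; all values are illustrative.
CLUSTER_TEMPLATE = """<ossec_config>
  <cluster>
    <name>{cluster_name}</name>
    <node_name>{node_name}</node_name>
    <node_type>{node_type}</node_type>
    <key>{key}</key>
    <port>1516</port>
    <bind_addr>0.0.0.0</bind_addr>
    <nodes>
      <node>{master_address}</node>
    </nodes>
    <hidden>no</hidden>
    <disabled>no</disabled>
  </cluster>
</ossec_config>
"""


def render_cluster_block(node_name, node_type, master_address, key,
                         cluster_name="wazuh"):
    """Render the block to append to ossec.conf for a single node."""
    if node_type not in ("master", "worker"):
        raise ValueError("node_type must be 'master' or 'worker'")
    return CLUSTER_TEMPLATE.format(
        cluster_name=escape(cluster_name),
        node_name=escape(node_name),
        node_type=node_type,
        key=escape(key),
        master_address=escape(master_address),
    )


if __name__ == "__main__":
    # Placeholder key; a real deployment would use its own secret.
    print(render_cluster_block("manager_1", "worker", "172.31.14.13",
                               "0123456789abcdef0123456789abcdef"))
```

Every node, master or worker, points its `<nodes>` entry at the master's address; only `node_name` and `node_type` change from node to node.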

For example, the configuration of a master with two workers would be as follows:

NAME                                            TYPE    VERSION  ADDRESS        
master                                          master  4.2.0    172.31.14.13   
Test_cluster_performance_sprint2_B10_manager_1  worker  4.2.0    172.31.3.193   
Test_cluster_performance_sprint2_B10_manager_2  worker  4.2.0    172.31.15.234

jmv74211 commented 3 years ago

Task 5 and 10

Because task 5 depends on task 10, the two have been done together.

The directory and file structure for each of the instances (master and worker nodes) has already been organized.

At the end, you get a tar.gz file containing one directory per instance, each holding that instance's data, logs, and plots.

The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/85c7ab88568b7d84ea1807df3cddcfa54cb25a65
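The packaging step can be pictured with the following Python sketch, which archives one top-level directory per instance into a single tar.gz. The function name and directory names are hypothetical; the actual pipeline may collect and compress the results in a different way.

```python
import tarfile
from pathlib import Path


def pack_results(results_dir, archive_path):
    """Archive each instance directory under results_dir into one tar.gz.

    Hypothetical helper: results_dir is assumed to hold one directory per
    instance (master and workers), each with data, logs and plots inside.
    """
    results_dir = Path(results_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for instance in sorted(p for p in results_dir.iterdir() if p.is_dir()):
            # arcname keeps the instance name as the top-level entry.
            tar.add(str(instance), arcname=instance.name)
    return str(archive_path)
```

Using `arcname=instance.name` keeps the archive free of absolute paths, so extracting it yields one directory per node.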

jmv74211 commented 3 years ago

Task 6

The content of the custom local_internal_options file entered from the Jenkins UI has been added to the cluster provisioning, so each node will have the specified configuration.

The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/23cb4968744eb2478fa49f531812844126cbc905

jmv74211 commented 3 years ago

Task 7

It is now possible to select the type of protocol to be used for communication with the cluster. This configuration is performed during the provisioning and configuration of all nodes.

The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/45850570c1c14a3ff696d24bea44ff8e1fb66d4b

jmv74211 commented 3 years ago

Task 8

While testing the agent simulator changes that let an agent register on the master node and then connect to a worker, we discovered that the parser for the cluster configuration does not work as expected. All the details are in this issue: https://github.com/wazuh/wazuh/issues/8229.

These agent simulator changes have been merged into the wazuh-qa repository with the following PR: https://github.com/wazuh/wazuh-qa/pull/1233

jmv74211 commented 3 years ago

Task 9

The statistics and their graphs have been grouped by daemon. This makes it much easier to find the information you want.

The structure is as follows:

├── data
│   ├── binaries
│   │   ├── ...
│   └── stats
│       ├── analysisd
│       │   └── wazuh-analysisd_stats.csv
│       ├── logcollectord
│       │   ├── CSV 1
│       │   ├── CSV 2
│       │   ├── ...
│       └── remoted
│           └── wazuh-remoted_stats.csv
├── logs
│   ├── ...
└── plots
    ├── binaries
    │   ├── ...
    └── stats
        ├── analysisd
        │   ├── SVG 1
        │   ├── SVG 2
        │   ├── ...
        ├── logcollectord
        │   ├── SVG 1
        │   ├── SVG 2
        │   ├── ...
        └── remoted
            ├── SVG 1
            ├── SVG 2
            ├── ...

The related changes are as follows: https://github.com/wazuh/wazuh-jenkins/commit/b19228098ca069936cc0af8d6b09d2ad665a2846
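With the layout above, locating the statistics of a given daemon is a matter of listing one directory. The following Python sketch assumes the `data/stats/<daemon>/*.csv` structure shown in the tree; the function name is made up for the example.

```python
from pathlib import Path
from typing import Dict, List


def stats_by_daemon(instance_dir) -> Dict[str, List[str]]:
    """Map each daemon directory under data/stats to its CSV file names."""
    stats_root = Path(instance_dir) / "data" / "stats"
    return {
        daemon.name: sorted(csv.name for csv in daemon.glob("*.csv"))
        for daemon in sorted(stats_root.iterdir())
        if daemon.is_dir()
    }
```

The same approach would work for `plots/stats`, swapping the `*.csv` pattern for `*.svg`.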

jmv74211 commented 3 years ago

Task 11

To solve the conflict error when parsing the cluster blocks in ossec.conf (mentioned in task 8), I have added a step that removes the existing cluster block from ossec.conf before the new cluster configuration is appended.

See the changes here https://github.com/wazuh/wazuh-jenkins/commit/3d840a75f5bd2290ebc15d342fea4ac8de6018a5
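The idea of dropping the existing cluster block before appending a new one can be sketched in Python as follows. This is an illustration, not the code from the linked commit: it uses a regular expression rather than an XML parser, and it assumes `<cluster>` elements are not nested and do not appear inside comments.

```python
import re

# Matches a whole <cluster>...</cluster> element, including its indentation
# and the trailing newline, across multiple lines.
CLUSTER_BLOCK = re.compile(r"[ \t]*<cluster>.*?</cluster>[ \t]*\n?", re.DOTALL)


def strip_cluster_blocks(ossec_conf: str) -> str:
    """Return the ossec.conf text with every <cluster> block removed."""
    return CLUSTER_BLOCK.sub("", ossec_conf)


if __name__ == "__main__":
    sample = (
        "<ossec_config>\n"
        "  <remote>\n"
        "    <connection>secure</connection>\n"
        "  </remote>\n"
        "  <cluster>\n"
        "    <name>wazuh</name>\n"
        "    <disabled>yes</disabled>\n"
        "  </cluster>\n"
        "</ossec_config>\n"
    )
    print(strip_cluster_blocks(sample))
```

After this step, the file contains a single cluster definition once the new block is appended, so the parser no longer sees two conflicting ones.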

jmv74211 commented 3 years ago

Task 12

I've been researching the AWS load balancing service.

In order to use this service from our Jenkins pipeline, it will be necessary to create a series of additional modules that make the necessary calls using the AWS CLI, which allows us to:

I am going to start the development of this new module. As I progress, I will comment on it here.

jmv74211 commented 3 years ago

I have already finished the development of the load balancer. I had to make some additional adjustments to the agent simulator and the script because I found some bugs during this development.

As a result, an on-demand load balancer is created in AWS whenever the number of worker nodes is greater than 0. All the worker nodes are registered in this load balancer, and the agent connections are distributed among them, as we can see in the following example:

[root@ip-172-31-15-151 ec2-user]# /var/ossec/bin/cluster_control -a
ID   NAME                           IP         STATUS  VERSION       NODE NAME                                       
000  ip-172-31-15-151.ec2.internal  127.0.0.1  active  Wazuh v4.2.0  master                                          
001  1-db6873f1-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1  
002  1-26cf3108-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_2  
003  1-ed742f0a-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3  
004  1-0b8047a6-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1  
005  1-42d49a73-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1  
006  1-5ebd8747-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3 

At the end of the test, this load balancer is destroyed along with the AWS instances.

The changes made are as follows https://github.com/wazuh/wazuh-jenkins/commit/d8be61449829ee7bc6b3431143fab7837ddb31de
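To check how evenly the load balancer spreads agents, the `cluster_control -a` listing shown above can be tallied per node. The Python sketch below does so by reading the last column of each row; the function name is made up for the example, and the parsing assumes that node names contain no spaces.

```python
from collections import Counter

# Output copied from the example above.
SAMPLE = """\
ID   NAME                           IP         STATUS  VERSION       NODE NAME
000  ip-172-31-15-151.ec2.internal  127.0.0.1  active  Wazuh v4.2.0  master
001  1-db6873f1-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1
002  1-26cf3108-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_2
003  1-ed742f0a-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3
004  1-0b8047a6-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1
005  1-42d49a73-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1
006  1-5ebd8747-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3
"""


def agents_per_node(output: str) -> Counter:
    """Count rows per node name (last column) in cluster_control -a output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric agent ID; this skips the header.
        if fields and fields[0].isdigit():
            counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for node, total in sorted(agents_per_node(SAMPLE).items()):
        print(node, total)
```

For the listing above, this gives 3, 1 and 2 agents for workers 1, 2 and 3 respectively, plus the ID 000 entry reported on the master.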

jmv74211 commented 3 years ago

Task 14

I have added two new parameters to the pipeline to select the instance type used to deploy the agents and the managers.

Also, I have updated the deployment logic to the following:

The changes made are as follows https://github.com/wazuh/wazuh-jenkins/commit/a5fac3783b5a2b91392bed6b26cc9eaa2d3ebeaa

BraulioV commented 3 years ago

Closed by https://github.com/wazuh/wazuh-jenkins/pull/2518