Closed by BraulioV 3 years ago
In this second iteration, we are going to improve the test cluster pipeline to allow the deployment, configuration ... of a cluster of N nodes instead of a single-manager scenario.
To do this, I estimate the following tasks:
ossec.conf
./var/ossec/bin/cluster_control -i more
I have added the following parameters to the Jenkins UI:
MANAGER_WORKERS: used to specify the number of worker nodes to be deployed.
LOCAL_INTERNAL_OPTIONS_CONFIG: used to specify the custom settings to be applied in the local_internal_options.conf file. These settings are applied to all cluster nodes.
Here you can see the changes involved in the pipeline for this task https://github.com/wazuh/wazuh-jenkins/commit/5a5233044728066f169a7297ab074462e0d20054
The total number of managers will be 1 + manager_workers_num.
Since I had already prepared the parallel deployment of N instances for the managers, the only change required for this task was as follows https://github.com/wazuh/wazuh-jenkins/commit/c68c1e4b1465bf0eb8e36974aefcdb487e8ab4b4
Provisioning is already done in parallel because all instances belong to the same host group in the Ansible inventory. By default the following is done for all nodes:
wazuh-manager service.
The cluster configuration is done by adding a new <ossec_config> block at the end of the ossec.conf file with the specific configuration of each cluster node (https://github.com/wazuh/wazuh-jenkins/blob/87efef121fc9adc64ee4d3de7437081b6609e46a/jenkins-files%2Ftests%2Fperformance%2Ftest_cluster.groovy#L496-L510).
In this environment, there is a master node and the rest will be worker nodes.
The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/87efef121fc9adc64ee4d3de7437081b6609e46a
For example, the configuration of a master with two workers would be as follows:
NAME                                            TYPE    VERSION  ADDRESS
master                                          master  4.2.0    172.31.14.13
Test_cluster_performance_sprint2_B10_manager_1  worker  4.2.0    172.31.3.193
Test_cluster_performance_sprint2_B10_manager_2  worker  4.2.0    172.31.15.234
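As an illustration of the per-node configuration described above, here is a minimal Python sketch that builds one <ossec_config> block for a master and N workers. It is not the pipeline's Groovy code. The function names, the `worker_prefix` naming, the cluster name and the key are hypothetical placeholders, and the port and bind address are assumed defaults.

```python
from xml.sax.saxutils import escape


def cluster_block(node_name, node_type, master_ip, key, cluster_name="wazuh"):
    """Return an <ossec_config> block holding one node's cluster settings."""
    return (
        "<ossec_config>\n"
        "  <cluster>\n"
        f"    <name>{escape(cluster_name)}</name>\n"
        f"    <node_name>{escape(node_name)}</node_name>\n"
        f"    <node_type>{escape(node_type)}</node_type>\n"
        f"    <key>{escape(key)}</key>\n"
        "    <port>1516</port>\n"               # assumed default cluster port
        "    <bind_addr>0.0.0.0</bind_addr>\n"  # assumed bind address
        "    <nodes>\n"
        f"      <node>{escape(master_ip)}</node>\n"
        "    </nodes>\n"
        "    <hidden>no</hidden>\n"
        "    <disabled>no</disabled>\n"
        "  </cluster>\n"
        "</ossec_config>\n"
    )


def build_cluster_blocks(manager_workers, master_ip, key, worker_prefix="manager"):
    """Map node name -> config block for one master plus `manager_workers` workers."""
    blocks = {"master": cluster_block("master", "master", master_ip, key)}
    for i in range(1, manager_workers + 1):
        name = f"{worker_prefix}_{i}"  # hypothetical naming scheme
        blocks[name] = cluster_block(name, "worker", master_ip, key)
    return blocks


if __name__ == "__main__":
    # Placeholder key: 32 characters, for illustration only.
    blocks = build_cluster_blocks(2, "172.31.14.13", "0" * 32)
    print(sorted(blocks))
    print(blocks["manager_1"])
```

Every node points at the master's address in its <nodes> list, so the master's block and the workers' blocks differ only in node_name and node_type.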
Because task 5 depends on task 10, the two have been done together.
The directory and file structure for each of the instances (master and worker nodes) has already been organized.
At the end you get a tar.gz file that contains a directory for each instance, and each one has stored data, logs and plots.
The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/85c7ab88568b7d84ea1807df3cddcfa54cb25a65
The content of the custom local_internal_options file entered from the Jenkins UI has been added to the cluster provisioning, so each node will have the specified configuration.
The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/23cb4968744eb2478fa49f531812844126cbc905
It is now possible to select the type of protocol to be used for communication with the cluster. This configuration is performed during the provisioning and configuration of all nodes.
The changes made are as follows: https://github.com/wazuh/wazuh-jenkins/commit/45850570c1c14a3ff696d24bea44ff8e1fb66d4b
While testing the agent simulator changes that let an agent register on the master node and then connect to a worker, we discovered that the cluster configuration parser does not work as expected. All the details are in this issue: https://github.com/wazuh/wazuh/issues/8229.
These agent simulator changes have been merged into the wazuh-qa repository with the following PR: https://github.com/wazuh/wazuh-qa/pull/1233
The statistics and their graphs have been grouped by daemon. This makes it much easier to find the information you want.
The structure is as follows:
├── data
│   ├── binaries
│   │   ├── ...
│   └── stats
│       ├── analysisd
│       │   └── wazuh-analysisd_stats.csv
│       ├── logcollectord
│       │   ├── CSV 1
│       │   ├── CSV 2
│       │   ├── ...
│       └── remoted
│           └── wazuh-remoted_stats.csv
├── logs
│   ├── ...
└── plots
    ├── binaries
    │   ├── ...
    └── stats
        ├── analysisd
        │   ├── SVG 1
        │   ├── SVG 2
        │   ├── ...
        ├── logcollectord
        │   ├── SVG 1
        │   ├── SVG 2
        │   ├── ...
        └── remoted
            ├── SVG 1
            ├── SVG 2
            ├── ...
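To make the grouping concrete, the following is a rough Python sketch of how a statistics file could be routed into its per-daemon directory. It assumes a file naming scheme of the form wazuh-<daemon>_<suffix>, inferred from the two CSV names in the tree. The function name and the fallback behaviour are hypothetical and are not taken from the pipeline.

```python
import re
from pathlib import PurePosixPath

# Assumed naming scheme: "wazuh-<daemon>_<anything>", e.g. wazuh-remoted_stats.csv
_DAEMON_RE = re.compile(r"^wazuh-(?P<daemon>[a-z]+)_")


def grouped_path(file_name, root):
    """Return <root>/stats/<daemon>/<file_name>.

    Files that do not match the assumed scheme stay directly under <root>/stats.
    """
    base = PurePosixPath(root) / "stats"
    match = _DAEMON_RE.match(file_name)
    if match:
        base = base / match.group("daemon")
    return base / file_name


if __name__ == "__main__":
    for name in ("wazuh-analysisd_stats.csv", "wazuh-remoted_stats.csv"):
        print(grouped_path(name, "data"))
```

The same function works for the plots tree by passing "plots" as the root.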
The related changes are as follows: https://github.com/wazuh/wazuh-jenkins/commit/b19228098ca069936cc0af8d6b09d2ad665a2846
To solve the conflict error when parsing the cluster blocks in the ossec.conf (mentioned in the previous task), I have added a task that removes the existing cluster block from the ossec.conf before the cluster configuration is added.
See the changes here https://github.com/wazuh/wazuh-jenkins/commit/3d840a75f5bd2290ebc15d342fea4ac8de6018a5
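The idea behind this cleanup step can be sketched in Python as follows. This is only an illustration of removing an existing cluster block from the file contents before appending a new one, not the provisioning task from the commit. The function name is hypothetical. A regular expression is used on the assumption that ossec.conf can hold several <ossec_config> root blocks, which a strict XML parser would reject.

```python
import re

# One <cluster>...</cluster> element, with its leading indentation and trailing newline.
_CLUSTER_RE = re.compile(r"[ \t]*<cluster>.*?</cluster>[ \t]*\n?", re.DOTALL)


def strip_cluster_blocks(ossec_conf_text):
    """Return the configuration text with every <cluster> element removed."""
    return _CLUSTER_RE.sub("", ossec_conf_text)


if __name__ == "__main__":
    sample = (
        "<ossec_config>\n"
        "  <global>\n"
        "    <email_notification>no</email_notification>\n"
        "  </global>\n"
        "  <cluster>\n"
        "    <name>wazuh</name>\n"
        "    <disabled>yes</disabled>\n"
        "  </cluster>\n"
        "</ossec_config>\n"
    )
    print(strip_cluster_blocks(sample))
```

After this step the file contains no cluster block, so the block appended at the end is the only one the parser sees.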
I've been researching the AWS load balancing service.
In order to use this service from our Jenkins pipeline, it will be necessary to create a series of additional modules that make the necessary calls using the AWS CLI, which allows us to:
I am going to start the development of this new module. As I progress, I will comment on it.
I have already finished the development of the load balancer. I had to make some additional adjustments to the agent simulator and the script because I found some bugs during this development.
As a result, an on-demand load balancer is created in AWS whenever the number of worker nodes is greater than 0. All the worker nodes are registered in this load balancer, and all the agent connections are distributed among them, as we can see in the following example:
[root@ip-172-31-15-151 ec2-user]# /var/ossec/bin/cluster_control -a
ID   NAME                           IP         STATUS  VERSION       NODE NAME
000  ip-172-31-15-151.ec2.internal  127.0.0.1  active  Wazuh v4.2.0  master
001  1-db6873f1-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1
002  1-26cf3108-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_2
003  1-ed742f0a-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3
004  1-0b8047a6-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_1
005  1-42d49a73-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3
006  1-5ebd8747-debian10            10.0.2.15  active  Wazuh 4.2.0   Test_cluster_performance_sprint2_B89_manager_3
At the end of the test, this load balancer is destroyed like the rest of the AWS instances.
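To check how the load balancer spread the agents, a listing like the one above can be tallied per node. The Python sketch below is a hypothetical helper, not part of the pipeline. It assumes the node name is the last whitespace-separated field of each row and that data rows start with a numeric agent ID.

```python
from collections import Counter


def agents_per_node(listing):
    """Count agents per cluster node from `cluster_control -a` style output."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a numeric agent ID; this skips the prompt and header.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # ID 000 is the manager itself, not a connected agent.
        if fields[0] == "000":
            continue
        counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    example = (
        "ID NAME IP STATUS VERSION NODE NAME\n"
        "000 ip-172-31-15-151.ec2.internal 127.0.0.1 active Wazuh v4.2.0 master\n"
        "001 1-db6873f1-debian10 10.0.2.15 active Wazuh 4.2.0 worker_a\n"
        "002 1-26cf3108-debian10 10.0.2.15 active Wazuh 4.2.0 worker_b\n"
    )
    for node, count in sorted(agents_per_node(example).items()):
        print(node, count)
```

The node names worker_a and worker_b in the demo are placeholders. Because only the first and last fields are read, the inconsistent version strings in the real output do not affect the count.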
The changes made are as follows https://github.com/wazuh/wazuh-jenkins/commit/d8be61449829ee7bc6b3431143fab7837ddb31de
I have added two new parameters to the pipeline to select the instance type used to deploy agents and managers.
I have also updated the deployment logic as follows:
If the testing time exceeds the threshold (45 min), c5xlarge is selected for both agents and managers.
In auto mode, t2.medium is chosen unless the test time exceeds the threshold or the user has explicitly selected c5xlarge.
If the user chooses c5xlarge, it will be selected.
If the test mode is AGENTS, the instance type used to deploy agents will be c5xlarge.
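The selection rules above can be summarized in a short Python sketch. This is one reading of the rules, not the pipeline's Groovy implementation. The function and parameter names are hypothetical, and the order in which the rules are checked is an assumption.

```python
THRESHOLD_MINUTES = 45  # threshold stated in the rules above


def select_instance_type(requested, test_minutes, test_mode, role):
    """Pick the instance type for one role.

    requested: "auto", "t2.medium" or "c5xlarge" (value chosen in the UI)
    role:      "agent" or "manager"
    """
    if requested == "c5xlarge":
        return "c5xlarge"  # an explicit user choice is honoured
    if test_minutes > THRESHOLD_MINUTES:
        return "c5xlarge"  # long tests use the larger type for both roles
    if test_mode == "AGENTS" and role == "agent":
        return "c5xlarge"  # AGENTS mode uses the larger type for agents
    return "t2.medium"     # default


if __name__ == "__main__":
    print(select_instance_type("auto", 30, "MANAGERS", "manager"))
    print(select_instance_type("auto", 60, "MANAGERS", "manager"))
```

The test mode value "MANAGERS" in the demo is a placeholder for any mode other than AGENTS.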
The changes made are as follows https://github.com/wazuh/wazuh-jenkins/commit/a5fac3783b5a2b91392bed6b26cc9eaa2d3ebeaa
This scenario will test how the cluster performs when a large number of agents are connected to it. The objective is to get synchronization times, discover bottlenecks, and use this as a baseline to test the performance of the distributed API and its endpoints.
Objectives