Rebits commented 2 weeks ago

Description

This issue tracks a performance analysis of two proposed development approaches:

@wazuh/devel-framework: https://github.com/wazuh/wazuh/issues/23058
@wazuh/devel-core2 development: https://github.com/wazuh/wazuh/issues/22867

The objective is to run performance tests against both approaches and compare the results, in order to understand the potential impact of each on the product.

Test environment

Component	Quantity	Operating System	CPU (cores)	RAM (GB)	Disk (GB)
Master	1	Ubuntu 22	4	8	50
Workers	2	Ubuntu 22	4	8	50
Agent 1	1	Ubuntu 22	2	4	30
Agent 2	1	Windows 11	2	4	30
Load Balancer	1	Ubuntu 22	4	8	50
Indexers	2	Ubuntu 22	2	4	30

[!NOTE] The load balancer is located on the master node.

23058 Development Packages

Architecture	Framework development package URL
DEB	4.8.0-python.vd.spike.deb.1
RPM	4.8.0-python.vd.spike.rpm.1

22867 Development Packages

Architecture	Core development package URL
DEB	4.8.0-0.commitd31b277
RPM	4.8.0-0.commitd31b277

Test Cases

Testing

Automatic

Methodology

The CLUSTER-Workload_benchmarks_metrics pipeline will be used to execute the specified test cases automatically. Results will be analyzed manually and shared with the development team for validation and adjustments.

Test Cases

Case	Description	Number of Agents	EPS	Frequency	Number of Vulnerable Packages	Time
Minimum Activity	Simulate a small, stable environment with low activity	10	10	600	100	3h
Medium Activity	Simulate a medium-sized environment with moderate activity	50	10	300	100	3h
High Activity	Simulate a large-scale environment with significant activity	200	50	60	100	3h
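As a rough guide to the load each case puts on the cluster, the aggregate event rate can be estimated from the table above. This is a minimal sketch that assumes the EPS column is a per-agent rate; the `WorkloadCase` class and `aggregate_eps` helper are illustrative names, not part of the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadCase:
    name: str
    agents: int
    eps_per_agent: int  # assumption: the EPS column is a per-agent rate


# Values copied from the automatic test case table.
CASES = [
    WorkloadCase("Minimum Activity", 10, 10),
    WorkloadCase("Medium Activity", 50, 10),
    WorkloadCase("High Activity", 200, 50),
]


def aggregate_eps(case: WorkloadCase) -> int:
    """Estimated total events per second received by the cluster."""
    return case.agents * case.eps_per_agent


if __name__ == "__main__":
    for case in CASES:
        print(f"{case.name}: {aggregate_eps(case)} events/s")
```

Under that assumption, Medium Activity works out to 500 events per second.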

Manual

Methodology

Customizing the set of vulnerable packages is not feasible in automatic testing. Therefore, manual testing will utilize a larger set of 10,000 vulnerabilities to identify any potential instability in environments with a high vulnerability count. The following Wazuh-QA tools will be employed for manual performance analysis:

Monitor class for resource measurement of Wazuh central components
Statistics class for Wazuh data analysis
Simulate agents script for Wazuh agent simulation

Test Cases

Case	Description	Number of Agents	EPS	Frequency	Number of Vulnerable Packages	Time
High Vulnerability Environment	Simulate an intermediate-sized environment with high vulnerability	10	10	60	10,000	3h

Conclusion :red_circle:

New Issues

Known issues

https://github.com/wazuh/wazuh-jenkins/issues/6203

[!NOTE] The manual performance tests and the Minimum Activity and High Activity automatic tests have not been performed. More information in https://github.com/wazuh/wazuh-qa/issues/5313#issuecomment-2100349272

Rebits commented 1 week ago

Automatic

Minimum Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/510/
Medium Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
High Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/512/

Rebits commented 1 week ago

The Minimum Activity and High Activity performance tests failed with a "No space left on device" error. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6475

22:03:52  
22:03:52  TASK [Copy ossec.log file to data files] ***************************************
22:03:52  fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_2]: UNREACHABLE! => {
22:03:52      "changed": false,
22:03:52      "unreachable": true
22:03:52  }
22:03:52  
22:03:52  MSG:
22:03:52  
22:03:52  Warning: Permanently added '172.31.3.110' (ECDSA) to the list of known hosts.

22:03:52  mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.7137516-30912-167679972105845’: No space left on device
22:03:52  
22:03:53  fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_1]: UNREACHABLE! => {
22:03:53      "changed": false,
22:03:53      "unreachable": true
22:03:53  }
22:03:53  
22:03:53  MSG:
22:03:53  
22:03:53  Warning: Permanently added '172.31.4.31' (ECDSA) to the list of known hosts.

22:03:53  mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.724964-30911-242038256013694’: No space left on device

Only the Medium Activity performance test finished successfully. Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
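A failure like the one above could be caught earlier with a free-space check before the evidence-collection step. The following is only a sketch of such a guard, not something the pipeline currently does; the `has_free_space` helper and the 1 GiB threshold are assumptions chosen for illustration.

```python
import shutil


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free bytes."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # Hypothetical threshold: require 1 GiB free before copying logs.
    required = 1024 ** 3
    if not has_free_space("/tmp", required):
        print("Not enough free space in /tmp, skipping log collection")
```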

Rebits commented 1 week ago

Medium Activity :red_circle:

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/ Report: Artifact.zip

Logs :red_circle:

Summary

Worker logs indicate the same database error reported in https://github.com/wazuh/wazuh/issues/22847
No errors present in the master node
No errors present in the indexer nodes

Master :yellow_circle:

The master node is started before the correct indexer configuration is set, so the following warnings are expected:

2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Worker 1 :red_circle:

The worker node is started before the correct indexer configuration is set, so the following warnings are expected:

2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Multiple database errors were found, matching those reported in https://github.com/wazuh/wazuh/issues/22847

2024/05/07 21:24:24 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.
2024/05/07 21:24:24 wazuh-remoted: INFO: (1410): Reading authentication keys file.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_osinfo
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
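To quantify these errors across a full ossec.log, the missing-table messages can be grouped by agent database and table name. This is a minimal sketch: the regular expression is derived only from the log lines shown above, and the `count_missing_tables` helper is a hypothetical name.

```python
import re
from collections import Counter

# Pattern based on the wazuh-db error lines quoted in this comment.
MISSING_TABLE = re.compile(
    r"wazuh-db: ERROR: DB\((?P<agent>\d+)\) sqlite3_prepare_v2\(\) : "
    r"no such table: (?P<table>\w+)"
)


def count_missing_tables(lines):
    """Count 'no such table' errors per (agent id, table name)."""
    counts = Counter()
    for line in lines:
        match = MISSING_TABLE.search(line)
        if match:
            counts[(match["agent"], match["table"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_osinfo",
        "2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.",
        "2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs",
        "2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs",
    ]
    for (agent, table), total in count_missing_tables(sample).items():
        print(f"agent {agent}: {table} x{total}")
```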

Worker 2 :yellow_circle:

The worker node is started before the correct indexer configuration is set, so the following warnings are expected:

2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Indexer 1 :green_circle:

No warnings or errors

Indexer 2 :green_circle:

No warnings or errors

Metrics :red_circle:

Summary

Low resource usage in the master node
Possible file descriptor leaks. Reported in https://github.com/wazuh/wazuh/issues/23202
Worker nodes show high CPU and memory usage. These elevated values are expected given the unrealistic level of activity: roughly 500 syscollector messages per second in a two-node cluster environment
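The possible file descriptor leak noted in this summary can be sanity-checked by fitting a straight line to the FD samples of a process: a clearly positive slope sustained over the whole test suggests growth rather than noise. This is a sketch of that heuristic using an ordinary least-squares slope; it is not how the pipeline evaluates the FD plots, and `fd_growth_per_sample` is a hypothetical helper.

```python
def fd_growth_per_sample(fd_counts):
    """Least-squares slope of FD count against sample index."""
    n = len(fd_counts)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(fd_counts) / n
    covariance = sum(
        (i - mean_x) * (y - mean_y) for i, y in enumerate(fd_counts)
    )
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance


if __name__ == "__main__":
    # Illustrative values only, not taken from the report.
    print(fd_growth_per_sample([10, 12, 14, 16]))
```

A slope near zero is what a process without a leak should show once it has warmed up; what counts as "clearly positive" depends on the sampling interval and would need to be agreed on.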

Master :green_circle:

Metrics

![CPU](https://github.com/wazuh/wazuh-qa/assets/11089305/4d248bc5-0742-41ac-8fc1-cd692b205138) ![Disk_Read](https://github.com/wazuh/wazuh-qa/assets/11089305/08b1383e-64bb-499e-beea-cfbccbb06b45) ![Disk_Read_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/3d95ad26-d424-4c6a-848d-4988c81f1f71) ![Disk_Write_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/95c4e10b-84c8-4052-b3c4-d51f0079a529) ![Disk_Written](https://github.com/wazuh/wazuh-qa/assets/11089305/f6148307-c6ee-418b-8510-26626c1020ff) ![FD](https://github.com/wazuh/wazuh-qa/assets/11089305/12f7e8ce-05fc-42e0-9357-41b51d5a24b8) ![PSS](https://github.com/wazuh/wazuh-qa/assets/11089305/cc62b19a-3b92-4f33-a46e-22fc2910b7a0) ![Read_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/ed7619ce-de58-4fc3-8587-1aab5f820a0d) ![RSS](https://github.com/wazuh/wazuh-qa/assets/11089305/48afbbde-8fd3-4d77-a8f2-94fd5349d95e) ![SWAP](https://github.com/wazuh/wazuh-qa/assets/11089305/ec161e62-5435-46d0-8bf0-68dfcd53aea0) ![USS](https://github.com/wazuh/wazuh-qa/assets/11089305/3b8b7f6e-e3f3-4808-87d0-22acb7699770) ![VMS](https://github.com/wazuh/wazuh-qa/assets/11089305/ad0ff505-fda3-40e6-a4e1-76bc81a6ac87) ![Write_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/dc2c2a17-1bd1-43e9-ac97-4a02e2ffea58)

Worker 1 :red_circle:

Metrics

![CPU](https://github.com/wazuh/wazuh-qa/assets/11089305/6a857f50-dc4f-418f-9758-8a912c70087e) ![Disk_Read](https://github.com/wazuh/wazuh-qa/assets/11089305/5e758eac-f295-4355-ad5e-a16695ff9dfa) ![Disk_Read_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/4d5221fd-5e2c-4334-8b88-a1d5afd69ef3) ![Disk_Write_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/983ba4bd-a20f-4688-b31e-3d8d0e9cdc37) ![Disk_Written](https://github.com/wazuh/wazuh-qa/assets/11089305/42d13d72-bd9a-4540-9930-ea485cacc0b1) ![FD](https://github.com/wazuh/wazuh-qa/assets/11089305/7ecf45fd-9eeb-44a3-9a25-6868475dee6f) ![PSS](https://github.com/wazuh/wazuh-qa/assets/11089305/38039e8a-047f-4622-9f96-9f2688433bec) ![Read_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/881f7bbd-2d25-4482-bf93-c264950a6db3) ![RSS](https://github.com/wazuh/wazuh-qa/assets/11089305/837bdce4-97a7-466f-bed0-b5f5170506c3) ![SWAP](https://github.com/wazuh/wazuh-qa/assets/11089305/7d2c8545-e518-40be-8326-e77c8869cccf) ![USS](https://github.com/wazuh/wazuh-qa/assets/11089305/78417a04-55d3-48d1-8cbd-e6030a72cd02) ![VMS](https://github.com/wazuh/wazuh-qa/assets/11089305/b687c128-86c8-4b04-99b2-ef047b9c9d23) ![Write_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/5a65edb6-98d6-4e3c-9604-1726d8113ebe)

Worker 2 :red_circle:

Metrics

![CPU](https://github.com/wazuh/wazuh-qa/assets/11089305/92b8da0c-28e7-41ad-985b-ebdaa3413f55) ![Read_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/4bf80760-05a2-4924-a33e-fcb024593aa1) ![RSS](https://github.com/wazuh/wazuh-qa/assets/11089305/420c9f5b-3c27-40a0-b28a-47dee92cfc6c) ![SWAP](https://github.com/wazuh/wazuh-qa/assets/11089305/e2fa2028-285a-47e4-bd87-6fd582da58ce) ![USS](https://github.com/wazuh/wazuh-qa/assets/11089305/1fcad186-9d0e-4aae-a52c-4bc9226c1d92) ![VMS](https://github.com/wazuh/wazuh-qa/assets/11089305/3601ce31-e8a1-41a2-a9c1-6baaed279042) ![Write_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/2a3275d3-1ee4-48af-be2c-36f2fc1a6cfe) ![Disk_Read](https://github.com/wazuh/wazuh-qa/assets/11089305/88328f02-0ae2-4dc5-bcaa-1d66858283b0) ![Disk_Read_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/9be3173c-fc7f-4ffb-a206-ed3d678d6507) ![Disk_Write_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/2fdbae2f-6358-40ea-9a97-83a865eaa594) ![Disk_Written](https://github.com/wazuh/wazuh-qa/assets/11089305/ff4bcb65-d810-4208-9ae2-4bdaafb452cb) ![FD](https://github.com/wazuh/wazuh-qa/assets/11089305/5d5fdedd-85e2-4284-b1e1-1e19afe4b16a) ![PSS](https://github.com/wazuh/wazuh-qa/assets/11089305/7dfaf325-d508-49b7-bd5f-de849b182d08)

Indexer 1 :green_circle:

No abnormal behavior detected

Metrics

![CPU](https://github.com/wazuh/wazuh-qa/assets/11089305/950421f2-6f77-49cb-b9fd-71916a4d8e1d) ![Disk_Read](https://github.com/wazuh/wazuh-qa/assets/11089305/da1c0e26-3997-4f35-bb95-c0b8afafe2e0) ![Disk_Read_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/41836d0f-2738-4d84-a86d-8549651d57b6) ![Disk_Write_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/136829e7-32ee-4478-aff5-7daa9235eeba) ![Disk_Written](https://github.com/wazuh/wazuh-qa/assets/11089305/4793e479-a09c-485d-84c7-e2b269cb243f) ![FD](https://github.com/wazuh/wazuh-qa/assets/11089305/4218b259-cd72-4d87-9738-670f034f45ce) ![PSS](https://github.com/wazuh/wazuh-qa/assets/11089305/851256d9-ea3d-416a-9b3d-565745b714e2) ![Read_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/cd77d750-e1b2-42d4-9023-bdd1b6c9e0e2) ![RSS](https://github.com/wazuh/wazuh-qa/assets/11089305/0cfe10bf-6755-4b29-8ca4-ee13d565f199) ![SWAP](https://github.com/wazuh/wazuh-qa/assets/11089305/f46712c5-34fa-4b23-a84d-1f1c6e3d2dd1) ![USS](https://github.com/wazuh/wazuh-qa/assets/11089305/20dee659-6699-4ca4-b566-36f8f1f82405) ![VMS](https://github.com/wazuh/wazuh-qa/assets/11089305/355268ac-f7f5-4ddb-82f2-c5645855acdb) ![Write_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/20748e73-0566-4c4f-9301-526f2f060f35)

Indexer 2 :green_circle:

No abnormal behavior detected

Metrics

![CPU](https://github.com/wazuh/wazuh-qa/assets/11089305/cd103c54-f6ea-4b42-9a9b-e656e0d74213) ![Disk_Read](https://github.com/wazuh/wazuh-qa/assets/11089305/c0700f96-2028-47f0-a0cb-8f5d65800e52) ![Disk_Read_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/5ab2ec6a-6cd8-45e8-9470-802ca875b545) ![Disk_Write_Speed](https://github.com/wazuh/wazuh-qa/assets/11089305/d3a0bd85-b4ae-495f-b7cf-b3b1ffbc68e7) ![Disk_Written](https://github.com/wazuh/wazuh-qa/assets/11089305/0f15752c-cfb6-4bfc-abb3-28451bfdade2) ![FD](https://github.com/wazuh/wazuh-qa/assets/11089305/8bb8018a-7502-4f1d-a4a6-d83dbda6b2fb) ![PSS](https://github.com/wazuh/wazuh-qa/assets/11089305/67aed832-c40d-4c6f-bd5c-e78a25242b98) ![Read_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/c2e80b80-c678-4fa2-b0be-3f9fd416cc75) ![RSS](https://github.com/wazuh/wazuh-qa/assets/11089305/088b12b9-38ec-4d5d-9619-25a3a4e04a76) ![SWAP](https://github.com/wazuh/wazuh-qa/assets/11089305/4a7d789d-8591-40d2-b831-9b61e5978a06) ![USS](https://github.com/wazuh/wazuh-qa/assets/11089305/7b8d74d6-de33-416b-b8ed-463e98f85090) ![VMS](https://github.com/wazuh/wazuh-qa/assets/11089305/950e4930-2a63-418a-bd23-bfe39a915d76) ![Write_Ops](https://github.com/wazuh/wazuh-qa/assets/11089305/9aa19181-53e0-47dd-babd-0c8e6764511a)

Statistics :green_circle:

Vulnerabilities State :green_circle:

The vulnerability generator module, used by the simulate agents script, is designed to send 100 vulnerable packages to the manager and subsequently confirm their removal. This behavior appears as a wave-shaped plot that peaks in each repetition once all vulnerabilities have been processed.

In the plot, the indexer connector values do not match the ideal expected curve. However, the simulator itself appears to be behaving as intended.

total_vulnerabilities

It would be useful to add checks that verify whether the number of vulnerabilities matches the expected value at specific points during the test.
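One possible shape for such a check is to compare the indexed count against an expected value at named checkpoints, with a tolerance. This is a sketch under stated assumptions: the checkpoint names, the expected values, and the `checkpoint_mismatches` helper are all hypothetical, and the expected peak would have to be derived from how the simulator actually maps packages to vulnerabilities.

```python
def checkpoint_mismatches(observed, expected, tolerance=0):
    """Return checkpoints whose observed count is off by more than tolerance.

    Both arguments map a checkpoint name to a vulnerability count.
    A checkpoint missing from `observed` is reported as a mismatch.
    """
    mismatches = {}
    for checkpoint, expected_count in expected.items():
        observed_count = observed.get(checkpoint)
        if (
            observed_count is None
            or abs(observed_count - expected_count) > tolerance
        ):
            mismatches[checkpoint] = (expected_count, observed_count)
    return mismatches


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this test.
    expected = {"first_peak": 5000, "first_trough": 0}
    observed = {"first_peak": 4990, "first_trough": 0}
    print(checkpoint_mismatches(observed, expected, tolerance=5))
```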

Alerts :green_circle:

We anticipate that the alerts generated by both the workers and the manager should correspond with the indexed alert values. Nonetheless, there appears to be a discrepancy:

combined_and_new_total_alerts

Due to the high activity levels, some variance between the written alerts and indexed alerts is expected. However, it would be useful to add test steps that let this gap close gradually, so the environment stabilizes over time.
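To track whether the gap between written and indexed alerts is shrinking, the relative difference can be computed at each sampling point. This is a minimal sketch; `indexing_gaps` is a hypothetical helper and the sample values are invented for illustration.

```python
def indexing_gaps(written, indexed):
    """Relative gap (written - indexed) / written for each sample pair."""
    if len(written) != len(indexed):
        raise ValueError("series must have the same length")
    gaps = []
    for written_count, indexed_count in zip(written, indexed):
        if written_count <= 0:
            raise ValueError("written alert count must be positive")
        gaps.append((written_count - indexed_count) / written_count)
    return gaps


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this test.
    gaps = indexing_gaps([1000, 2000, 3000], [900, 1900, 2970])
    print(gaps)
    print("converging:", gaps[-1] < gaps[0])
```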

Evidence collection :red_circle:

The following errors were detected in the evidence-collection capabilities of the pipeline:

The indexed vulnerabilities and alerts metrics do not contain timestamps. Including them would make it easier to compare these values with the other plots. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6474
Indexer statistics were present in the logcollector directory. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6473
Statistics values for analysis are not correctly plotted. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6203
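For the missing timestamps, one option is to attach a UTC timestamp to every sample at collection time so the series can be aligned with the other plots. This is only a sketch of the idea; the `stamp` helper and the sample field name are assumptions, not the pipeline's actual data format.

```python
from datetime import datetime, timezone


def stamp(sample: dict, now=None) -> dict:
    """Return a copy of `sample` with an ISO 8601 UTC timestamp added."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {"timestamp": now.isoformat(), **sample}


if __name__ == "__main__":
    # Hypothetical field name for an indexed-alerts sample.
    print(stamp({"indexed_alerts": 3}))
```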

Rebits commented 1 week ago

Following a discussion with @juliamagan, we decided not to rerun the failed High Activity and Minimum Activity performance tests. Instead, these tests will be relaunched in RC2.

MARCOSD4 commented 1 week ago

GJ, but the graphs of the indexer 1 metrics cannot be displayed, perhaps because of an error in writing the comment.

MARCOSD4 commented 1 week ago

LGTM

juliamagan commented 1 week ago

LGTM

wazuh / wazuh-qa

Performance for Vulnerability Detection module in clustered environments #5313
