Closed Rebits closed 5 months ago
| Development Branch |
|---|
| qa/tmp-4.8.0-22847-fix |
To determine the root cause of the E2E test failures, we have created the temporary branch qa/tmp-4.8.0-22847-fix. This branch comes from 4.8.0 (https://github.com/wazuh/wazuh/commit/678bca8ff3e86158ccdeadc24cb0131c129c28c4) and includes all the fixes present in 4.7.5 (https://github.com/wazuh/wazuh/commit/2f97b3386e04fdcc1395118159056e447bd6effd).
For this testing, we have created new packages only for Deb manager and Indexer. No changes were performed in the agents, so no new packages were needed.
[!NOTE] The Indexer packages had to be created because of constraints in the deployment tool used for our End-to-End (E2E) tests (https://ci.wazuh.info/job/Wazuh_QA_environment/1166/)
The testing environment was deployed using the Wazuh_QA_Environment pipeline.
[!IMPORTANT] These tests were conducted with the outdated vulnerability index name (#5401). For results after resolving this issue, please check this comment: https://github.com/wazuh/wazuh-qa/issues/5397#issuecomment-2119877559
Report: R1.zip
Reported issues:
Initial scan tests fail due to the presence of errors in the managers and the absence of any vulnerabilities in the index:
> assert test_result.get_test_result(), test_result.report()
E AssertionError:
E Test test_first_syscollector_scan[vd_disabled_when_agents_registration] failed
E
E Check all_agents_scanned_syscollector_first_scan succeeded
E Check all_agents_scanned_vulnerability_first_scan failed. Evidences (['agents_not_scanned_vulnerability_first_scan']) can be found in the report.
E Check no_errors failed. Evidences (['error_level_messages']) can be found in the report.
E -----
E
E assert False
E + where False = <bound method TestResult.get_test_result of <wazuh_testing.end_to_end.TestResult object at 0x7f1d1ea83160>>()
E + where <bound method TestResult.get_test_result of <wazuh_testing.end_to_end.TestResult object at 0x7f1d1ea83160>> = <wazuh_testing.end_to_end.TestResult object at 0x7f1d1ea83160>.get_test_result
Reviewing the evidence collected we can see the following errors in the report:
{
"manager1": {
"ERROR": [
"2024/05/17 12:11:11 wazuh-modulesd:vulnerability-scanner[83133] scanOrchestrator.hpp:143 at operator()(): ERROR: Error processing delayed event: Error executing rescan for multiple agents."
],
"CRITICAL": [],
"WARNING": []
},
"manager2": {
"ERROR": [
"2024/05/17 12:11:14 wazuh-modulesd:vulnerability-scanner[72528] scanOrchestrator.hpp:143 at operator()(): ERROR: Error processing delayed event: Error executing rescan for multiple agents."
],
"CRITICAL": [],
"WARNING": []
},
The error "Error processing delayed event: Error executing rescan for multiple agents" seems to have occurred in both managers.
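As a rough illustration of how evidence like the one above can be grouped by severity, here is a minimal sketch. It is not the wazuh_testing implementation; the `group_by_level` helper and the regular expression are assumptions based only on the log format shown in this report.

```python
import re

# Assumed format: "<timestamp> <tag> <file>:<line> at <func>(): LEVEL: message"
LEVEL_RE = re.compile(r"\b(ERROR|CRITICAL|WARNING): ")


def group_by_level(lines):
    """Group log lines by the first ERROR/CRITICAL/WARNING token they contain."""
    grouped = {"ERROR": [], "CRITICAL": [], "WARNING": []}
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            grouped[match.group(1)].append(line)
    return grouped


# Both sample lines are copied from the logs quoted in this report.
sample = [
    "2024/05/17 12:11:11 wazuh-modulesd:vulnerability-scanner[83133] "
    "scanOrchestrator.hpp:143 at operator()(): ERROR: Error processing delayed "
    "event: Error executing rescan for multiple agents.",
    "2024/05/17 12:07:09 indexer-connector[80410] indexerConnector.cpp:319 at "
    "initialize(): INFO: IndexerConnector initialized successfully for index: "
    "wazuh-states-vulnerabilities-wazuh.",
]
report = group_by_level(sample)
```

INFO and DEBUG lines are ignored, so only the rescan error ends up in the resulting dictionary.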
Vulnerability Index is empty
{
"agent1": [],
"agent3": [],
"agent4": [],
"agent5": [],
"agent2": []
}
In addition, we can see that the index isn't even created:
root@ip-172-31-9-51:/home/qa# curl -k -u USER:PASSWORD https://172.31.9.51:9200/wazuh-states-vulnerabilities/_search -H 'Content-Type: application/json' -d '{
"size": 1000,
"query": {
"bool": {
"must": [
{
"match": {
"agent.id": "001"
}
}
]
}
}
}'
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [wazuh-states-vulnerabilities]","index":"wazuh-states-vulnerabilities","resource.id":"wazuh-states-vulnerabilities","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [wazuh-states-vulnerabilities]","index":"wazuh-states-vulnerabilities","resource.id":"wazuh-states-vulnerabilities","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}
root@ip-172-31-9-51:/home/qa#
This was not a configuration error. We can see that the indexer-connector was initially initialized correctly:
2024/05/17 12:07:09 indexer-connector[80410] indexerConnector.cpp:319 at initialize(): INFO: IndexerConnector initialized successfully for index: wazuh-states-vulnerabilities-wazuh.
However, we can see that vulnerabilities were processed:
...
2024/05/17 12:11:10 wazuh-modulesd:vulnerability-scanner[83133] packageScanner.hpp:270 at platformVerify(): DEBUG: The platform is in the list based on CPE comparison for Package: kernel, Version: 4.18.0-348.20.1.el7, CVE: CVE-2023-0160, Content platform CPE: cpe:/o:redhat:enterprise_linux:7 OS CPE: cpe:/o:redhat:enterprise_linux:7
2024/05/17 12:11:10 wazuh-modulesd:vulnerability-scanner[83133] packageScanner.hpp:536 at versionMatch(): DEBUG: No match due to default
...
After this failure, the rest of the tests were aborted to troubleshoot the environment. The vulnerability state was never recovered even after restarting both managers and waiting more than half an hour.
[!IMPORTANT] It seems that the tests are failing because the index has been renamed (https://github.com/wazuh/wazuh/pull/23274). Currently researching the impact of the detected error on the E2E tests.
After changing the index name, the initial scans seem to detect the vulnerabilities. However, the previously mentioned error was still present. Reported in https://github.com/wazuh/wazuh/issues/23512
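For reference, the mismatch can be sketched as follows: the earlier curl command queried wazuh-states-vulnerabilities, while the indexer-connector log reported wazuh-states-vulnerabilities-wazuh. The helpers below are hypothetical, and treating the trailing part of the name as a cluster-name suffix is an assumption drawn from that log line, not a confirmed naming rule.

```python
import json

BASE_INDEX = "wazuh-states-vulnerabilities"


def vulnerability_index(cluster_name=None):
    """Return the index name; the '-<cluster_name>' suffix is an assumption."""
    return f"{BASE_INDEX}-{cluster_name}" if cluster_name else BASE_INDEX


def agent_query(agent_id, size=1000):
    """Build the same search body used in the curl command of this report."""
    return {
        "size": size,
        "query": {"bool": {"must": [{"match": {"agent.id": agent_id}}]}},
    }


def search_url(host, cluster_name=None, port=9200):
    """Compose the _search URL for the (possibly suffixed) index."""
    return f"https://{host}:{port}/{vulnerability_index(cluster_name)}/_search"


old_url = search_url("172.31.9.51")
new_url = search_url("172.31.9.51", cluster_name="wazuh")
body = json.dumps(agent_query("001"))
```

Here `old_url` reproduces the request that returned index_not_found_exception, and `new_url` targets the index name seen in the initialization log.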
Because the tests are launched in debug mode, obtaining full results takes a significant amount of time. These tests have been automatically triggered using custom packages through the Test_e2e_system pipeline.
We are using a custom branch, tmp-testing-vd-rc2, which incorporates fixes from the following issues: #5401 and #5368. These fixes aim to accurately assess the current status of the VD module.
This test iteration includes the fix for the renamed vulnerability index. For more details, see: https://github.com/wazuh/wazuh-qa/issues/5401.
Vulnerability Detection Module
E2E Tests
test_first_syscollector_scan[vd_disabled_when_agents_registration] :red_circle:
test_first_syscollector_scan[vd_enabled_when_agents_registration] :red_circle:
test_consistency_initial_scans :red_circle:
test_syscollector_second_scan :red_circle:
Vulnerability Detection Module
E2E Tests
CVE-2023-3128 is not included in the vulnerability lists for Grafana packages. Fixed in https://github.com/wazuh/wazuh-qa/issues/5368 (b9d6f2b891c67ac89a89bfc07561d7ba70513edc)
test_install_vulnerable_package_when_agent_down[install_package] :red_circle:
test_change_agent_manager[install_package] :red_circle:
test_vulnerability_detector_scans_cases[remove_package] :red_circle:
test_vulnerability_detector_scans_cases[upgrade_package_maintain_vulnerability] :red_circle:
test_vulnerability_detector_scans_cases[upgrade_package_add_vulnerability] :red_circle:
test_vulnerability_detector_scans_cases[upgrade_package_maintain_add_vulnerability] :red_circle:
test_vulnerability_detector_scans_cases[upgrade_package_remove_vulnerability] :red_circle:
test_vulnerability_detector_scans_cases[upgrade_package_nonvulnerable_to_vulnerable] :red_circle:
test_vulnerability_detector_scans_cases[upgrade_package_nonvulnerable_to_nonvulnerable] :red_circle:
test_vulnerability_detector_scans_cases[install_package_non_vulnerable] :red_circle:
test_vulnerability_detector_scans_cases[remove_non_vulnerable_packge] :red_circle:
Currently trying to replicate the "Duplicated vulnerabilities found in index" issue.
Researched https://github.com/wazuh/wazuh/issues/23530. This seems to be caused by a test bug. Reported in https://github.com/wazuh/wazuh-qa/issues/5410
Initial scan discrepancies seem to be related to changes in the VD content. We should consider not increasing the timeout in https://github.com/wazuh/wazuh-qa/issues/5404. Currently researching the issue.
Some unexpected failures were detected in the last iteration of the tests:
Build: https://ci.wazuh.info/job/Test_e2e_system/289/
According to the analysis performed in https://github.com/wazuh/wazuh/issues/23523, the alerts triggered during initial scans were expected due to content updates. The E2E tests should take this behavior into account (an issue was created to make these changes: https://github.com/wazuh/wazuh-qa/issues/5412). However, in the last iteration of the E2E tests, the consistency test between initial scans failed again even though there were no multiple content feed updates:
vd_disabled_when_agents_registration
Manager1
2024/05/21 18:07:27 wazuh-modulesd:vulnerability-scanner: INFO: Initiating update feed process
2024/05/21 18:16:17 wazuh-modulesd:vulnerability-scanner: INFO: Feed update process completed
Manager2
2024/05/21 18:08:18 wazuh-modulesd:vulnerability-scanner: INFO: Initiating update feed process
2024/05/21 18:20:19 wazuh-modulesd:vulnerability-scanner: INFO: Feed update process completed
However, the test checks for vulnerabilities much later:
2024-05-21 18:44:35 Checking vulnerabilities in the index (test_vulnerability_detector.py:323)
The agents completed multiple syscollector scans since the feed update finished, so the final vulnerabilities should match the latest feed changes:
2024/05/21 18:21:18 wazuh-modulesd:syscollector: INFO: Starting evaluation.
2024/05/21 18:21:20 wazuh-modulesd:syscollector: INFO: Evaluation finished.
2024/05/21 18:22:21 wazuh-modulesd:syscollector: INFO: Starting evaluation.
2024/05/21 18:22:24 wazuh-modulesd:syscollector: INFO: Evaluation finished.
2024/05/21 18:23:24 wazuh-modulesd:syscollector: INFO: Starting evaluation.
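The timing argument above can be checked with a short calculation. This is only a sketch using the timestamps quoted in this comment; the `ts` helper and variable names are illustrative.

```python
from datetime import datetime


def ts(text, fmt="%Y/%m/%d %H:%M:%S"):
    """Parse a timestamp in the format used by the manager logs."""
    return datetime.strptime(text, fmt)


# (start, end) of the feed update on each manager, copied from the logs above.
feed_updates = {
    "manager1": (ts("2024/05/21 18:07:27"), ts("2024/05/21 18:16:17")),
    "manager2": (ts("2024/05/21 18:08:18"), ts("2024/05/21 18:20:19")),
}
# The test log uses a dash-separated date format.
index_check = ts("2024-05-21 18:44:35", fmt="%Y-%m-%d %H:%M:%S")

# Feed update duration per manager, in seconds.
durations = {
    name: (end - start).total_seconds()
    for name, (start, end) in feed_updates.items()
}
# Time between the last feed update finishing and the index check.
last_feed_end = max(end for _, end in feed_updates.values())
margin = (index_check - last_feed_end).total_seconds()
```

With these values the index check happens roughly 24 minutes after the slower manager finished its feed update, which leaves room for many of the one-minute syscollector cycles shown in the log.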
vd_enabled_when_agents_registration
No new feed update events occurred, and the agents also performed multiple syscollector scans:
2024/05/21 18:56:29 wazuh-modulesd:syscollector: INFO: Evaluation finished.
2024/05/21 18:57:30 wazuh-modulesd:syscollector: INFO: Starting evaluation.
2024/05/21 18:57:32 wazuh-modulesd:syscollector: INFO: Evaluation finished.
2024/05/21 18:58:32 wazuh-modulesd:syscollector: INFO: Starting evaluation.
2024/05/21 18:58:35 wazuh-modulesd:syscollector: INFO: Evaluation finished.
2024/05/21 18:59:35 wazuh-modulesd:syscollector: INFO: Starting evaluation.
2024/05/21 18:59:37 wazuh-modulesd:syscollector: INFO: Evaluation finished.
However, the vulnerabilities have changed between scans. For example, the following vulnerability appeared in the second scan:
[ "CVE-2022-48658", "kernel", "4.18.0-348.20.1.el7", "arm64"]
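A consistency check of this kind boils down to a set difference between the two scans. The sketch below is not the actual E2E test code; `scan_diff` is a hypothetical helper, and the empty first scan is a placeholder because the full scan results are not included in this comment.

```python
def scan_diff(first_scan, second_scan):
    """Return vulnerabilities added to or removed from the second scan."""
    first, second = set(first_scan), set(second_scan)
    return {
        "added": sorted(second - first),
        "removed": sorted(first - second),
    }


# Placeholder: the real first scan contains many entries not shown here.
first_scan = []
# Entry reported above as appearing only in the second scan.
second_scan = [("CVE-2022-48658", "kernel", "4.18.0-348.20.1.el7", "arm64")]

diff = scan_diff(first_scan, second_scan)
```

Any non-empty "added" or "removed" list means the two initial scans are not consistent.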
After discussing with @Dwordcito, it appears this is related to https://github.com/wazuh/wazuh/issues/23482. I'll include this issue in the conclusion and ensure the report is added to the issue thread.
After some research, it seems that vulnerabilities are correctly generated for this test, although the specified time is not enough. Reported in https://github.com/wazuh/wazuh-qa/issues/5413
The teardown package removal failed due to an unattended upgrade running on agent5:
ERROR root:remote_operations_handler.py:378 Error removing package on agent5: Failed to remove package in agent5: {'changed': False, 'msg': "'apt-get remove 'grafana'' failed: E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 17560 (unattended-upgr)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", 'rc': 100, 'stderr': 'E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 17560 (unattended-upgr)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n', 'stderr_lines': ['E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 17560 (unattended-upgr)', 'E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?'], 'stdout': '', 'stdout_lines': []}
Reported in https://github.com/wazuh/wazuh-qa/issues/5415
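One possible mitigation, sketched here only as an idea and not as the fix tracked in the issue above, is to retry the removal while the dpkg lock is held. `remove_package` is a hypothetical callable that returns a dict with 'rc' and 'stderr' keys, mirroring the error output shown above; the retry count and delay are arbitrary.

```python
import time

# Substrings taken from the apt-get error message shown in this report.
LOCK_MARKERS = (
    "Could not get lock",
    "Unable to acquire the dpkg frontend lock",
)


def is_dpkg_lock_error(stderr):
    """Return True if stderr indicates that another process holds the dpkg lock."""
    return any(marker in stderr for marker in LOCK_MARKERS)


def remove_with_retry(remove_package, retries=5, delay=30, sleep=time.sleep):
    """Call remove_package() until it succeeds or fails for a non-lock reason."""
    result = None
    for attempt in range(1, retries + 1):
        result = remove_package()
        if result["rc"] == 0 or not is_dpkg_lock_error(result["stderr"]):
            return result
        if attempt < retries:
            sleep(delay)  # wait for the unattended upgrade to release the lock
    return result
```

Errors unrelated to the lock are returned immediately, so genuine removal failures are not hidden by the retry loop.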
Manual testing revealed some issues when the cluster is renamed. Reported in https://github.com/wazuh/wazuh/issues/23540
LGTM
Description
The end-to-end (E2E) VD tests are failing. They were executed to validate the issue described in wazuh-qa issue #5368.
Initial thoughts attributed the failures to a database error. However, similar issues appeared in wazuh-qa issue #5319, where vulnerabilities were detected. Therefore, there might be an additional underlying problem.
Report details
Observations
Vulnerability Index: The index appears empty in all tests, including the initial scans, which is unusual.
Database Errors: There are database-related errors in the logs that should not be appearing. Example logs:
Alerts: Unexpected alert behavior, including:
macOS agent triggers an OS Vulnerability alert:
This alert suggests OS updates, which should not occur during the tests.
Windows alerts detected vulnerability mitigations before package installation. This might indicate an unexpected mid-test upgrade or incorrect handling of the environment inventory/vulnerabilities.
Analysis
Potential causes for these errors:
Test Errors
Known Product Errors
Unknown Production Errors
Action Plan
Since it is unclear if the database error is solely responsible, and there might be other underlying issues, we propose the following:
Validated by
Conclusion :red_circle:
E2E Tests
Vulnerability Detection