Closed nico-stefani closed 5 days ago
The error was reproduced in a v4.8.0-rc1
environment with 2 Wazuh manager nodes and an agent when the latter is deleted via the DELETE /agents
request:
$ curl -k -X DELETE "https://localhost:55555/agents?agents_list=001&status=all&older_than=0s" -H "Authorization: Bearer $TOKEN" | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 168 100 168 0 0 5090 0 --:--:-- --:--:-- --:--:-- 5090
{
"data": {
"affected_items": [
"001"
],
"total_affected_items": 1,
"total_failed_items": 0,
"failed_items": []
},
"message": "All selected agents were deleted",
"error": 0
}
api.log
2024/05/08 16:09:22 INFO: wazuh-wui 172.26.0.1 "DELETE /agents" with parameters {"agents_list": "002", "status": "all", "older_than": "0s"} and body {} done in 0.010s: 200
When the cluster is in the process of completing the Agent-info sync
task, the error arises:
Master cluster.log
2024/05/08 16:09:28 DEBUG: [Worker f8a7c0e28eb8] [Main] Command received: b'syn_a_w_m_p'
2024/05/08 16:09:28 DEBUG: [Worker f8a7c0e28eb8] [Main] Command received: b'new_str'
2024/05/08 16:09:28 DEBUG: [Worker f8a7c0e28eb8] [Main] Command received: b'str_upd'
2024/05/08 16:09:28 DEBUG: [Worker f8a7c0e28eb8] [Main] Command received: b'syn_a_w_m'
2024/05/08 16:09:28 INFO: [Worker f8a7c0e28eb8] [Agent-info sync] Starting.
2024/05/08 16:09:28 DEBUG2: [Worker f8a7c0e28eb8] [Agent-info sync] Chunk 1/1: [{"id":2,"name":"f03f6fab78d9","ip":"172.26.0.5","os_name":"Ubuntu","os_version":"20.04.6 LTS","os_major":"20","os_minor":"04","os_codename":"Focal Fossa","os_platform":"ubuntu","os_uname":"Linux |f03f6fab78d9 |5.15.0-105-generic |#115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024 |x86_64","os_arch":"x86_64","version":"Wazuh v4.8.0","config_sum":"ab73af41699f13fdd81903b5f23d8d00","merged_sum":"4a8724b20dee0124ff9656783c490c4e","manager_host":"f8a7c0e28eb8","node_name":"f8a7c0e28eb8","last_keepalive":1715177363,"connection_status":"active","disconnection_time":0,"group_config_status":"synced","status_code":0,"labels":[{"id":2,"key":"#\"_agent_ip\"","value":"172.26.0.5"},{"id":2,"key":"#\"_manager_hostname\"","value":"f8a7c0e28eb8"},{"id":2,"key":"#\"_node_name\"","value":"f8a7c0e28eb8"},{"id":2,"key":"#\"_wazuh_version\"","value":"Wazuh v4.8.0"}]}]
2024/05/08 16:09:28 ERROR: [Worker f8a7c0e28eb8] [Agent-info sync] Wazuh-db response for chunk 1/1 was not "ok": Error 2003 - Error in wazuhdb request: Cannot execute Global database query; FOREIGN KEY constraint failed
2024/05/08 16:09:28 DEBUG: [Worker f8a7c0e28eb8] [Agent-info sync] 0/1 chunks updated in wazuh-db in 0.001s.
2024/05/08 16:09:28 INFO: [Worker f8a7c0e28eb8] [Agent-info sync] Finished in 0.006s. Updated 0 chunks.
Given that the test environment has the debug mode enabled, the ossec.log
file provides further information regarding what failed:
2024/05/08 16:09:28 wazuh-db[142] wdb_parser.c:859 at wdb_parse(): DEBUG: Global query: sync-agent-groups-get {"condition": "sync_status", "set_synced": true, "get_global_hash": true}
2024/05/08 16:09:28 wazuh-db[142] wdb_integrity.c:759 at wdb_get_global_group_hash(): DEBUG: No group hash was found to calculate the global group hash.
2024/05/08 16:09:28 wazuh-db[142] main.c:335 at run_dealer(): DEBUG: New client connected (19).
2024/05/08 16:09:28 wazuh-db[142] wdb_parser.c:859 at wdb_parse(): DEBUG: Global query: sync-agent-info-set [{"id":2,"name":"f03f6fab78d9","ip":"172.26.0.5","os_name":"Ubuntu","os_version":"20.04.6 LTS","os_major":"20","os_minor":"04","os_codename":"Focal Fossa","os_platform":"ubuntu","os_uname":"Linux |f03f6fab78d9 |5.15.0-105-generic |#115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024 |x86_64","os_arch":"x86_64","version":"Wazuh v4.8.0","config_sum":"ab73af41699f13fdd81903b5f23d8d00","merged_sum":"4a8724b20dee0124ff9656783c490c4e","manager_host":"f8a7c0e28eb8","node_name":"f8a7c0e28eb8","last_keepalive":1715177363,"connection_status":"active","disconnection_time":0,"group_config_status":"synced","status_code":0,"labels":[{"id":2,"key":"#\"_agent_ip\"","value":"172.26.0.5"},{"id":2,"key":"#\"_manager_hostname\"","value":"f8a7c0e28eb8"},{"id":2,"key":"#\"_node_name\"","value":"f8a7c0e28eb8"},{"id":2,"key":"#\"_wazuh_version\"","value":"Wazuh v4.8.0"}]}]
2024/05/08 16:09:28 wazuh-db[142] wdb.c:1112 at wdb_exec_stmt_silent(): DEBUG: SQL statement execution failed
2024/05/08 16:09:28 wazuh-db[142] wdb_parser.c:5997 at wdb_parse_global_sync_agent_info_set(): DEBUG: Global DB Cannot execute SQL query; err database queue/db/global.db: FOREIGN KEY constraint failed
Comparing the differences between the previous stage (v4.8.0-beta6
) and the one having the error (v4.8.0-rc1
), the following was found:
diff --git a/src/wazuh_db/schema_global.sql b/src/wazuh_db/schema_global.sql
index 75fcfb7bb8..4fc5331eca 100644
--- a/src/wazuh_db/schema_global.sql
+++ b/src/wazuh_db/schema_global.sql
@@ -49,7 +49,7 @@ CREATE INDEX IF NOT EXISTS agent_group_hash ON agent (group_hash);
INSERT INTO agent (id, ip, register_ip, name, date_add, last_keepalive, `group`, connection_status) VALUES (0, '127.0.0.1', '127.0.0.1', 'localhost', strftime('%s','now'), 253402300799, NULL, 'active');
CREATE TABLE IF NOT EXISTS labels (
- id INTEGER,
+ id INTEGER REFERENCES agent (id) ON DELETE CASCADE,
key TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY (id,key)
@@ -84,4 +84,4 @@ CREATE TABLE IF NOT EXISTS metadata (
value TEXT
);
-INSERT INTO metadata (key, value) VALUES ('db_version', '5');
+INSERT INTO metadata (key, value) VALUES ('db_version', '6');
This change might generate the error when the agent is deleted and consequently, the associated labels from the table.
The DELETE /agents
request does not directly delete the agents from the database, instead, it sends a command to the authd
process to make the deletion.
The error does not originate in the cluster itself, it might be related to the new constraint added to the labels
table and the id
field. The corresponding issue #23349 was opened for the team in charge to analyze what could be the problem.
The error was reproduced and the root of the error identified. The failure was caused by a change in the database schema owned by the Core team and whose access from the framework is through wazuh-authd.
Opening an issue so that the corresponding team can solve it is the correct thing to do. LGTM!
Good catch @fdalmaup. This change was included in v4.7.4 and then merged to 4.8.0.
Based on the conclusions from https://github.com/wazuh/wazuh/issues/23349#top, we should add a mechanism to ignore this specific exception from the db.
The mechanism to ignore the exception in the send_data_to_wdb
method was applied. When forcing the error, the ossec.log
file reports it:
2024/05/10 12:39:18 wazuh-db[147] wdb_parser.c:859 at wdb_parse(): DEBUG: Global query: sync-agent-info-set [{"id":1,"name":"f77e92dad2f2","ip":"172.18.0.5","os_name":"Ubuntu","os_version":"20.04.6 LTS","os_major":"20","os_minor":"04","os_codename":"Focal Fossa","os_platform":"ubuntu","os_uname":"Linux |f77e92dad2f2 |5.15.0-105-generic |#115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024 |x86_64","os_arch":"x86_64","version":"Wazuh v4.8.0","config_sum":"ab73af41699f13fdd81903b5f23d8d00","merged_sum":"4a8724b20dee0124ff9656783c490c4e","manager_host":"9a9bd3dcf169","node_name":"9a9bd3dcf169","last_keepalive":1715337557,"connection_status":"active","disconnection_time":0,"group_config_status":"synced","status_code":0,"labels":[{"id":1,"key":"#\"_agent_ip\"","value":"172.18.0.5"},{"id":1,"key":"#\"_manager_hostname\"","value":"9a9bd3dcf169"},{"id":1,"key":"#\"_node_name\"","value":"9a9bd3dcf169"},{"id":1,"key":"#\"_wazuh_version\"","value":"Wazuh v4.8.0"}]}]
2024/05/10 12:39:18 wazuh-db[147] wdb.c:1112 at wdb_exec_stmt_silent(): DEBUG: SQL statement execution failed
2024/05/10 12:39:18 wazuh-db[147] wdb_parser.c:5997 at wdb_parse_global_sync_agent_info_set(): DEBUG: Global DB Cannot execute SQL query; err database queue/db/global.db: FOREIGN KEY constraint failed
2024/05/10 12:39:18 wazuh-db[147] main.c:397 at run_worker(): DEBUG: Client 33 disconnected.
2024/05/10 12:39:18 wazuh-db[147] wdb_parser.c:859 at wdb_parse(): DEBUG: Global query: sync-agent-groups-get {"condition": "sync_status", "set_synced": true, "get_global_hash": true}
2024/05/10 12:39:18 wazuh-db[147] wdb_integrity.c:759 at wdb_get_global_group_hash(): DEBUG: No group hash was found to calculate the global group hash.
Nevertheless, it is no longer raised in the cluster.log
file:
2024/05/10 12:39:07 DEBUG: [Master] [SendSync] Receiving SendSync request (remoted) from 9a9bd3dcf169 (535832)
2024/05/10 12:39:08 DEBUG: [Worker 9a9bd3dcf169] [Main] Command received: b'syn_a_w_m_p'
2024/05/10 12:39:08 DEBUG: [Worker 9a9bd3dcf169] [Main] Command received: b'new_str'
2024/05/10 12:39:08 DEBUG: [Worker 9a9bd3dcf169] [Main] Command received: b'str_upd'
2024/05/10 12:39:08 DEBUG: [Worker 9a9bd3dcf169] [Main] Command received: b'syn_a_w_m'
2024/05/10 12:39:08 INFO: [Worker 9a9bd3dcf169] [Agent-info sync] Starting.
2024/05/10 12:39:08 DEBUG: [Worker 9a9bd3dcf169] [Agent-info sync] 0/1 chunks updated in wazuh-db in 0.000s.
2024/05/10 12:39:08 INFO: [Worker 9a9bd3dcf169] [Agent-info sync] Finished in 0.003s. Updated 0 chunks.
Description
During #23268 we detect a recurrent error in the worker node logs:
It is needed to investigate the root cause of the problem before continuing to the next RC.
Checks
The following elements have been updated or reviewed (should also be checked if no modification is required):
api/test/integration/mapping/_test_mapping.py
).