Closed TomasTurina closed 2 weeks ago
I started working on the issue
Branch: https://github.com/wazuh/wazuh/tree/fix/23422-groups-hash
First tests (single node):
root@nico-VirtualBox:/home/nico# python3 wdb-query.py 'global sql update agent set group_sync_status="syncreq" where id=2'
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql update agent set group_hash=Null where id=2"
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
},
{
"id": 2,
"group_sync_status": "syncreq"
}
]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global recalculate-agent-group-hashes"
ok
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
},
{
"id": 2,
"group_hash": "0da05cf3",
"group_sync_status": "synced"
}
]
root@nico-VirtualBox:/home/nico#
Test cluster:
Set the group_hash of agent 2 to NULL to force continuous synchronization between master and worker:
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
},
{
"id": 2,
"group_hash": "0da05cf3",
"group_sync_status": "synced"
},
{
"id": 4,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
}
]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql update agent set group_hash=Null where id=2"
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
},
{
"id": 2,
"group_sync_status": "synced"
},
{
"id": 4,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
}
]
cluster.log (worker):
Recalculation of hashes in the master node:
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global recalculate-agent-group-hashes"
ok
root@nico-VirtualBox:/home/nico# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
},
{
"id": 2,
"group_hash": "0da05cf3",
"group_sync_status": "syncreq"
},
{
"id": 4,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
}
]
cluster.log (worker) after recalculation of hashes:
Added unit tests (UTs) for:
recalculate-agent-group-hashes
Description
A condition was discovered that could cause the cluster to enter an infinite loop while synchronizing agent group information.
When the master node fails to set the group_hash column for any agent in the global.db, this column remains empty until a new group modification occurs, which doesn't happen very often. When these columns are synchronized in the cluster, the worker nodes receive the information and recalculate the hashes, producing a different result than the master, which has some empty hashes. This happens because the master does not synchronize the hashes of empty groups, while the worker nodes calculate them.
To fix this bug, it is proposed to implement a new endpoint in WazuhDB that the framework can use to make the manager recalculate all agent group hashes. This way the master will be able to recalculate bad group hashes without having to wait for the agent to change groups.
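The mismatch described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the per-agent hash is modeled as the first 8 hex characters of SHA-256 over the comma-separated group list, and `global_checksum` is a hypothetical stand-in for whatever table-wide comparison the cluster performs. It is not the actual WazuhDB or cluster code.

```python
import hashlib


def group_hash(groups):
    """Stand-in per-agent hash (assumption): 8 hex chars of SHA-256 over the CSV group list."""
    if not groups:
        return None
    return hashlib.sha256(",".join(groups).encode()).hexdigest()[:8]


def global_checksum(hashes):
    """Hypothetical table-wide checksum over the non-empty per-agent hashes, ordered by agent id."""
    joined = "".join(h for _, h in sorted(hashes.items()) if h)
    return hashlib.sha1(joined.encode()).hexdigest()


# Group assignments known to both nodes (made-up example data).
groups = {1: ["default"], 2: ["default", "web"]}

# Master: the hash of agent 2 was never written, so the column is empty.
master = {1: group_hash(groups[1]), 2: None}

# Worker: receives the group assignments and recalculates every hash itself.
worker = {agent_id: group_hash(g) for agent_id, g in groups.items()}

# The two nodes never agree, so synchronization is requested again and again.
in_sync = global_checksum(master) == global_checksum(worker)
print(in_sync)  # False
```

Once the master fills in the missing hash for agent 2, both checksums in this model match and the loop ends, which is the effect the proposed endpoint is meant to have.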
This new endpoint will be called recalculate-agent-group-hashes. It will not receive any parameters; it will simply iterate over the list of all agents and recalculate the group_hash for each of them. As for the group_sync_status column, it will be set to synced if group_hash doesn't change and to syncreq if it does (depending on whether it's a worker or master node).
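The behavior described above can be sketched against an in-memory SQLite database. This is only an illustration under stated assumptions: the two-table schema is a simplified, hypothetical version of global.db, the hash function is a stand-in, and the `status_on_change` parameter is a made-up way to represent the node-role-dependent status, which is not modeled further.

```python
import hashlib
import sqlite3


def group_hash(groups):
    """Stand-in per-agent hash (assumption): 8 hex chars of SHA-256 over the CSV group list."""
    if not groups:
        return None
    return hashlib.sha256(",".join(groups).encode()).hexdigest()[:8]


def recalculate_agent_group_hashes(conn, status_on_change="syncreq"):
    """Recompute group_hash for every agent and update group_sync_status accordingly."""
    agents = conn.execute("SELECT id, group_hash FROM agent ORDER BY id").fetchall()
    for agent_id, old_hash in agents:
        rows = conn.execute(
            "SELECT name_group FROM belongs WHERE id_agent = ? ORDER BY priority",
            (agent_id,),
        ).fetchall()
        new_hash = group_hash([name for (name,) in rows])
        # Unchanged hash: nothing to propagate. Changed hash: request a sync.
        status = "synced" if new_hash == old_hash else status_on_change
        conn.execute(
            "UPDATE agent SET group_hash = ?, group_sync_status = ? WHERE id = ?",
            (new_hash, status, agent_id),
        )
    conn.commit()


# Hypothetical, simplified schema and data for the example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent (id INTEGER PRIMARY KEY, group_hash TEXT, group_sync_status TEXT);
    CREATE TABLE belongs (id_agent INTEGER, name_group TEXT, priority INTEGER);
    """
)
conn.executemany(
    "INSERT INTO agent VALUES (?, ?, ?)",
    [
        (0, None, "synced"),                     # manager entry, no groups
        (1, group_hash(["default"]), "synced"),  # hash already correct
        (2, None, "synced"),                     # hash missing: the buggy case
    ],
)
conn.executemany(
    "INSERT INTO belongs VALUES (?, ?, ?)",
    [(1, "default", 0), (2, "default", 0), (2, "web", 1)],
)

recalculate_agent_group_hashes(conn)
result = conn.execute(
    "SELECT id, group_hash, group_sync_status FROM agent ORDER BY id"
).fetchall()
```

In this model only agent 2 ends up with a new hash and a syncreq status, which mirrors the master-node output shown in the cluster test; agents whose hash was already correct stay synced.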