Create WazuhDB endpoint to recalculate agent group hashes

TomasTurina commented 2 weeks ago

Description

A condition was discovered that can cause the cluster to enter an infinite loop while synchronizing agent group information.

When the master node fails to set the group_hash column for an agent in global.db, the column stays empty until that agent's groups are modified again, which does not happen very often. When these columns are synchronized across the cluster, the worker nodes receive the information and recalculate the hashes, producing a different result from the master, which still has some empty hashes. The master does not synchronize the hashes of empty groups, whereas the worker nodes calculate them, which is why the mismatch persists.
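The mismatch described above can be reproduced with a toy model. This is only a sketch: the hashing scheme (first 8 hex characters of SHA-256 over the comma-joined group names), the global checksum (SHA-1 over the concatenated per-agent hashes), and the group names are assumptions made for illustration, not taken from the WazuhDB source.

```python
import hashlib


def group_hash(groups):
    """Assumed scheme: first 8 hex chars of SHA-256 over the comma-joined group names."""
    return hashlib.sha256(",".join(groups).encode()).hexdigest()[:8]


def global_checksum(hashes):
    """Assumed scheme: SHA-1 over the concatenation of the non-empty per-agent hashes."""
    joined = "".join(h for h in hashes if h)
    return hashlib.sha1(joined.encode()).hexdigest()


# Hypothetical agents and their groups.
agents = {1: ["default"], 2: ["default", "web"], 4: ["default"]}

# Master: the hash of agent 2 was never stored, so it is empty (NULL).
master_hashes = [None if aid == 2 else group_hash(g) for aid, g in sorted(agents.items())]

# Worker: recalculates every hash from the group lists it receives.
worker_hashes = [group_hash(g) for _, g in sorted(agents.items())]

master_sum = global_checksum(master_hashes)
worker_sum = global_checksum(worker_hashes)
print("before recalculation, checksums match:", master_sum == worker_sum)

# Once the master recalculates the missing hash, both sides agree.
repaired_sum = global_checksum([group_hash(g) for _, g in sorted(agents.items())])
print("after recalculation, checksums match:", repaired_sum == worker_sum)
```

As long as the master keeps the empty value, every comparison fails in the same way, which is the loop seen in the worker's cluster.log.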

To fix this bug, it is proposed to implement a new endpoint in WazuhDB that the framework can use to make the manager recalculate all agent group hashes. This way the master will be able to recalculate bad group hashes without having to wait for the agent to change groups.

This new endpoint will be called recalculate-agent-group-hashes. It will not receive any parameters; it will simply iterate over the list of all agents and recalculate the group_hash for each of them. The group_sync_status column will be set to synced if group_hash doesn't change and to syncreq if it does (depending on whether it's a worker or master node).
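The group_sync_status rule in this proposal can be sketched as a small decision function. The helper name and the is_cluster_master flag are hypothetical, and the reading of "depending on whether it's a worker or master node" as "only a master in a cluster marks the agent as syncreq" is an assumption based on the test outputs in this thread.

```python
def next_sync_status(old_hash, new_hash, is_cluster_master):
    """Return the group_sync_status to store after recalculating one agent's hash.

    Hypothetical helper: models the rule as first proposed in this issue.
    """
    if new_hash != old_hash and is_cluster_master:
        # The hash changed on the master, so workers need to be told.
        return "syncreq"
    return "synced"


# Hash repaired on a cluster master: flagged for synchronization.
print(next_sync_status(None, "0da05cf3", is_cluster_master=True))
# Hash repaired on a node that is not a cluster master: nothing to propagate.
print(next_sync_status(None, "0da05cf3", is_cluster_master=False))
# Hash unchanged: already in sync.
print(next_sync_status("37a8eec1", "37a8eec1", is_cluster_master=True))
```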

Nicogp commented 2 weeks ago

I started working on the issue

Added the new endpoint
Changed the wdb_global_recalculate_agent_groups_hash function to receive the old hash as a parameter and update the db in case it changes.

Branch: https://github.com/wazuh/wazuh/tree/fix/23422-groups-hash

First tests (single node):

root@nico-VirtualBox:/home/nico# python3 wdb-query.py  'global sql update agent set group_sync_status="syncreq" where id=2'
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql update agent set group_hash=Null where id=2"
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_sync_status": "syncreq"
    }
]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global recalculate-agent-group-hashes"
ok
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_hash": "0da05cf3",
        "group_sync_status": "synced"
    }
]
root@nico-VirtualBox:/home/nico#
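
The wdb-query.py helper used in these tests is not shown in the thread. A minimal sketch of such a client might look like the following, assuming wazuh-db listens on the Unix socket /var/ossec/queue/db/wdb and frames each message with a 4-byte little-endian length header. The socket path, the framing, and all function names here are assumptions and should be checked against the installed version.

```python
import socket
import struct

WDB_SOCKET = "/var/ossec/queue/db/wdb"  # assumed default socket path


def frame(query: str) -> bytes:
    """Prefix the query with its length as a 4-byte little-endian integer (assumed framing)."""
    payload = query.encode()
    return struct.pack("<I", len(payload)) + payload


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer than requested."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf += chunk
    return buf


def wdb_query(query: str, path: str = WDB_SOCKET) -> str:
    """Send one query to wazuh-db and return the raw response text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(frame(query))
        (size,) = struct.unpack("<I", recv_exact(sock, 4))
        return recv_exact(sock, size).decode()
```

With such a helper, the new endpoint would be called as wdb_query("global recalculate-agent-group-hashes") on the manager host.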

Nicogp commented 2 weeks ago

Test cluster:

Set the group_hash of agent 2 to NULL to force continuous synchronization between master and worker:

root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_hash": "0da05cf3",
        "group_sync_status": "synced"
    },
    {
        "id": 4,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    }
]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql update agent set group_hash=Null where id=2"
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_sync_status": "synced"
    },
    {
        "id": 4,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    }
]

cluster.log (worker):

```
root@vagrant:/home/vagrant# cat /var/ossec/logs/cluster.log | grep "The checksum of master"
2024/05/14 23:02:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:16 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:26 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:36 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:46 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:56 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
```

Recalculation of hashes in the master node:

root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global recalculate-agent-group-hashes"
ok
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_hash": "0da05cf3",
        "group_sync_status": "syncreq"
    },
    {
        "id": 4,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    }
]

cluster.log (worker) after recalculation of hashes:

```
2024/05/14 23:03:57 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.019s. Updated 0 chunks.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:03:58 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:03:58 INFO: [Worker worker01-node] [Integrity check] Finished in 0.035s. Sync not required.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:06 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] Checksum comparison failed (2/5).
2024/05/14 23:04:06 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.011s. Updated 1 chunks.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.001s.
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.005s. Updated 0 chunks.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Integrity check] Finished in 0.022s. Sync not required.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of both databases match. Counter reset.
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.006s. Updated 1 chunks.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Integrity check] Finished in 0.060s. Sync not required.
2024/05/14 23:04:17 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:17 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:17 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.001s.
2024/05/14 23:04:17 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.005s. Updated 0 chunks.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c''
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity check] Finished in 0.053s. Sync required.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_file''
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'file_upd''
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'file_end''
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_e''
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity sync] Starting.
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity sync] Files to create: 1 | Files to update: 0 | Files to delete: 0
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity sync] Worker does not meet integrity checks. Actions required.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity sync] Updating local files: Start.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity sync] Updating local files: End.
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity sync] Finished in 0.010s.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:26 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of both databases match.
2024/05/14 23:04:26 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
2024/05/14 23:04:26 DEBUG: [Local Server] [Keep alive] Calculating.
2024/05/14 23:04:26 DEBUG: [Local Server] [Keep alive] Calculated.
2024/05/14 23:04:27 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:27 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:27 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.000s.
2024/05/14 23:04:27 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.002s. Updated 0 chunks.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:34 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 38 files.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:04:34 INFO: [Worker worker01-node] [Integrity check] Finished in 0.033s. Sync not required.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:36 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of both databases match.
2024/05/14 23:04:36 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.008s. Updated 1 chunks.
2024/05/14 23:04:36 INFO: [Worker worker01-node] [Keep Alive] Successful response from master: keepalive
2024/05/14 23:04:37 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:37 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:37 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.000s.
2024/05/14 23:04:37 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.002s. Updated 0 chunks.
2024/05/14 23:04:43 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:43 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:43 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 38 files.
```

Nicogp commented 2 weeks ago

Added UTs for:

The recalculate-agent-group-hashes endpoint in the wdb_global_parser() function
wdb_global_recalculate_all_agent_groups_hash() function

Nicogp commented 2 weeks ago

Update 15/05/2024

Changes were applied to avoid modifying the group_sync_status column.
Hash recalculation is performed for all agents (except id=0), even when the group column is NULL.
PR created and under review.
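
The revised behaviour can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not the code from the PR: the simplified agent and belongs tables, the hash_groups() scheme (first 8 hex characters of SHA-256 over the comma-joined group names), and the group names are all hypothetical.

```python
import hashlib
import sqlite3


def hash_groups(names):
    """Assumed scheme: first 8 hex chars of SHA-256 over the comma-joined names."""
    if not names:
        return None
    return hashlib.sha256(",".join(names).encode()).hexdigest()[:8]


def recalculate_all(conn):
    """Recompute group_hash for every agent except id 0; group_sync_status is left untouched."""
    ids = [row[0] for row in conn.execute("SELECT id FROM agent WHERE id > 0")]
    for agent_id in ids:
        # Groups are read from the membership table, so a NULL "group" column is not skipped.
        names = [r[0] for r in conn.execute(
            "SELECT name_group FROM belongs WHERE id_agent = ? ORDER BY priority",
            (agent_id,))]
        conn.execute("UPDATE agent SET group_hash = ? WHERE id = ?",
                     (hash_groups(names), agent_id))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (id INTEGER PRIMARY KEY, "group" TEXT,
                        group_hash TEXT, group_sync_status TEXT);
    CREATE TABLE belongs (id_agent INTEGER, name_group TEXT, priority INTEGER);
    INSERT INTO agent VALUES (0, NULL, NULL, 'synced');
    INSERT INTO agent VALUES (1, 'default', NULL, 'synced');
    INSERT INTO agent VALUES (2, NULL, NULL, 'syncreq');
    INSERT INTO belongs VALUES (1, 'default', 0);
    INSERT INTO belongs VALUES (2, 'default', 0);
    INSERT INTO belongs VALUES (2, 'web', 1);
""")

recalculate_all(conn)
rows = conn.execute(
    "SELECT id, group_hash, group_sync_status FROM agent ORDER BY id").fetchall()
for row in rows:
    print(row)
```

In this sketch agent 0 keeps its empty hash, agent 2 gets a hash even though its group column is NULL, and both agents keep whatever group_sync_status they had before the call.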

wazuh / wazuh

Create WazuhDB endpoint to recalculate agent group hashes #23422
