Parent issue:

https://github.com/wazuh/wazuh/issues/22887

Description

As part of the new data persistence model being implemented across Wazuh, we need a new way to manage commands sent from the Wazuh servers to the agents. In this spike, we will identify all the commands the agent must support, including the data required by the agent to execute them. We will also design a command manager which will be in charge of executing these commands.

The following diagram outlines a simplified design:

flowchart TD

subgraph Server
    API["Agent comms API server"]
end

subgraph Agent
    subgraph CommandManager["Command Manager"]
        CommandReceiver["Command Receiver (CR)"] -- "Persist commands" --> CommandStore["Command Store"]
        CommandReceiver -- "Execute" --> Executor["Executor"]
        Executor -- "Feedback" --> CommandReceiver
    end

    subgraph AgentCommsAPIClient["Agent Comms API Client"]
        Client["Client"]
        Client -- "Commands request" --> API
        API -- "Command response" --> Client
        Client -- "Commands" --> CommandReceiver
        CommandReceiver -- "Feedback" --> Client
    end
end

Functional requirements

Define a command type for each action the agent must execute.
Each command generates a result indicating completion status and outcome (success or error) with enough clarity for the user to take appropriate action.
Commands persist on the agent: upon execution, they are marked as completed and reported to the manager.
Completed commands are deleted from the agent once the manager confirms receipt of the result.
All commands must be executed in the order of creation.
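
The lifecycle described by these requirements can be sketched roughly as follows. This is only an illustrative, in-memory sketch: `CommandLog`, `CommandStatus`, and every method name are hypothetical and not part of the proposed design, and a real implementation would persist the data rather than hold it in a `std::map`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical lifecycle states; the names are illustrative only.
enum class CommandStatus { Pending, Completed };

struct StoredCommand {
    std::string payload;   // command document as received
    CommandStatus status;  // current lifecycle state
    std::string result;    // outcome text to report to the manager
};

// In-memory stand-in for a persistent store (assumption: real storage differs).
class CommandLog {
public:
    // Ids grow monotonically, so map iteration order equals creation order.
    std::uint64_t add(const std::string& payload) {
        const std::uint64_t id = nextId_++;
        StoredCommand cmd;
        cmd.payload = payload;
        cmd.status = CommandStatus::Pending;
        commands_[id] = cmd;
        return id;
    }

    // Writes the id of the oldest pending command; returns false if none is pending.
    bool nextPending(std::uint64_t& id) const {
        for (const auto& entry : commands_) {
            if (entry.second.status == CommandStatus::Pending) {
                id = entry.first;
                return true;
            }
        }
        return false;
    }

    // Marks a pending command as completed and records its result.
    bool complete(std::uint64_t id, const std::string& result) {
        auto it = commands_.find(id);
        if (it == commands_.end() || it->second.status != CommandStatus::Pending) {
            return false;
        }
        it->second.status = CommandStatus::Completed;
        it->second.result = result;
        return true;
    }

    // Deletes a command only after it completed and the manager confirmed receipt.
    bool acknowledge(std::uint64_t id) {
        auto it = commands_.find(id);
        if (it == commands_.end() || it->second.status != CommandStatus::Completed) {
            return false;
        }
        commands_.erase(it);
        return true;
    }

    std::size_t size() const { return commands_.size(); }

private:
    std::map<std::uint64_t, StoredCommand> commands_;
    std::uint64_t nextId_ = 1;
};
```

In this sketch a command cannot be deleted while it is still pending, which mirrors the requirement that deletion happens only after the manager confirms receipt of the result.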

Implementation restrictions

Language: We want to use C++ to leverage its performance and system-level capabilities.
Modular integration: This module should be implemented as a library to ensure modularity and reusability, allowing it to be integrated seamlessly with other modules.
Process tied: This module will be tied to the main process of the agent, ensuring it operates as a core component of the agent's functionality.

Plan

Identify all current commands: Review the existing agent to compile a comprehensive list of all commands that agents must support.
Define format and attributes of commands using JSON documents: Determine the structure and required attributes of the commands. Each command will be represented as a JSON document to standardize its format.
Design the command manager API to receive events and send feedback: Create a robust API for the command manager that allows it to receive command events from the server and provide clear feedback on the execution status or errors encountered.
Implement a restart command (nice-to-have): As a proof of concept, implement a basic restart command to ensure the command manager can execute commands correctly. This will involve coding the logic to handle the restart operation, persist the command, mark it as completed, and generate appropriate feedback.

Update

(11/06/2024) Understanding the issue requirements. Reviewing related issues and discussing the impact of the issue with the team, along with the doubts to raise with management.
(17/06/2024) Studying the new protocol POC in order to start the design. A list of questions has been created to clarify the tasks.
(18/06/2024) Researching the capabilities of the implementation language. Still working out the data storage and the functionality and limitations it has to handle. Researching nlohmann json and whether it covers our need to manage the messages in a straightforward way.
(19/06/2024) We have been discussing the list of messages this new module should be able to manage. As far as we have found, the minimum set of commands is as follows:

Main commands:

Restart/Start/Stop: The manager must be able to start, stop, and restart the agent, so this kind of command must be supported.
Agent Upgrade: The manager can already upgrade the agent today, so it must be able to do so by sending a command.
Central configuration: Shared configuration sent by the manager, which the agent must apply.
Get configuration: Request from the manager to retrieve the agent configuration.
Active response: Active response to be triggered based on a stateful event dispatched on the manager.

Optional/not definitive commands

Enrollment: The agent should be able to enroll itself with the manager, or with a new one if the manager requires it.

A primitive approach for the JSON structure representing the commands could be as follows:

command = {
    "origin": {                           // Data regarding manager requester.
        "name": "node01",                 // Name or manager ID.
        "module": "upgrade_module"        // Manager operation that triggers the command request
    },
    "command": "upgrade_update_status",   // ID for the operation to be performed in the agent side
    "parameters": {                       // Parameters regarding the operation
          ...
          ...
    },
    "status": "pending"                   // Command dispatch status
}

Currently designing a sequence diagram to illustrate it.

(20/06/2024) The attached diagram is meant to illustrate the following behavior:

When a command is received, a timestamp is generated and the command is stored in a sqlite3 table. The timestamp can be used as the primary key or as part of it. The table has 3 columns: timestamp, the time at which the command was received; command, a JSON object holding the information received; and status, which allows tracking the state of the command.
Receiving the command has to be done by a method of the class, preferably running in its own thread so that it is constantly listening for incoming commands and inserting them into the table. Each time an insertion is done, a variable called pending_commands (initialized to 0) is incremented by 1.
Another method, running in a separate thread, checks the pending_commands variable. As long as it is greater than 0, it selects from the table the row with pending status and the oldest timestamp. Once the row is obtained, the command is passed to the Executor object, which executes it, and the status of the selected row in the DB is changed to processing.
Once the Executor has completed the execution of a command, it changes the status of the processed row in the SQL table to done.
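
The receive/dispatch flow above could look roughly like the following sketch. It makes several assumptions that are not in the design: an in-memory `std::map` stands in for the sqlite3 table, a monotonically increasing key stands in for the timestamp (two commands can arrive within the same clock tick), and `CommandTable`, `runOnce`, and the `stop()` mechanism are hypothetical names added only so the example terminates.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

enum class Status { Pending, Processing, Done };

struct Row {
    std::string command;  // JSON text as received
    Status status;
};

class CommandTable {
public:
    // Receiver side: insert a row and increment the pending counter.
    void insert(const std::string& command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            Row row;
            row.command = command;
            row.status = Status::Pending;
            rows_[nextKey_++] = row;
            ++pendingCommands_;
        }
        cv_.notify_one();
    }

    // Dispatcher side: wait until a row is pending or stop() was called.
    // Marks the oldest pending row as Processing and hands it out.
    bool takeOldestPending(std::uint64_t& key, std::string& command) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pendingCommands_ > 0 || stopped_; });
        if (pendingCommands_ == 0) return false;  // stopped and fully drained
        for (auto& entry : rows_) {
            if (entry.second.status == Status::Pending) {
                entry.second.status = Status::Processing;
                --pendingCommands_;
                key = entry.first;
                command = entry.second.command;
                return true;
            }
        }
        return false;
    }

    void markDone(std::uint64_t key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = rows_.find(key);
        if (it != rows_.end()) it->second.status = Status::Done;
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

    std::size_t countWithStatus(Status status) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t n = 0;
        for (const auto& entry : rows_) {
            if (entry.second.status == status) ++n;
        }
        return n;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::map<std::uint64_t, Row> rows_;  // ordered by arrival
    std::uint64_t nextKey_ = 0;
    std::size_t pendingCommands_ = 0;
    bool stopped_ = false;
};

// Runs one receiver and one dispatcher thread; returns the execution order.
std::vector<std::string> runOnce(const std::vector<std::string>& incoming) {
    CommandTable table;
    std::vector<std::string> executed;
    std::thread dispatcher([&] {
        std::uint64_t key = 0;
        std::string command;
        while (table.takeOldestPending(key, command)) {
            executed.push_back(command);  // stand-in for the Executor
            table.markDone(key);
        }
    });
    std::thread receiver([&] {
        for (const auto& command : incoming) table.insert(command);
        table.stop();
    });
    receiver.join();
    dispatcher.join();
    return executed;
}
```

A condition variable is used here instead of polling the counter, which avoids a busy loop while keeping the same "oldest pending first" behavior.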

(21/06/2024) Working on the POC.
(24/06/2024) Continued working on the POC. Developing the class structure and some use cases.
(25/06/2024) Writing a document to present the implementation options, capabilities, and limitations to management.
(26/06/2024) Uploaded the POC.

Update 18/06/2024

I have been reading through the whole project, including the parent issues, target issues, and related material.
I have been reading some questions and answers about it.
Researching the current agentd code and everything that might be related to the topic.
Working on the list of commands we are going to need for the agent command manager.

Update 19/06/2024

We have discussed many points about the context of this project on a call.
Reviewing the code of remoted and agentd to put together a list of possible commands that we are going to need.
Reviewing the JSON format of the current commands used in active response and agent upgrade.
Generating a UML schema with the possible flow that the command manager should have.

Update 21/06/2024

I continue to work on the POC code.
Researching and reading more C++ example code from other FIM modules such as the engine.
Organizing ideas about the code we need and about how to receive the events while simulating the agent comms API.

Update 25/06/2024

After some discussion with the team, we have decided to investigate the use of databases and text files for command storage.
I am analyzing the pros and cons of text files. I have written a small program that simulates this commander using text files in a simplified form: commander.txt
Working on the design of the classes and the necessary methods, and resolving some doubts about the requirements.

Possible optional design of our Agent command manager, first revision

The current list of possible necessary commands to be considered:

upgradeAgent
restartAgent
applyCentralizedConfiguration
getAgentConfiguration
enrollAgent
executeActiveResponse

Initial format of the JSON containing the command information:

{
    "command": {
        "name": "001",
        "type": "stateless"
    },
    "origin": {
        "serverName": "node01",
        "moduleName": "upgradeModule"
    },
    "parameters": {
        "extra_args": [],
        "error": 0,
        "data": "Upgrade Successful",
        "status": "Done"
    }
}
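
As a rough illustration, the document above could map onto plain C++ structs as in the sketch below. All type and function names (`CommandDocument`, `toJson`, `quote`) are hypothetical, the element type of `extra_args` is assumed to be string, and the hand-written serializer exists only to keep the example dependency-free. A JSON library such as nlohmann json, which is being evaluated for this work, would normally replace it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct CommandInfo {
    std::string name;
    std::string type;
};

struct Origin {
    std::string serverName;
    std::string moduleName;
};

struct Parameters {
    std::vector<std::string> extraArgs;  // assumption: arguments are strings
    int error;
    std::string data;
    std::string status;
};

struct CommandDocument {
    CommandInfo command;
    Origin origin;
    Parameters parameters;
};

// Wraps a string in quotes, escaping only quotes and backslashes.
// Control characters are not handled; this is not a full JSON encoder.
std::string quote(const std::string& text) {
    std::string out = "\"";
    for (char c : text) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    out += '"';
    return out;
}

// Serializes the document using the field names of the JSON format above.
std::string toJson(const CommandDocument& doc) {
    std::string args = "[";
    for (std::size_t i = 0; i < doc.parameters.extraArgs.size(); ++i) {
        if (i > 0) args += ",";
        args += quote(doc.parameters.extraArgs[i]);
    }
    args += "]";
    return "{\"command\":{\"name\":" + quote(doc.command.name) +
           ",\"type\":" + quote(doc.command.type) +
           "},\"origin\":{\"serverName\":" + quote(doc.origin.serverName) +
           ",\"moduleName\":" + quote(doc.origin.moduleName) +
           "},\"parameters\":{\"extra_args\":" + args +
           ",\"error\":" + std::to_string(doc.parameters.error) +
           ",\"data\":" + quote(doc.parameters.data) +
           ",\"status\":" + quote(doc.parameters.status) + "}}";
}
```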

Class diagram

classDiagram
    class Server {
        +sendCommand(cmd: Command)
    }

    class CommandManager {
        -commandStore: CommandStore
        -executor: Executor
        +storeCommands()
        +executeCommand()
    }

    class CommandStore {
        +storeCommand(cmd: Command)
        +getNextCommand(): Command
        +deleteCompletedCommands(cmd: Command)
    }

    class Executor {
        -feedback: Feedback
        +execute(cmd: Command)
        +generateFeedback(cmd: Command)
        +reportFeedback(): Feedback
    }

    class AgentCommsAPIClient {
        +getConnection()
        +pollCommands()
        +receiveCommands(cmds: List<Command>)
        +addCommands(cmds: List<Command>)
        +sendFeedback(feedback: Feedback)
    }

    class Command {
        +name: String
        +type: String
        +data: String
        +status: String
        +execute()
        +markCompleted()
    }

    class Feedback {
        +status: String
        +message: String
    }

    CommandManager "1" *-- "1" CommandStore
    CommandManager "1" *-- "1" Executor
    CommandManager "1" -- "1" AgentCommsAPIClient
    AgentCommsAPIClient "1..*" -- "1" Server
    CommandStore "1" -- "1..*" Command
    Command "1" -- "0..1" Feedback
    Executor "1" -- "0..1" Feedback
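
One possible shape for the `Executor` and `Feedback` classes from this diagram is sketched below. It is not the POC code: the handler registry (`registerHandler`) and the status strings `"success"`, `"error"`, `"completed"`, and `"failed"` are assumptions introduced for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

struct Command {
    std::string name;
    std::string type;
    std::string data;
    std::string status;
};

struct Feedback {
    std::string status;
    std::string message;
};

class Executor {
public:
    using Handler = std::function<Feedback(const Command&)>;

    // Associates a command name (for example "restartAgent") with its handler.
    void registerHandler(const std::string& name, Handler handler) {
        handlers_[name] = std::move(handler);
    }

    // Runs the handler registered for the command and updates the command status.
    // Unknown commands produce error feedback instead of being silently dropped.
    Feedback execute(Command& cmd) const {
        auto it = handlers_.find(cmd.name);
        if (it == handlers_.end()) {
            cmd.status = "failed";
            return Feedback{"error", "Unknown command: " + cmd.name};
        }
        Feedback feedback = it->second(cmd);
        cmd.status = (feedback.status == "success") ? "completed" : "failed";
        return feedback;
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

Keeping the handlers in a map means a new command type can be supported by registering one more function, without changing `execute` itself.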

Sequence diagram

sequenceDiagram
    participant Server
    participant AgentCommsAPIClient
    participant CommandManager
    participant CommandStore
    participant Executor
    participant Feedback

    AgentCommsAPIClient->>Server: getConnection()
    AgentCommsAPIClient->>Server: pollCommands()
    Server->>AgentCommsAPIClient: sendCommand()
    AgentCommsAPIClient->>CommandManager: addCommands()
    CommandManager->>CommandStore: storeCommand()
    CommandManager->>Executor: executeCommand()
    Executor->>CommandStore: getNextCommand()
    Executor->>Executor: execute()
    Executor->>Feedback: generateFeedback()
    Executor->>CommandManager: reportFeedback()
    CommandManager->>CommandStore: markCompleted()
    CommandStore->>CommandStore: deleteCompletedCommands()
    CommandManager->>AgentCommsAPIClient: reportFeedback()
    AgentCommsAPIClient->>Server: reportFeedback()

Update 26/06/2024

POC commander

I have been working on a small C++ program to approach the requirements of the issue step by step. For now it is a Commander class that runs two threads (the use of routines and coroutines is still to be analyzed). One thread receives the messages and saves them in the store. The other processes them, passes on the feedback, and then deletes them from the store: https://github.com/wazuh/wazuh-agent/tree/agent-command-manager

File: commander/commander.cpp

Example of use in the restart-agent case:

root@ubuntu24:/vagrant# ./commander
Command queue loaded from file.
Enter command in JSON format (or type 'quit' to stop): {"command":{"name":"restart-agent","type":"command"},"origin":{"moduleName":"restart","serverName":"node01"},"parameters":{"data":"restart agent","error":0,"extra_args":[],"status":""}}
Command queue saved to file.
Command received and added to the queue.

root@ubuntu24:/vagrant# ./commander
Command queue loaded from file.
Enter command in JSON format (or type 'quit' to stop): 
Processing command: {"command":{"name":"restart-agent","type":"command"},"origin":{"moduleName":"restart","serverName":"node01"},"parameters":{"data":"restart agent","error":0,"extra_args":[],"status":""},"status":"processing"}
Feedback: {"command":{"command":{"name":"restart-agent","type":"command"},"origin":{"moduleName":"restart","serverName":"node01"},"parameters":{"data":"restart agent","error":0,"extra_args":[],"status":""},"status":"processing"},"message":"Command processed successfully.","status":"completed"}
Command queue saved to file.

Update 27/06/2024

After a meeting with the team and Vikman, we have drawn some conclusions to improve the POC. I have updated the code to better encapsulate the dispatch and execute functions: https://github.com/wazuh/wazuh-agent/blob/poc/4-agent-command-manager/poc/commander/commander.cpp

Doing some tests with the code flow we can observe the following:

When multiple normal commands are entered, they accumulate in the file where they are saved and are processed at the rate of 1+1 second: {"command":{"name":"command2","type":"active-response"},"origin":{"moduleName":"module1","serverName":"node01"},"parameters":{"data":"","error":0,"extra_args":[],"status":"pending"}}
When the command entered is of the restart-agent type, it is written to the file but not deleted, and the program closes: {"command":{"name":"command1","type":"restart-agent"},"origin":{"moduleName":"restart","serverName":"node01"},"parameters":{"data":"restart agent","error":0,"extra_args":[],"status":"pending"}}
When the program is opened again, the command is deleted because its status is processing.
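
The restart behavior observed above can be read as a startup recovery step: commands found in the processing state were interrupted by the restart they triggered, so they are removed instead of being executed again. The sketch below shows that reading only. It is not taken from the POC code, and `QueuedCommand` and `recoverInterrupted` are hypothetical names.

```cpp
#include <string>
#include <vector>

struct QueuedCommand {
    std::string name;
    std::string status;  // "pending" or "processing", as seen in the queue file
};

// Removes commands left in "processing" by a previous run and returns their
// names so that feedback can still be reported for them. Pending commands stay
// in the queue in their original order.
std::vector<std::string> recoverInterrupted(std::vector<QueuedCommand>& queue) {
    std::vector<std::string> interrupted;
    std::vector<QueuedCommand> remaining;
    for (const auto& cmd : queue) {
        if (cmd.status == "processing") {
            interrupted.push_back(cmd.name);
        } else {
            remaining.push_back(cmd);
        }
    }
    queue.swap(remaining);
    return interrupted;
}
```

Treating a processing entry as already executed avoids a restart loop, at the cost of not retrying a command that was interrupted for some other reason.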

Video demonstration of the tests. The top terminal shows the program execution and the bottom terminal shows command_queue.txt, the file where the commands are stored for persistence:

https://github.com/wazuh/wazuh-agent/assets/60003131/8891e2b9-d455-46cd-a07c-74881e6ffdcb

wazuh / wazuh-agent

Agent command manager #4