There are several C++ libraries that support our requirements. Those described in this short list are widely used and well tested in industry. Note that no single library covers all needs: libraries that support high-level protocols such as HTTP and HTTP/2 often lack native UDP support, while libraries handling low-level protocols may not provide comprehensive support for higher-level abstractions.
According to https://github.com/wazuh/wazuh/issues/23395, the server team is also considering a number of libraries; we should encourage sharing the same technology across both Agent and Server.
ZeroMQ (ØMQ): ZeroMQ is a high-performance asynchronous messaging library designed for distributed and concurrent applications. It supports various messaging patterns, such as publish/subscribe, request/reply, and client/server, over multiple transports, including TCP, UDP, in-process, inter-process, multicast, and WebSocket (it does not natively support HTTP/HTTP2). It is highly suitable for building scalable and resilient distributed systems. The libzmq library is licensed under the Mozilla Public License 2.0.
https://zeromq.org/get-started/
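To give a flavor of the patterns listed above, here is a minimal request/reply sketch using cppzmq, the C++ binding for libzmq; the endpoint and payload are illustrative placeholders:

```cpp
#include <zmq.hpp>
#include <iostream>
#include <string>

int main() {
    zmq::context_t ctx{1};
    zmq::socket_t sock{ctx, zmq::socket_type::req};
    sock.connect("tcp://localhost:5555"); // assumes a REP peer listens here

    // Request/reply: send a message, then block until the peer answers.
    const std::string request = "ping";
    sock.send(zmq::buffer(request), zmq::send_flags::none);

    zmq::message_t reply;
    if (sock.recv(reply, zmq::recv_flags::none))
        std::cout << reply.to_string() << '\n';
}
```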
gRPC: gRPC is a high-performance open-source RPC framework that facilitates seamless communication between client and server applications, making it ideal for distributed systems. It uses HTTP/2 and Protocol Buffers for efficient serialization (it does not support raw TCP/UDP). gRPC supports multiple languages, allowing interoperability between different systems. It is designed for low-latency and scalable communication, with features like load balancing, tracing, and authentication. It can also be extended to support various data formats like JSON, Protobuf, and others. Its license is Apache 2.0.
Boost.Asio / Boost.Beast: Boost.Asio provides a comprehensive suite for asynchronous network programming, supporting TCP, UDP, and serial communication. Boost.Beast extends Boost.Asio with HTTP and WebSocket capabilities. These libraries are highly performant and integrate well with other Boost libraries, offering extensive functionality for complex networking tasks.
https://www.boost.org/users/license.html https://www.boost.org/doc/libs/1_84_0/doc/html/boost_asio.html https://www.boost.org/doc/libs/1_85_0/libs/beast/doc/html/index.html
ASIO (Standalone): The standalone ASIO library offers asynchronous I/O capabilities without the Boost dependency. It supports TCP, UDP, and serial port communications, and is suitable for applications requiring high performance (requires additional libraries for HTTP/HTTP2). ASIO is lightweight and provides efficient network communication primitives.
https://www.boost.org/LICENSE_1_0.txt https://think-async.com/Asio/
CURL and http-request: CURL supports many protocols, such as HTTP and HTTPS, with built-in SSL/TLS for secure communication. Given that we already have a CURL wrapper, http-request, continuing to use CURL with it allows leveraging existing infrastructure and avoiding new dependencies, simplifying maintenance.
While CURL lacks native WebSocket support, its wide protocol coverage makes it suitable for HTTP-based event sending and command endpoints. Compared to the other options, and aside from the added complexity of new dependencies: ZeroMQ offers high-performance messaging but limited protocol support, while gRPC supports HTTP/2 and real-time communication but is more complex given the need for Protocol Buffer definitions. Continuing with CURL leverages existing code and avoids new dependencies.
Note that the wazuh/wazuh-http-request repository requires files from wazuh/wazuh. Currently, it does not work as a standalone project, and some minor changes need to be made before it can be used as a library.
https://curl.se/libcurl/ https://github.com/wazuh/wazuh-http-request
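For illustration, a minimal sketch of HTTP-based event sending with the libcurl easy API; the URL and JSON body are placeholders, and the PoC itself would go through the http-request wrapper rather than raw libcurl:

```cpp
#include <curl/curl.h>
#include <cstdio>
#include <string>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    if (CURL* curl = curl_easy_init()) {
        // Placeholder endpoint and payload for a batched event POST.
        const std::string body = R"([{"command":"create_event"}])";
        curl_slist* headers = curl_slist_append(nullptr, "Content-Type: application/json");

        curl_easy_setopt(curl, CURLOPT_URL, "https://localhost:8080/stateless");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());

        if (const CURLcode res = curl_easy_perform(curl); res != CURLE_OK)
            std::fprintf(stderr, "curl failed: %s\n", curl_easy_strerror(res));

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
}
```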
jwt-cpp: jwt-cpp is a lightweight library for creating and verifying JSON Web Tokens (JWT). It supports various algorithms for signing and verifying tokens, making it suitable for implementing token-based authentication in C++ applications. jwt-cpp is easy to integrate and provides a straightforward API for handling JWTs. Licensed under MIT.
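A minimal sketch of creating and verifying a token with jwt-cpp; the issuer, secret, and lifetime are illustrative assumptions:

```cpp
#include <jwt-cpp/jwt.h>
#include <chrono>
#include <iostream>

int main() {
    // Create a signed token (HS256 with a shared secret -- both are assumptions).
    auto token = jwt::create()
                     .set_issuer("wazuh-agent")
                     .set_type("JWS")
                     .set_issued_at(std::chrono::system_clock::now())
                     .set_expires_at(std::chrono::system_clock::now() + std::chrono::minutes{15})
                     .sign(jwt::algorithm::hs256{"secret"});

    // Verify the signature and the issuer claim; verify() throws on failure.
    auto decoded = jwt::decode(token);
    jwt::verify()
        .allow_algorithm(jwt::algorithm::hs256{"secret"})
        .with_issuer("wazuh-agent")
        .verify(decoded);

    std::cout << token << '\n';
}
```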
Initial draft considerations:
Additionally, we should aim to identify functionalities that could be replaced by an external library to offload the maintenance of these components. Furthermore, we need to establish a strategy for common code shared between the agent and server to avoid duplication.
wazuh-agent/
├── CMakeLists.txt
├── vcpkg.json
├── dependencies/ -> may not be necessary
├── modules/
│ ├── agent/
│ │ ├── CMakeLists.txt
│ │ ├── include/
│ │ │ └── [header files...] (include error_messages)
│ │ ├── src/
│ │ │ └── [source files...]
│ │ └── tests/
│ │ ├── CMakeLists.txt
│ │ └── [test files...]
│ ├── active_response/ (former os_execd + active_response)
│ ├── agent_upgrade/
│ ├── aws/
│ ├── azure/
│ ├── docker/
│ ├── gcp/
│ ├── github/
│ ├── ms_graph/
│ ├── office365/
│ ├── logcollector/
│ ├── rootcheck/
│ ├── fim/ (former syscheck)
│ ├── inventory/ (former syscollector)
│ ├── sca/
│ └── [additional modules...]
├── common/
│ ├── data_provider/
│ ├── dbsync/
│ ├── rsync/ -> may not be necessary
│ ├── os_crypto/ -> may not be necessary
│ ├── os_net/ -> may not be necessary
│ ├── os_regex/ -> may not be necessary
│ ├── os_xml/ -> may not be necessary
│ ├── os_zlib/ -> may not be necessary
│ ├── shared/ (include headers)
│ ├── utils/ (from shared_modules)
│ ├── commonDefs.h (from shared_modules/common)
│ └── [additional modules...]
└── build/
└── [build output...]
For folders that do not contain code related to the Agent:
wazuh-agent/
├── ci/
├── etc/
│ ├── config/
│ ├── selinux/
│ └── ruleset/
│ ├── sca/
│ └── rootcheck/
├── packages/
├── tools/
└── installers/
├── unix/ (former init folder, including upgrade.sh and install.sh)
└── win32/
Missing from this initial draft are other files such as .gitignore, .clang-format, .clang-tidy, etc.
Option 1:
wazuh-agent/
├── CMakeLists.txt
├── vcpkg.json
├── ci/
├── common/
├── dependencies/ -> may not be necessary
├── etc/
├── installers/
├── modules/
├── packages/
└── tools/
Option 2:
wazuh-agent/
├── ci/
├── etc/
├── installers/
├── packages/
├── src/
│ ├── CMakeLists.txt
│ ├── vcpkg.json
│ ├── common/
│ ├── dependencies/ -> may not be necessary
│ └── modules/
└── tools/
> [!NOTE]
> This is a work in progress.
Update: 03/06/2024
Update: 04/06/2024
Update: 05/06/2024
`/commands` endpoint using HTTP Long Polling.
Update: 06/06/2024
`/commands` endpoint using HTTP Long Polling. A timeout response mechanism was implemented to simulate real behavior. Now I'm working on the storage of the commands.
Update: 07/06/2024
Update: 10/06/2024
Update: 11/06/2024
Update: 13/06/2024
After an initial period of research, during which various library variants were evaluated for the Proof of Concept (PoC) implementation (see comment), it was determined that the initial approach for integrating the new agent with the Agent Communications API would use the wazuh-http-request repository as a cURL wrapper.
The PoC aims to address the fundamental aspects of communication and design, based on the component diagrams detailed in this issue. This PoC specifically targets the client module and the submodules that interact with the commander and the queue of that design.
The following requests specified in the description of this issue were successfully implemented:

- `POST /login`
- `POST /stateless`
- `GET /commands`

The `POST /stateful` request was not implemented, as it is assumed to be analogous to the stateless request.
For data persistence, two basic storage models were developed: one using SQLite and the other using RocksDB. These were implemented with a wrapper to abstract the design from the specific database chosen in the future. Currently, command storage utilizes RocksDB, while event storage employs SQLite.
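The PoC's wrapper is not reproduced here; the following is a hypothetical sketch of what such a storage abstraction could look like, with `IStorage` and its methods being invented names:

```cpp
#include <memory>
#include <string>

// Hypothetical persistence interface: the concrete engine (SQLite or RocksDB)
// is hidden behind an abstraction so it can be swapped later.
struct IStorage {
    virtual ~IStorage() = default;
    virtual void Put(const std::string& key, const std::string& value) = 0;
    virtual std::string Get(const std::string& key) = 0;
    virtual void UpdateStatus(const std::string& key, const std::string& status) = 0;
};

// Each backend implements the same contract (bodies omitted here):
// class SQLiteStorage  : public IStorage { /* sqlite3 calls */ };
// class RocksDBStorage : public IStorage { /* rocksdb::DB calls */ };
//
// Callers depend only on IStorage, e.g.:
// std::unique_ptr<IStorage> commands = std::make_unique<RocksDBStorage>();
// std::unique_ptr<IStorage> events   = std::make_unique<SQLiteStorage>();
```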
A server was developed to mock responses, using the Boost library and JWT to provide the client with a token via the /login request.
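As a rough illustration of such a mock server, here is a minimal synchronous Boost.Beast loop answering `/login`; the port and response bodies are assumptions, not the PoC's actual implementation:

```cpp
#include <boost/asio.hpp>
#include <boost/beast.hpp>

namespace beast = boost::beast;
namespace http = beast::http;
using tcp = boost::asio::ip::tcp;

int main() {
    boost::asio::io_context ioc;
    tcp::acceptor acceptor{ioc, {tcp::v4(), 8080}};

    for (;;) {
        tcp::socket socket{ioc};
        acceptor.accept(socket);

        beast::flat_buffer buffer;
        http::request<http::string_body> req;
        http::read(socket, buffer, req);

        http::response<http::string_body> res{http::status::ok, req.version()};
        res.set(http::field::content_type, "application/json");
        // A real mock would build the token with jwt-cpp, as sketched earlier.
        res.body() = (req.target() == "/login") ? R"({"token":"<jwt>"})"
                                                : R"({"error":"unknown endpoint"})";
        res.prepare_payload();
        http::write(socket, res);
    }
}
```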
Regarding events, their structure is defined as follows:
The status of events can be set to: pending, processing, or dispatched. By default, they are inserted into the database with a pending status.
Events are generated automatically through a Python script. The event_queue_monitor submodule extracts these events from the database, updates their status to "processing," and then accumulates them for a time T or until a count N is reached. Subsequently, it automatically sends a /stateless request with these events to the server. This submodule verifies the success of the request, marking the events as dispatched in the database or resetting their status to pending.
This submodule features a thread that continuously searches for events in the database. For each batch of events, a new thread is launched to handle dispatch.
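A sketch of that accumulate-and-dispatch loop follows; the helper functions, batch size N, and interval T are assumptions for illustration, not the PoC's actual API:

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

using namespace std::chrono_literals;

// Hypothetical helpers standing in for the PoC's database and HTTP layers.
std::vector<std::string> FetchPendingEvents();              // reads events, sets status to "processing"
bool SendStatelessRequest(const std::vector<std::string>&); // POST /stateless with the batch
void MarkDispatched(const std::vector<std::string>&);       // status -> "dispatched"
void ResetToPending(const std::vector<std::string>&);       // status -> "pending" (retry later)

void EventQueueMonitor() {
    constexpr std::size_t kBatchSize = 100; // count N
    constexpr auto kFlushInterval = 5s;     // time T
    std::vector<std::string> batch;
    auto lastFlush = std::chrono::steady_clock::now();

    for (;;) {
        for (auto& event : FetchPendingEvents())
            batch.push_back(std::move(event));

        const bool full = batch.size() >= kBatchSize;
        const bool timedOut = std::chrono::steady_clock::now() - lastFlush >= kFlushInterval;

        if (!batch.empty() && (full || timedOut)) {
            // Each batch is handed to its own dispatch thread.
            std::thread([events = std::move(batch)] {
                if (SendStatelessRequest(events))
                    MarkDispatched(events);
                else
                    ResetToPending(events);
            }).detach();
            batch = {};
            lastFlush = std::chrono::steady_clock::now();
        }
        std::this_thread::sleep_for(100ms);
    }
}
```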
Regarding commands, their structure is defined as follows:
The status can be set to either pending or dispatched.
The command_dispatcher submodule is responsible for making continuous GET requests to the server within a thread. When no commands are available, a Timeout error is returned. If commands are available, they are received and stored in the database. Another thread then retrieves commands with a pending status from the database, marks them as dispatched, and simulates sending them to the commander.
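A sketch of that long-polling loop, with hypothetical helpers standing in for the real HTTP and storage calls:

```cpp
#include <string>

// Hypothetical helpers; the real PoC goes through the wazuh-http-request wrapper.
struct HttpResult { long status; std::string body; };
HttpResult LongPollCommands();               // GET /commands; blocks until data or timeout
void StoreCommands(const std::string& json); // insert commands with status "pending"

void CommandDispatcherLoop() {
    for (;;) {
        const HttpResult res = LongPollCommands();
        if (res.status == 200)
            StoreCommands(res.body); // a second thread later fetches "pending"
                                     // commands, marks them "dispatched", and
                                     // forwards them to the commander
        // Any other status (e.g. the server's timeout response when no
        // commands are available) simply triggers the next poll.
    }
}
```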
Finally, for both submodules, a message format already used by Wazuh in the upgrade module was selected (see issue). In particular, for these cases an example of an event would be as follows:
event_data = {
"origin": {
"module": "fim"
},
"command": "create_event",
"parameters": {
"data": "event # xxx"
}
}
and a command would be:
command = {
"origin": {
"module": "upgrade_module"
},
"command": "upgrade_update_status",
"parameters": {
"agents": [20],
"error": 0,
"data": "Upgrade Successful",
"status": "Done"
}
}
The following diagram was made by @TomasTurina for a better understanding:
The purpose of these benchmarks is to evaluate the performance of sending pre-loaded pending events using `std::thread`. The benchmarks create an instance of the client using real TCP requests and a real SQLite DB.
The results of the benchmarks are categorized based on different parameters to provide a clear and detailed analysis. 100 iterations of each test were run to average the results.
Batch Size | Time (manual_time) |
---|---|
1 | 2572 ms |
10 | 1068 ms |
100 | 830 ms |
1,000 | 856 ms |
Analysis: As the batch size increases, the time required for dispatching events significantly decreases initially and then stabilizes. This demonstrates that larger batch sizes are more efficient for dispatching a high number of events.
Event Count | Time (manual_time) |
---|---|
1 | 2.66 ms |
10 | 10.5 ms |
100 | 84.7 ms |
1,000 | 746 ms |
Analysis: The time required increases with the number of events, which is expected.
Event Data Size | Time (manual_time) |
---|---|
1 | 288 ms |
10 | 283 ms |
100 | 260 ms |
1,000 | 682 ms |
10,000 | 694 ms |
100,000 | 1432 ms |
Analysis: As the data size increases, the time required for dispatching events increases as well. Requests take longer with larger payloads, indicating a potential limit for optimization when handling very large data sizes.
In successive versions of the client and the server, it will be necessary to establish an optimal batch size according to the average or maximum size of the events. Although a larger batch is more efficient, once the whole communication stack and the database are involved, times get worse for large volumes of information.
The same tests were repeated with `AsioThreadManager` instead of `std::thread`. The results are almost the same; only small differences were found.
Batch Size | Time (manual_time) |
---|---|
1 | 2577 ms |
10 | 1048 ms |
100 | 807 ms |
1,000 | 841 ms |
Event Count | Time (manual_time) |
---|---|
1 | 2.84 ms |
10 | 12.3 ms |
100 | 103 ms |
1,000 | 852 ms |
Event Data Size | Time (manual_time) |
---|---|
1 | 296 ms |
10 | 286 ms |
100 | 272 ms |
1,000 | 676 ms |
10,000 | 709 ms |
100,000 | 1632 ms |
Commit https://github.com/wazuh/wazuh-agent/commit/805436ae20d7ec646a525681fe1f261d07050a11 adds the following benchmark tests to the PoC.
The purpose of these benchmarks is to evaluate the performance of dispatching pre-loaded pending events using `std::thread`. The benchmarks abstract away the real database and the real connection to the server to which the events are dispatched. Instead, the focus is solely on measuring the time taken to create threads that batch and send the events. One of our objectives is to assess the efficiency of using `std::thread`.
The results of the benchmarks are categorized based on different parameters to provide a clear and detailed analysis. 100 iterations of each test were run to average the results.
Batch Size | Time (manual_time) |
---|---|
10 | 342 ms |
100 | 44.6 ms |
1,000 | 16.6 ms |
10,000 | 13.9 ms |
100,000 | 14.4 ms |
200,000 | 14.7 ms |
Analysis: As the batch size increases, the time required for dispatching events significantly decreases initially and then stabilizes. This demonstrates that larger batch sizes are more efficient for dispatching a high number of events.
Event Count | Time (manual_time) |
---|---|
10 | 0.022 ms |
100 | 0.032 ms |
1,000 | 0.185 ms |
10,000 | 2.01 ms |
100,000 | 21.4 ms |
200,000 | 43.9 ms |
300,000 | 64.1 ms |
500,000 | 106 ms |
1,000,000 | 213 ms |
Analysis: The time required increases with the number of events, which is expected. However, the increase is more pronounced as the event count grows, indicating a potential area for optimization when handling very large volumes of events.
Event Data Size | Time (manual_time) |
---|---|
10 | 21.4 ms |
100 | 21.0 ms |
1,000 | 21.1 ms |
10,000 | 20.9 ms |
100,000 | 21.1 ms |
250,000 | 21.2 ms |
500,000 | 21.0 ms |
1,000,000 | 20.8 ms |
Analysis: The dispatch time remains relatively constant regardless of the event data size, suggesting that the current implementation efficiently handles varying sizes of event data.
The benchmark results provide valuable insights into the performance characteristics of using `std::thread` for event dispatching: it handles varying batch sizes, event counts, and event data sizes efficiently. Keeping in mind that no DB or server connection is involved in these benchmarks, the results are good overall and shed a positive light on the use of `std::thread` for this purpose. These findings support considering `std::thread` as a viable alternative to other libraries like `libuv` or C++20 coroutines for event dispatching tasks. Further optimization and testing, including the management of thread creation, can help enhance performance, particularly for very large event counts.
It's also worth noting that, given these measurement results, the true bottleneck in handling events will more likely come from database transactions and HTTP requests to the server.
The purpose of these benchmarks is to evaluate the performance of dispatching pre-loaded pending events using `std::thread` and to compare it with Asio's thread pool. The benchmarks abstract away the real database and the real connection to the server to which the events are dispatched. Instead, the focus is solely on measuring the time taken to create threads that batch and send the events. One of our objectives is to assess the efficiency of using `std::thread` in comparison to Asio's thread pool.
Asio's thread pool has been limited to 32 threads (the result of `std::thread::hardware_concurrency()` on the machine running the tests).
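For context, creating a capped pool and posting work to it looks roughly like this; the loop body is a placeholder for the benchmark's batching work:

```cpp
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <thread>

int main() {
    // Cap the pool at the hardware concurrency (32 on the benchmark machine).
    boost::asio::thread_pool pool{std::thread::hardware_concurrency()};

    for (int i = 0; i < 1000; ++i)
        boost::asio::post(pool, [i] { /* batch and dispatch events here */ });

    pool.join(); // block until all posted work has finished
}
```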
The results of the benchmarks are categorized based on different parameters to provide a clear and detailed analysis.
Using `std::thread`:
Batch Size | Time (manual_time) |
---|---|
10 | 342 ms |
100 | 44.6 ms |
1,000 | 16.6 ms |
10,000 | 13.9 ms |
100,000 | 14.4 ms |
200,000 | 14.7 ms |
Using Asio's Thread Pool:
Batch Size | Time (manual_time) |
---|---|
10 | 47.4 ms |
100 | 17.0 ms |
1,000 | 14.2 ms |
10,000 | 13.7 ms |
100,000 | 15.3 ms |
200,000 | 18.2 ms |
Analysis: Asio's thread pool performs significantly better than `std::thread` for smaller batch sizes (10 and 100). For larger batch sizes, the performance of Asio's thread pool is comparable to `std::thread`, with Asio showing slightly better efficiency for batch sizes of 1,000 and 10,000.
Using `std::thread`:
Event Count | Time (manual_time) |
---|---|
10 | 0.022 ms |
100 | 0.032 ms |
1,000 | 0.185 ms |
10,000 | 2.01 ms |
100,000 | 21.4 ms |
200,000 | 43.9 ms |
300,000 | 64.1 ms |
500,000 | 106 ms |
1,000,000 | 213 ms |
Using Asio's Thread Pool:
Event Count | Time (manual_time) |
---|---|
10 | 0.376 ms |
100 | 0.269 ms |
1,000 | 0.385 ms |
10,000 | 1.27 ms |
100,000 | 9.01 ms |
200,000 | 17.1 ms |
300,000 | 25.5 ms |
500,000 | 43.4 ms |
1,000,000 | 88.3 ms |
Analysis: Asio's thread pool significantly outperforms `std::thread` across all event counts, showing better scalability and efficiency as the number of events increases.
Using `std::thread`:
Event Data Size | Time (manual_time) |
---|---|
10 | 21.4 ms |
100 | 21.0 ms |
1,000 | 21.1 ms |
10,000 | 20.9 ms |
100,000 | 21.1 ms |
250,000 | 21.2 ms |
500,000 | 21.0 ms |
1,000,000 | 20.8 ms |
Using Asio's Thread Pool:
Event Data Size | Time (manual_time) |
---|---|
10 | 9.31 ms |
100 | 9.52 ms |
1,000 | 9.30 ms |
10,000 | 9.16 ms |
100,000 | 9.79 ms |
250,000 | 10.1 ms |
500,000 | 9.94 ms |
1,000,000 | 9.95 ms |
Analysis: Asio's thread pool demonstrates a more consistent and lower dispatch time across different event data sizes compared to `std::thread`.
The benchmark results provide valuable insights into the performance characteristics of using `std::thread` compared to Asio's thread pool for event dispatching. Key takeaways include:

- Asio's thread pool outperforms `std::thread` by a substantial margin.
- `std::thread` does not limit the number of threads being created. Proper thread management, according to hardware capabilities and other fine-tuning decisions, is essential for optimal performance. Asio's thread pool inherently manages threads more effectively, leading to better performance.

Overall, Asio's thread pool demonstrates superior performance and scalability compared to manual handling of `std::thread`s. These findings support considering Asio's thread pool as a more efficient alternative for event dispatching tasks. Further optimization and testing, including fine-tuning the number of threads and other parameters, can help enhance performance, particularly for very large event counts.
The test focused on basic database operations (writing, reading, and updating records) without multi-threading, aiming to provide insights into the efficiency and suitability of each database for potential use in our project. Default configurations were used for both RocksDB and SQLite. The test consisted of writing, reading, and updating 10,000 elements.
Using transactions for bulk inserts dramatically improves SQLite performance.
Events | Time |
---|---|
10,000 | 26 ms |
100,000 | 143 ms |
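For reference, the transaction-wrapped bulk insert pattern with the sqlite3 C API is sketched below; the table name and schema are assumptions for illustration:

```cpp
#include <sqlite3.h>
#include <string>

int main() {
    sqlite3* db = nullptr;
    sqlite3_open("events.db", &db);
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS events(data TEXT, status TEXT)",
                 nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO events(data, status) VALUES(?, 'pending')",
                       -1, &stmt, nullptr);

    // One transaction around all inserts: without it, each INSERT commits
    // (and syncs) individually, which is what makes naive bulk loads slow.
    sqlite3_exec(db, "BEGIN TRANSACTION", nullptr, nullptr, nullptr);
    for (int i = 0; i < 10000; ++i) {
        const std::string data = "event #" + std::to_string(i);
        sqlite3_bind_text(stmt, 1, data.c_str(), -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);

    sqlite3_finalize(stmt);
    sqlite3_close(db);
}
```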
There's no meaningful difference in terms of performance between SQLite and RocksDB.
We will continue using `libcurl`, due to its simplicity and because it covers all the use cases we need.
We are ready to move forward with the implementation of the MVP.
Description
As detailed in https://github.com/wazuh/wazuh/issues/22677, Wazuh's current communication setup is complex and needs to be refactored.
We want to replace the current `wazuh-agentd` service, in charge of communicating with the server, with a new agent. Additionally, the `agent-auth` tool will also be replaced by this agent.
The new agent must be able to perform the following tasks:
- Register with the `Server management API` (it needs the login token).
- Send stateless events through the `Agent comms API`.
- Send stateful events through the `Agent comms API`.
- Request commands from the `Agent comms API`.

Additionally, these are the API endpoints that the agent will use to communicate with the server:
- `/login`: Authenticate (request token).
- `/events/stateless`: Send events.
- `/events/stateful`: The same as the previous one, but it requires persistent data.
- `/commands`: In the opposite direction, a request made by the agent to the manager.

The focus of this issue will be on the following tasks:
Implementation restrictions
- `libcurl`, `boost`, `gRPC`, etc.

Plan
- `/poc` folder in the new repository.

POC working branch: https://github.com/wazuh/wazuh-agent/tree/1-spike-new-agent-comms-api-endpoint-client