vikman90 commented 1 week ago

Parent issue:

https://github.com/wazuh/wazuh/issues/22677

Description

Following the completion of the spike on agent-manager communication, the next step is to implement a functional client for this new protocol. This client will perform the handshake with the server, establish fully functional communication, transmit queue information, and leave the interface open for adding other modules.

Functional requirements

1. Connect to the server.
2. Register and obtain a UUID.
3. Authenticate and acquire a token.
4. Send data using that token, keeping the data persisted in the agent until the server responds with a 200 code:
   a. Stateful.
   b. Stateless.
5. Query the commands that the server needs to send to the agent.
6. Resilience: reconnect to the server if the connection is lost.
7. Multitasking: support multiple concurrent connections.
8. Read options from a configuration file.
9. Work alongside the queue system.

Implementation restrictions

1. We will adhere to the restrictions declared in issue #1.
2. The configuration will be in TOML format.
3. The configuration will be parsed in a separate component, such as ConfigParser.
4. Support for multiple concurrent communications should preferably be implemented without multithreading, using coroutines or a similar mechanism.
5. If possible, use C++20.

Plan

Start from the PoC developed in issue #1.
Agree on the queue interface: #2.
Agree on the command manager interface: #4.
Investigate whether all target platforms support C++20 with GCC and Clang.
Design and implement a configuration manager.
Integrate with the migration of the Inventory module (MVP at https://github.com/wazuh/wazuh/issues/22887).
If necessary, implement a "Dummy" module for testing purposes.

Subtasks

aritosteles commented 1 week ago

Compatibility with gcc 10

The following operating systems have been tested. Each either provides a GCC 10 package or GCC 10 was built successfully on it.

Debian 10
Ubuntu 16.04
Amazon Linux 1
openSUSE 15
SUSE 15 - there are packages for gcc12 and gcc13
MacOS 14
AlmaLinux 8
Rocky Linux 8
RHEL 6
CentOS 6

vikman90 commented 1 week ago

C++20 concepts vs inheritance

I've conducted a trial to model a module pool in C++, aiming to create a class capable of managing references to all modules, starting them, and facilitating communication.

To achieve this, I explored two approaches:

Inheritance: Introducing an IRunnable interface where modules inherit and implement it.
Concepts: Utilizing a Runnable concept, with the pool restricting parameters to this concept.

Initially, concept constraints seemed preferable to inheritance, since they keep each module independent of the "Module" definition. However, I noted that storing modules in a std::vector<std::any> container (akin to the classic void*[] in C) introduces excessive generality. This approach requires a runtime type check on every module access, thereby:

Missing compile-time error detection.
Incurring a slight runtime overhead for the type check, contrary to our aim of avoiding the vtables that come with inheritance.

Moreover, relying on concepts does not entirely decouple modules from the Agent component, as they still need to interact via the Pool if inter-module communication is desired.

Code

inheritance.cpp

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Common interface: every module inherits it and implements run().
class IRunnable {
public:
    virtual void run() = 0;
    virtual ~IRunnable() = default;
};

class LogCollector : public IRunnable {
public:
    void run() override { std::cout << "LogCollector is running" << std::endl; }
};

class FIM : public IRunnable {
public:
    void run() override { std::cout << "FIM is running" << std::endl; }
};

// The pool only knows the interface; calls are dispatched through the vtable.
class Pool {
public:
    void addRunnable(std::shared_ptr<IRunnable> runnable) { runnables.push_back(runnable); }

    void executeAll() {
        for (auto& runnable : runnables) {
            runnable->run();
        }
    }

private:
    std::vector<std::shared_ptr<IRunnable>> runnables;
};

int main() {
    auto logcollector = std::make_shared<LogCollector>();
    auto fim = std::make_shared<FIM>();

    Pool pool;
    pool.addRunnable(logcollector);
    pool.addRunnable(fim);
    pool.executeAll();
    return 0;
}
```

concept.cpp

```cpp
#include <any>
#include <concepts>
#include <iostream>
#include <typeinfo>
#include <utility>
#include <vector>

// A module is any type with a void run() member; no base class is required.
template <typename T>
concept Runnable = requires(T t) {
    { t.run() } -> std::same_as<void>;
};

class LogCollector {
public:
    void run() { std::cout << "LogCollector is running" << std::endl; }
};

class FIM {
public:
    void run() { std::cout << "FIM is running" << std::endl; }
};

class Pool {
public:
    // The concept is checked at compile time when addRunnable() is instantiated.
    template <Runnable T>
    void addRunnable(T obj) {
        runnables.push_back(std::make_any<T>(std::move(obj)));
    }

    // std::any erases the type, so it must be recovered at runtime,
    // and the pool has to know every concrete module type.
    void executeAll() {
        for (auto& a : runnables) {
            try {
                if (a.type() == typeid(LogCollector)) {
                    std::any_cast<LogCollector&>(a).run();
                } else if (a.type() == typeid(FIM)) {
                    std::any_cast<FIM&>(a).run();
                } else {
                    std::cerr << "ERROR: Incompatible element." << std::endl;
                }
            } catch (const std::bad_any_cast&) {
                std::cerr << "ERROR: Incompatible element." << std::endl;
            }
        }
    }

private:
    std::vector<std::any> runnables;
};

int main() {
    Pool pool;
    LogCollector logcollector;
    FIM fim;

    pool.addRunnable(logcollector);
    pool.addRunnable(fim);
    pool.executeAll();
    return 0;
}
```

Conclusion

| | Inheritance | Concepts |
|---|---|---|
| Pros | Simpler container setup. Agent independence from individual module definitions. | Module definition detached from Agent during instantiation. |
| Cons | Each module inherits `Module`, causing vtable overhead. Dependency of Agent on each module. | Agent dependency on each module persists. |

If my analysis holds true, IMHO inheritance is the better option, and I'm inclined toward using it.

jr0me commented 1 week ago

Update: Compatibility with gcc 10

Fedora 40 🟢

Compiling GCC 10 on Fedora 40 was not straightforward: GCC 10.5 worked, but earlier versions did not. GCC had to be built with the --enable-version-specific-runtime-libs option. The example code with coroutines also needed the additional -static-libgcc option, because the linker could not find libgcc.

Nonetheless, GCC 14 is available out of the box.

vikman90 commented 1 week ago

C++20 concepts vs inheritance (update)

I have developed a new proposal for the concepts-based approach by introducing a wrapper for each module. The addModule() function, which creates the wrappers, is a function template constrained by a concept. This decouples each module class from the module definition.

However, in this proposal, each module must receive a Configuration object to establish its configuration. This object must be created by some configuration parser.

Code

Below, I present the two updated approaches with exactly the same behavior:

inheritance.cpp using an abstract base class

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

using namespace std;

// Placeholder: to be filled by a configuration parser.
class Configuration { };

// Abstract base class that every module implements.
struct IModule {
    virtual ~IModule() = default;
    virtual void run() = 0;
    virtual void stop() = 0;
    virtual int setup(const Configuration & config) = 0;
    virtual string command(const string & query) = 0;
    virtual string name() const = 0;
};

/******************************************************************************/

struct Logcollector : public IModule {
    void run() override { cout << "+ [Logcollector] is running" << endl; }
    int setup(const Configuration &) override { return 0; }
    void stop() override { cout << "- [Logcollector] stopped" << endl; }
    string command(const string & query) override {
        cout << " [Logcollector] query: " << query << endl;
        return "OK";
    }
    string name() const override { return "logcollector"; }
};

struct FIM : public IModule {
    void run() override { cout << "+ [FIM] is running" << endl; }
    int setup(const Configuration &) override { return 0; }
    void stop() override { cout << "- [FIM] stopped" << endl; }
    string command(const string & query) override {
        cout << " [FIM] query: " << query << endl;
        return "OK";
    }
    string name() const override { return "fim"; }
};

struct Inventory : public IModule {
    void run() override { cout << "+ [Inventory] is running" << endl; }
    int setup(const Configuration &) override { return 0; }
    void stop() override { cout << "- [Inventory] stopped" << endl; }
    string command(const string & query) override {
        cout << " [Inventory] query: " << query << endl;
        return "OK";
    }
    string name() const override { return "inventory"; }
};

struct SCA : public IModule {
    void run() override { cout << "+ [SCA] is running" << endl; }
    int setup(const Configuration &) override { return 0; }
    void stop() override { cout << "- [SCA] stopped" << endl; }
    string command(const string & query) override {
        cout << " [SCA] query: " << query << endl;
        return "OK";
    }
    string name() const override { return "sca"; }
};

/******************************************************************************/

class Pool {
public:
    Pool() {
        addModule(make_shared<Logcollector>());
        addModule(make_shared<FIM>());
        addModule(make_shared<Inventory>());
        addModule(make_shared<SCA>());
    }

    void addModule(shared_ptr<IModule> module) { modules[module->name()] = module; }

    // Throws out_of_range if no module has that name.
    shared_ptr<IModule> getModule(const string & name) { return modules.at(name); }

    void start() { for (const auto & [_, module] : modules) { module->run(); } }
    void setup(const Configuration & config) { for (const auto & [_, module] : modules) { module->setup(config); } }
    void stop() { for (const auto & [_, module] : modules) { module->stop(); } }

private:
    map<string, shared_ptr<IModule>> modules;
};

int main() {
    Pool pool;
    Configuration config;

    pool.start();
    pool.setup(config);
    cout << endl;

    try {
        auto logcollector = pool.getModule("logcollector");
        logcollector->command("Hello World!");
    } catch (const out_of_range &) {
        cerr << "! OOPS: Module not found." << endl;
    }

    cout << endl;
    pool.stop();
    return 0;
}
```

wrapper.cpp using a wrapper and concepts

```cpp
#include <concepts>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

using namespace std;

// Placeholder: to be filled by a configuration parser.
class Configuration { };

/******************************************************************************/

// Modules are plain structs: no base class, no dependency on the pool.
struct Logcollector {
    void run() { cout << "+ [Logcollector] is running" << endl; }
    int setup(const Configuration &) { return 0; }
    void stop() { cout << "- [Logcollector] stopped" << endl; }
    string command(const string & query) {
        cout << " [Logcollector] query: " << query << endl;
        return "OK";
    }
    string name() const { return "logcollector"; }
};

struct FIM {
    void run() { cout << "+ [FIM] is running" << endl; }
    int setup(const Configuration &) { return 0; }
    void stop() { cout << "- [FIM] stopped" << endl; }
    string command(const string & query) {
        cout << " [FIM] query: " << query << endl;
        return "OK";
    }
    string name() const { return "fim"; }
};

struct Inventory {
    void run() { cout << "+ [Inventory] is running" << endl; }
    int setup(const Configuration &) { return 0; }
    void stop() { cout << "- [Inventory] stopped" << endl; }
    string command(const string & query) {
        cout << " [Inventory] query: " << query << endl;
        return "OK";
    }
    string name() const { return "inventory"; }
};

struct SCA {
    void run() { cout << "+ [SCA] is running" << endl; }
    int setup(const Configuration &) { return 0; }
    void stop() { cout << "- [SCA] stopped" << endl; }
    string command(const string & query) {
        cout << " [SCA] query: " << query << endl;
        return "OK";
    }
    string name() const { return "sca"; }
};

/******************************************************************************/

// The interface every module must satisfy, checked at compile time.
template <typename T>
concept Module = requires(T t, const Configuration & config, const string & query) {
    { t.run() } -> same_as<void>;
    { t.setup(config) } -> same_as<int>;
    { t.stop() } -> same_as<void>;
    { t.command(query) } -> same_as<string>;
    { t.name() } -> same_as<string>;
};

// Type-erased handle stored by the pool.
struct ModuleWrapper {
    function<void()> run;
    function<int(const Configuration &)> setup;
    function<void()> stop;
    function<string(const string &)> command;
};

class Pool {
public:
    Pool() {
        addModule(make_shared<Logcollector>());
        addModule(make_shared<FIM>());
        addModule(make_shared<Inventory>());
        addModule(make_shared<SCA>());
    }

    // Wraps any type satisfying Module; the lambdas share ownership of the module.
    template <Module T>
    void addModule(shared_ptr<T> module) {
        auto wrapper = make_shared<ModuleWrapper>(ModuleWrapper{
            .run = [module]() { module->run(); },
            .setup = [module](const Configuration & config) { return module->setup(config); },
            .stop = [module]() { module->stop(); },
            .command = [module](const string & query) { return module->command(query); }
        });
        modules[module->name()] = wrapper;
    }

    // Throws out_of_range if no module has that name.
    shared_ptr<ModuleWrapper> getModule(const string & name) { return modules.at(name); }

    void start() { for (const auto & [_, module] : modules) { module->run(); } }
    void setup(const Configuration & config) { for (const auto & [_, module] : modules) { module->setup(config); } }
    void stop() { for (const auto & [_, module] : modules) { module->stop(); } }

private:
    map<string, shared_ptr<ModuleWrapper>> modules;
};

/******************************************************************************/

int main() {
    Pool pool;
    Configuration config;

    pool.start();
    pool.setup(config);
    cout << endl;

    try {
        auto logcollector = pool.getModule("logcollector");
        logcollector->command("Hello World!");
    } catch (const out_of_range &) {
        cerr << "! OOPS: Module not found." << endl;
    }

    cout << endl;
    pool.stop();
    return 0;
}
```

Conclusion

In conclusion, here is the updated table of pros and cons:

| | Inheritance | Wrapper (concepts) |
|---|---|---|
| Pros | The agent can transparently receive modules. | Modules are developed independently from the module pool. |
| Cons | Dependency on the base class. Slight runtime overhead due to vtable usage. | The pool must maintain each module. Slight compilation overhead due to templates, and increased binary size. |

Tremendous thanks to @gdiazlo and @dwordcito for their collaboration in this research.

jr0me commented 4 days ago

Update on implementation restriction no. 4

While investigating the implementation restriction of supporting multiple communications without multithreading, I explored coroutines. Coroutines allow non-blocking operations by suspending and resuming execution, which makes them an efficient way to manage asynchronous tasks. However, a coroutine does not resume itself: achieving true multitasking and supporting multiple concurrent connections may still require an underlying mechanism, such as an event loop or a thread pool, to drive these connections concurrently.

Here's a test using cppcoro (https://github.com/lewissbaker/cppcoro) that implements a task pool with coroutines and threads: https://github.com/wazuh/wazuh-agent/commit/726a2997840a88eb32fdba752a17650153e39d3a

wazuh / wazuh-agent

Develop the new client #14
