redboltz / mqtt_cpp

Boost Software License 1.0

Client performance benchmarks/comparison to Paho Mqtt #995

Closed mvccogo closed 6 months ago

mvccogo commented 6 months ago

Hello,

Thank you for the library. I've been using Paho MQTT C++ for some time, but I've encountered some bottlenecks in it that prevent me from increasing client throughput (namely, runtime allocation instead of pooling resources and the fact that the underlying C library runs in a single thread, even when I spawn multiple objects).

Before I start migrating the whole project to this library, could anyone help me figure out how fast it can publish data and whether or not the issues above are present? My current use case is to publish a LOT of messages (QoS 0) in a parallel fashion (the messages are already being assembled in a parallel way and sent to an outbound queue).

redboltz commented 6 months ago

Library choice

At the top of the mqtt_cpp page:

https://github.com/redboltz/mqtt_cpp

mqtt_cpp is in serious-bugfix-only status. The new project async_mqtt has been started based on mqtt_cpp's experience. New features will be added to async_mqtt.

If you can use C++17 or later in your environment, please consider async_mqtt. It is well documented and well supported, and offers the same or perhaps higher performance than mqtt_cpp.

Your use case

Sending QoS 0 publish messages in parallel typically has two possible bottlenecks.

  1. If you use one MQTT client, the underlying socket communication can become a bottleneck, because MQTT must preserve message order on a connection, so writes on that socket need to be serialized.
  2. If you use multiple MQTT clients, problem 1 is solved; the order of messages is not guaranteed between sockets. However, if each client is mapped to its own thread, a thrashing problem can occur (like the C10K problem).

Both async_mqtt and mqtt_cpp avoid using an individual thread per client. The clients (endpoints) are designed around the Boost.Asio asynchronous communication model. So the clients works parallely even if the number of thread is one. In addition, you can run multiple threads to increase performance.

Bench example

async_mqtt

async_mqtt has a bench tool: https://github.com/redboltz/async_mqtt/blob/main/tool/bench.cpp I guess it is similar to your use case.

In addition, here is the broker code: https://github.com/redboltz/async_mqtt/blob/main/tool/broker.cpp https://github.com/redboltz/async_mqtt/tree/main/include/async_mqtt/broker

Bench results:

https://github.com/redboltz/async_mqtt/blob/doc/performance.adoc

mqtt_cpp

mqtt_cpp has the same tools:

bench: https://github.com/redboltz/mqtt_cpp/blob/master/example/bench.cpp

broker: https://github.com/redboltz/mqtt_cpp/blob/master/example/broker.cpp https://github.com/redboltz/mqtt_cpp/tree/master/include/mqtt/broker

mvccogo commented 6 months ago

Thank you @redboltz!

Unfortunately I am limited to the C++14 standard for now, but the async_mqtt project looks interesting! Ordering of messages is not required.

So the clients works parallely even if the number of thread is one

I'm probably missing something, but how is this possible? I get that the model can be asynchronous with only one thread, but the messages will, ultimately, be published sequentially in this case, right?

Regardless of the question above, I think that the library does solve my issue; the goal is to use all remaining cores in my machine to saturate the network card with publish messages. These will be sent to an EMQX broker. The broker is not my concern in this case.

redboltz commented 6 months ago

So the clients works parallely even if the number of thread is one

I'm probably missing something, but how is this possible? I get that the model can be asynchronous with only one thread, but the messages will, ultimately, be published sequentially in this case, right?

Perhaps my English misled you. I should have said "concurrently" instead of "parallely". I mean https://www.boost.org/doc/libs/1_84_0/doc/html/boost_asio/overview/model/async_ops.html

Let's say we are using two sockets on one thread. If your application requests sending messages A, B, C, D internally, then the following process takes place:

  1. socket1.async_write() is called with A.
  2. socket2.async_write() is called with B before socket1.async_write() has finished.
  3. When socket1.async_write() finishes, socket1.async_write() is called with C before socket2.async_write() has finished.
  4. When socket2.async_write() finishes, socket2.async_write() is called with D before socket1.async_write() has finished.

I wrote "before", but it is not always before. Strictly speaking, socket1 and socket2 work independently.

Socket1: A C
Socket2: B D

Note: this is one possible scenario example.