hivemq / hivemq-mqtt-client-dotnet

The HiveMQ C# Asynchronous MQTT client for .NET featuring full MQTT 5.0 and back pressure support.
https://hivemq.github.io/hivemq-mqtt-client-dotnet/
Apache License 2.0

All messages sent, only 20% received #201

Open mkb-mb opened 2 months ago

mkb-mb commented 2 months ago

πŸ› Bug Report

πŸ”¬ How To Reproduce

Steps to reproduce the behavior:

  1. Compile and run the attached code

Code sample

See attached code.

Environment

Where are you running/using this client?

What version of this client are you using? 0.23.0+build.618

Operating System? Windows 10

Hardware or Device? Dell laptop

.NET version: 6

Screenshots

Console output:

Connected result: Success
Subscribed result: GrantedQoS1 - Notification/HiveRawTest
Connect: 1 Disconnect: 0 Sent: 50000 Failed send: 0 Unique Received: 5765 Received: 5765 Duplicates: 0 Mismatch: 0
Connect: 1 Disconnect: 0 Sent: 50000 Failed send: 0 Unique Received: 10756 Received: 10756 Duplicates: 0 Mismatch: 0
Connect: 1 Disconnect: 0 Sent: 50000 Failed send: 0 Unique Received: 10756 Received: 10756 Duplicates: 0 Mismatch: 0

πŸ“ˆ Expected behavior

I am publishing 50,000 messages (on a single topic) as quickly as possible. I have a subscription (set up during connect) that should catch all messages.

The payload for each message is minimal (4 bytes).

I receive fewer than 11,000 messages.

I expected to receive all QoS 1 messages that are published.

If I do not try to publish messages concurrently, all messages are (usually) received, but it takes much longer.

πŸ“Ž Additional context

Program.cs.txt
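
Since the attachment isn't reproduced here, the following is only a minimal sketch of the pattern it exercises, assuming the HiveMQClient API shown in this repository's README; the broker address, port, and settle delay are placeholders, not values from the attached code.

```csharp
using System.Linq;
using HiveMQtt.Client;
using HiveMQtt.MQTT5.Types;

// Sketch: one client publishes 50,000 small QoS 1 messages as fast as
// possible and counts what its own subscription receives.
var client = new HiveMQClient(new HiveMQClientOptionsBuilder()
    .WithBroker("localhost")   // placeholder broker
    .WithPort(1883)
    .Build());

var received = 0;
client.OnMessageReceived += (sender, args) => Interlocked.Increment(ref received);

await client.ConnectAsync();
await client.SubscribeAsync("Notification/HiveRawTest", QualityOfService.AtLeastOnceDelivery);

// "Concurrent" case: fire every publish without awaiting the previous PubAck.
var publishes = Enumerable.Range(0, 50_000)
    .Select(i => client.PublishAsync("Notification/HiveRawTest", i.ToString(),
                                     QualityOfService.AtLeastOnceDelivery));
await Task.WhenAll(publishes);

await Task.Delay(TimeSpan.FromSeconds(10));   // crude settle time before counting
Console.WriteLine($"Sent: 50000  Received: {received}");
```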

github-actions[bot] commented 2 months ago

Hello @mkb-mb, thanks for contributing to the HiveMQ community! We will respond as soon as possible.

pglombardo commented 2 months ago

Hi @mkb-mb - which broker did you use?

mkb-mb commented 2 months ago

Mosquitto 2.0.14, but I have tried others. The Mosquitto support team at Cedalo thinks it's a client issue. I have disabled the max in-flight limit and increased max_queued_messages to 10,000.
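
For reference, those two Mosquitto settings live in mosquitto.conf and look roughly like this (the option names are Mosquitto's, the values are the ones described above; 0 disables the in-flight cap):

```
# mosquitto.conf (excerpt)
max_inflight_messages 0       # 0 = no limit on unacknowledged QoS 1/2 messages per client
max_queued_messages 10000     # per-client queue for messages pending delivery
```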

mkb-mb commented 2 months ago

I have also tried other clients.

pglombardo commented 1 month ago

Hi @mkb-mb - I haven't been able to reproduce the failure. I also used Wireshark to confirm messages sent/received.

I used the HiveMQ broker and set the broker topic queue to 1m.

(Screenshot attached: 2024-10-07 at 11:11:10)

I don't doubt your issue, but it could be due to a large number of potential causes: not all brokers respect the in-flight settings, some don't queue messages, the queue may not be large enough, network latency, etc. See also this little protocol nuance.

But in your case, something is getting overloaded and messages are being lost. Here are a few items to diagnose:

  1. Try with another broker
  2. Can you check dropped messages in the broker?

But before going down that path, the larger question is what are you trying to achieve?

The general rule for MQTT is that multiple clients will always get better performance than a single client - not to mention the risk of having a single point of failure.

Are you attempting to simply benchmark? If so, we do have benchmarks published.

Let me know and then we can figure out the best path forward.

mkb-mb commented 1 month ago

I appreciate the prompt response and feedback.

The original problem was that a specific type of message was being sent but not all messages were received. The code I shared was my attempt to recreate that problem with a smaller set of dependencies and the least code possible.

Normally, we use the broker to support communications between a few applications, but we also use the broker to communicate between software components within a single application. We've been using a single client instance in each application. The lost messages problem started to appear when we had a publisher send a bunch of messages (hundreds) on the same topic within a small time period (as quickly as possible) mixed in with our other MQTT traffic. I can't really share that application code.

We have found different problems under different conditions (with different 3rd-party software). With the Mosquitto 2.0.14 broker and the MQTTnet client library, we found that we could trigger an error (Received packet 'PubAck: [PacketIdentifier=5545] [ReasonCode=Success]' at an unexpected time.) by issuing concurrent calls to the client library's "PublishAsync" method.

Preventing the concurrent calls to PublishAsync seemed to eliminate that error, but we are concerned that this approach is creating a communications bottleneck (we do see outgoing messages waiting to be sent). My assumption is that the MQTTnet "PublishAsync" method is not fully thread-safe.

We have tried upgrading Mosquitto (2.0.18a) and MQTTnet. I briefly tried self-hosting a broker (using MQTTnet).

I have contacted the Cedalo and the MQTTnet teams. Cedalo felt the problem was in the client code. MQTTnet did not respond.

I then tried the HiveMQ client and a similar problem emerged (messages not received) even at a fraction of the messages it took to recreate a problem with MQTTnet.

I have (at times) enabled logging on the broker but I do not see "dropped message" warnings (unless we run with a much lower max_queued_messages setting).

Would you recommend creating multiple clients within each application? That might help us to publish messages rapidly without a bottleneck.

mkb-mb commented 1 month ago

I increased max_queued_messages from 10,000 to 250,000 and all 50,000 messages were delivered (if I publish consecutively, not concurrently). Publishing concurrently, I get a "No available packet IDs" error and only one message is sent/received.

With HiveMQ CE (default settings) I see the same error (no available packet ids) when I try to send 100,000 messages (with concurrent publish).

pglombardo commented 1 month ago

That's excellent news on the max_queued_messages - one problem solved at least.

As for "No available packet IDs", that is definitely the fault of this client. For the Packet IDs, I have set a max of 65k that get reserved and released on a rotation as messages are sent. It's the first time I've seen that limit hit :-)

I can increase that count limit 2x and will likely release a new version today.

pglombardo commented 1 month ago

I responded too quickly. The MQTT specification defines a Packet ID to be a 2-byte unsigned integer, which makes the maximum size of the Packet ID pool 65,535.

That means you have to wait for some messages to be acknowledged before sending more. For example, with QoS 1, once the PubAck is received for a message, its ID is released back to the pool.

A couple other items of note:

  1. When you launch a mass of publishes, they are rate-limited by the client and broker through in-flight flow control and don't all get sent concurrently.
  2. Packet IDs are only used with Publish QoS 1 or 2. See this section in the spec.

These are good items to note. I hope I've explained everything well enough.
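
To make the flow-control point concrete, here is a sketch (not an official pattern of this client; the cap of 1,000 is an arbitrary example value) of bounding concurrent PublishAsync calls with a SemaphoreSlim so the number of unacknowledged QoS 1 publishes stays well below the 65,535 Packet ID / ReceiveMaximum ceiling:

```csharp
using HiveMQtt.Client;
using HiveMQtt.MQTT5.Types;

// Sketch: cap concurrent QoS 1 publishes from a single client.
var inFlight = new SemaphoreSlim(1_000);   // arbitrary example cap

async Task PublishBounded(HiveMQClient client, string topic, string payload)
{
    await inFlight.WaitAsync();
    try
    {
        // For QoS 1 the await is expected to complete once the PubAck arrives,
        // i.e. once the Packet ID is released back to the pool.
        await client.PublishAsync(topic, payload, QualityOfService.AtLeastOnceDelivery);
    }
    finally
    {
        inFlight.Release();
    }
}
```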

pglombardo commented 1 month ago

The original problem was that a specific type of message was being sent but not all messages were received. ... The lost messages problem started to appear when we had a publisher send a bunch of messages (hundreds) on the same topic within a small time period (as quickly as possible) mixed in with our other MQTT traffic.

If this is the case, I would consider QoS 2 with a healthy queue size on the broker. Also check the ReceiveMaximum setting on both the client and the broker in case of message bursts like the one you described above.

Re: QoS 2:

This is the highest Quality of Service level, for use when neither loss nor duplication of messages are acceptable.

In QoS 2, the transaction doesn't complete until the receiver has acknowledged receipt of the message.
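
As a small illustration (assuming client is an already connected HiveMQClient and payload is the message body; the topic is a placeholder), switching to QoS 2 is just a different QualityOfService argument, at the cost of the full four-step handshake per message:

```csharp
using HiveMQtt.MQTT5.Types;

// QoS 2: exactly-once delivery. Each publish completes only after the
// PUBLISH -> PUBREC -> PUBREL -> PUBCOMP exchange, which is why it is slower.
var result = await client.PublishAsync(
    "Notification/HiveRawTest", payload, QualityOfService.ExactlyOnceDelivery);
```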

mkb-mb commented 1 month ago

Hi Peter, I appreciate all the background. I have tried QoS 2 for these messages, but there was a fairly significant throughput penalty.

I guess the question I am left with is how can I increase message throughput (we can focus on publishing) if I cannot have multiple threads calling "PublishAsync" on a single client without triggering errors? With the HiveMQ client I run the risk of hitting the "no packet id" error if I allow enough concurrent calls to PublishAsync to deplete the 65k packet identifiers. With the MQTTnet client I often see the error "Received packet 'PubAck: [PacketIdentifier=5545] [ReasonCode=Success]' at an unexpected time." if I allow a large number of concurrent calls to PublishAsync.

I can create multiple client objects inside the publishing application and route half the traffic to each client, but it's not clear to me whether that will (or even could) increase message throughput.

What I'm doing now allows only one call to PublishAsync at a time and that is definitely limiting my message throughput. I'm working under the assumption that MQTT should be able to handle tens of thousands of messages per second, but that's not the behavior I'm observing.

Thanks again

pglombardo commented 1 month ago

With the HiveMQ client I run the risk of hitting the "no packet id" error if I allow enough concurrent calls to PublishAsync to deplete the 65k packet identifiers.

One clarification: this isn't specific to this client - it's a protocol limit. If another client doesn't show this error after exhausting all packet IDs, then that client will have bigger, harder-to-track problems down the road.

The number of Packet IDs is equivalent to ReceiveMaximum:

The value of Receive Maximum applies only to the current Network Connection. If the Receive Maximum value is absent then its value defaults to 65,535.

Both client and broker have a ReceiveMaximum setting. This is the "in-flight" setting and it "applies only to the current Network Connection".

I can create multiple client objects inside the publishing application and route half the traffic to each client, but it's not clear to me whether that will (or even could) increase message throughput.

Multiple clients will get a higher throughput. If you go to the extreme, the HiveMQ broker benchmark hit a peak message throughput of 1 million publish messages per second last year.

From my time at HiveMQ, the case has always been that multiple clients get better performance than a single client. I would at least test the multiple client route.

If you really want deep expert advice, we have a team here that works on these problems every day for very large deployments in critical architectures. You could always contact HiveMQ for something like that if you think it's warranted.

The related page is here: https://www.hivemq.com/company/services/

Either way, let me know. I'd be happy to help out further if needed.

mkb-mb commented 1 month ago

Just to be clear about "multiple clients" providing higher throughput. Do you mean a) multiple client applications (with one HiveMQClient instance) or b) multiple instances of HiveMQClient in a single application? I can easily construct multiple HiveMQClient instances in my applications. Dividing the applications would be challenging.

This has been extremely helpful. Thanks again.

pglombardo commented 1 month ago

Definitely b:

b) multiple instances of HiveMQClient in a single application?

The final implementation really depends on your application, but using an array of 3 instantiated clients and calling them round-robin style would be a good low-effort test.

You could also put an in-memory queue (or Redis) in front and have clients in an external process pull off the queue to send, but that is a bit much just to prove out the performance.
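
A low-effort version of that round-robin test might look something like this sketch (client count, broker address, and topic are placeholders; each client gets its own options and therefore its own client ID):

```csharp
using System.Linq;
using HiveMQtt.Client;
using HiveMQtt.MQTT5.Types;

// Sketch: spread publishes across a small pool of clients, round-robin.
var clients = new HiveMQClient[3];
for (var i = 0; i < clients.Length; i++)
{
    clients[i] = new HiveMQClient(new HiveMQClientOptionsBuilder()
        .WithBroker("localhost")   // placeholder broker
        .WithPort(1883)
        .Build());
    await clients[i].ConnectAsync();
}

var next = -1;
Task PublishRoundRobin(string topic, string payload)
{
    // Unsigned cast keeps the index non-negative even if the counter overflows.
    var index = (int)((uint)Interlocked.Increment(ref next) % (uint)clients.Length);
    return clients[index].PublishAsync(topic, payload, QualityOfService.AtLeastOnceDelivery);
}

// e.g. await Task.WhenAll(Enumerable.Range(0, 50_000)
//          .Select(i => PublishRoundRobin("Notification/HiveRawTest", i.ToString())));
```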

This has been extremely helpful. Thanks again.

Anytime/my pleasure - hoping that you find a good outcome.