pardahlman / RawRabbit

A modern .NET framework for communication over RabbitMq
MIT License

Messages being dropped on retry #337

Open Sharparam opened 6 years ago

Sharparam commented 6 years ago

It seems when using the retry functionality with retry delays over one minute, there is an extremely high loss rate of messages (and from what I've seen in my limited testing, 100% loss rate with delays over 1:30).

The retry queues are created with the correct exchange in the "dead letter exchange" property, but when the TTL expires, the messages seem to just vanish into thin air. If the consumer is stopped before the TTL expires, the messages arrive at the exchange and end up in the queues as expected, and are consumed the next time the consumer starts up (with the retry information intact).
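For readers unfamiliar with the mechanism described above, here is a minimal sketch of the queue arguments such a retry queue generally carries in RabbitMQ. `x-dead-letter-exchange` and `x-message-ttl` are standard RabbitMQ queue arguments; the function name and the exchange name are hypothetical, and this is not necessarily how RawRabbit declares its retry queues.

```python
def retry_queue_arguments(dead_letter_exchange, delay_seconds):
    """Build arguments for a delay queue (generic pattern, not RawRabbit's code).

    Messages sit in the queue until the TTL expires; the broker then
    dead-letters them to the given exchange.
    """
    return {
        # Exchange that expired messages are republished to.
        "x-dead-letter-exchange": dead_letter_exchange,
        # RabbitMQ expects the TTL in milliseconds.
        "x-message-ttl": int(delay_seconds * 1000),
    }


# Hypothetical exchange name, with a 90 second retry delay.
arguments = retry_queue_arguments("my_exchange", 90)
```

The resulting dictionary would be passed as the `arguments` of a queue declaration in whichever client library is in use.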

Here is an example program publishing a bunch of messages with increasing retry delays from about a minute to 200 seconds (using Serilog for logging): https://gist.github.com/Sharparam/2625f2f5dce60dfff96494bbaaf83627#file-program-cs

Included is also the log output when running it (it is very wordy, since it logs everything from RawRabbit, but if you search for "ACK <guid> [<timespan>]" you can see that only very few messages have been properly retried).

Am I doing something wrong, or is there an issue somewhere?

This is using RawRabbit 2.0.0-rc5 and RabbitMQ 3.7.2.

pardahlman commented 6 years ago

Hello, @Sharparam - thanks for reaching out.

Do I understand you correctly that the message on the dead letter exchange is republished to the "original" exchange (as expected) when there is no consumer on the "original" queue? If that's the case, then I believe the same thing happens when there is a consumer on the queue, but the message is delivered to the consumer before you have a chance to register it, and it is somehow not passed on to the user-defined consume method. You should be able to verify in the management API user interface that the message is delivered.

Off the top of my head, nothing about the length of the delay should affect how the message is handled.

This issue needs some in-depth troubleshooting. The first step would probably be to extend the existing suite of RetryLater tests with a test that uses a longer delay and see whether it fails.

I'd be happy to assist you if you want to give it a shot!

Sharparam commented 6 years ago

I believe the message is re-published correctly in both cases, but RawRabbit seems to silently discard it. It doesn't even log anything about consuming the message. How would I go about verifying it in the management interface? While the consumer is running, the TTL expires and in the interface it just looks as if the message disappears with the retry queue.

When the TTL expires without RawRabbit subscribing to the queue, the message arrives in the queue (as expected) and stays there waiting for a consumer to consume it.
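One possible way to check this outside the UI is to poll the RabbitMQ management HTTP API for the original queue and watch its counters while the TTL expires. The sketch below assumes the management plugin is enabled on its default port 15672 with the default `guest` credentials; the host and queue names are placeholders, and the helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(host, vhost, queue, port=15672):
    """Build the management API URL for one queue.

    The vhost and queue name are percent-encoded, so the default
    vhost "/" is sent as %2F.
    """
    return "http://{}:{}/api/queues/{}/{}".format(
        host,
        port,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""),
    )


def summarize(queue_info):
    """Pick out the counters relevant to this issue from the API response."""
    stats = queue_info.get("message_stats", {})
    return {
        "messages": queue_info.get("messages", 0),    # currently in the queue
        "consumers": queue_info.get("consumers", 0),  # attached consumers
        "deliver": stats.get("deliver", 0),           # deliveries needing an ack
        "ack": stats.get("ack", 0),                   # acks received
    }


def fetch_queue(host, vhost, queue, user="guest", password="guest"):
    """Query the management API (requires a reachable broker)."""
    request = urllib.request.Request(queue_url(host, vhost, queue))
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return summarize(json.load(response))
```

If `deliver` increases around the time the TTL expires while the handler logs nothing, that would suggest the broker did hand the message to the consumer and it was lost on the client side. If the counters stay flat, the message probably never reached the queue.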

The test suite could be extended with some tests that use longer retry delays, but it would make the tests take a really long time (several minutes) to run. Maybe that isn't a problem?