reject-publish ignored by Dead-Letter quorum queues

leadenmoth commented 1 year ago

Describe the bug

RabbitMQ 3.11.11, Erlang 25.3, deployed in AKS via cluster-operator 2.2.0

Setting max-length: <int> and overflow: reject-publish policy on a quorum dead letter queue leads to queue continuously filling beyond max-length (I stopped the test at 25000 messages with max-length:10). According to the docs,

When a quorum queue reaches the max-length limit and reject-publish is configured it notifies each publishing channel who from thereon will reject all messages back to the client.

My understanding is that pushing a dead message from main queue to dead letter exchange and then to bound dead letter queue is done entirely on the server side?.. If so, my best guess is that whatever does that publishing (some internal publisher?..) ignores the reject-publish notification. Setting drop-head (well, unsetting reject-publish and letting it default to drop-head) instead works perfectly.

Reproduction steps

Set up normal and DL exchanges, set up normal and DL queues of type quorum, set queue-exchange bindings, set x-dead-letter-exchange, x-dead-letter-routing-key and x-delivery-limit on normal queue
Set up a max-length and reject-publish policy on the DL queue
Publish significantly more messages than max-length to normal queue
Consume and reject messages from normal queue continuously until x-delivery-limit is reached and the messages start moving to DL queue
Observe that all messages are moved to DL queue, regardless of max-length ...

Expected behavior

Messages to-be-dead-lettered above the max-length limit should be discarded when reject-publish is set

Additional context

Context: we have a legacy system prone to causing message pile-ups and going above the high RAM watermark for good (less so after 3.8->3.11 upgrade, due to all the improvements to quorum behavior). Most queues are quorum and have DL counterparts (also quorum) with a single dead letter exchange and routing-key -> binding routing. Message pileups eventually end up in corresponding dead letter queues, still occupying RAM (or HDD, since the 3.11 upgrade). In an effort to free up resources I was trying to limit the size of DL queues. Since there's no process for dealing with DL messages at all (again, legacy system, someone decided having DL would be good, but never got around to using it), the knee-jerk reaction would be to just kill the DL queues. I thought I'd instead keep them for human review with just a few messages, but, crucially, the earliest messages that failed. Sure, failure could be logged elsewhere (and it is), but I thought having all the RabbitMQ fields available for investigation would be nice. My worry was that reject-publish on a DL queue would cause some weird requeueing loop. Instead, it's just ignored.

ansd commented 1 year ago

At-least-once dead lettering will respect reject-publish of the dead letter target quorum queue. However, you'll need to take care to not cause message pileup in the source quorum queue. (In your case the messages already pile up in the source quorum queue.)

In an effort to free up resources I was trying to limit the size of DL queues.

Instead of dead lettering to a quorum queue, could you dead letter to a Stream (possible since https://github.com/rabbitmq/rabbitmq-server/pull/7846)?

leadenmoth commented 1 year ago

I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish. In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message

Re: streams, I won't be able to justify the extra development work for our almost-non-existent dead lettering use case, but I'll look into it when we begin overhauling this part of the system. Basically, I'm the only one fighting to keep any dead lettering in the current implementation :P

michaelklishin commented 1 year ago

Dead lettering does not really follow most of the code paths related to publishing, so this can be one of the examples where it's a mere oversight.

ansd commented 1 year ago

I added your feature request to https://github.com/rabbitmq/rabbitmq-server/issues/8261#issuecomment-1580886519

I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish.

You're right, we should document that specific limitation.

In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message

Yes, good point 🙂 Although this still applies to dead letter target classic queues.

I'm just trying to help you with an immediate workaround. You could

Switch your target dead letter quorum queue overflow behaviour to drop-head, or
Use a classic queue as target dead letter queue with overflow behaviour either drop-head or reject-publish, or
Use a Stream as target dead letter queue (configuring x-max-age or x-max-length-bytes).

leadenmoth commented 1 year ago

Thanks @ansd - I was going to go with drop-head and leave it at that for the current increment. For our use case it's a nice-to-have, but since it seemed like a bug or a gap in docs, I decided to report it. Depending on how long it takes until we start redesigning the system, I might switch DL to classic and try again - those queues, the way we use them, don't need to be quorum, but it's also a sizeable piece of work to just recreate them as classic due to how our code is layered and per-environment change management happens, and 3.11 upgrade reduced the pain points so that it's harder to justify further changes :D

ansd commented 1 year ago

Many thanks for reporting this issue @leadenmoth! To be honest, until you opened this issue, I wasn't aware that quorum queues behave like that.

rabbitmq / rabbitmq-server