Open leadenmoth opened 1 year ago
At-least-once dead lettering will respect reject-publish of the dead letter target quorum queue. However, you'll need to take care to not cause message pileup in the source quorum queue. (In your case the messages already pile up in the source quorum queue.)
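For reference, a sketch of what the at-least-once setup looks like as queue declaration arguments. The argument keys are RabbitMQ's standard x-arguments; pairing `x-dead-letter-strategy: at-least-once` with `x-overflow: reject-publish` on the source quorum queue follows the at-least-once docs. The queue/exchange names and values are made up for illustration:

```python
# Sketch: a source quorum queue that dead letters with the at-least-once
# strategy. At-least-once requires the *source* queue to be a quorum queue
# with overflow set to reject-publish. Names/values are hypothetical.
source_args = {
    "x-queue-type": "quorum",
    "x-dead-letter-exchange": "dlx",            # hypothetical exchange name
    "x-dead-letter-routing-key": "dlq",         # hypothetical routing key
    "x-dead-letter-strategy": "at-least-once",  # default is "at-most-once"
    "x-overflow": "reject-publish",             # prerequisite for at-least-once
    "x-delivery-limit": 3,
}

# Target dead letter queue, also quorum, with a length limit.
dlq_args = {
    "x-queue-type": "quorum",
    "x-max-length": 10,
    "x-overflow": "reject-publish",
}

print(source_args["x-dead-letter-strategy"])  # at-least-once
```

With a client such as pika, these dicts would be passed as the `arguments` parameter of `queue_declare`. Note the caveat above: with at-least-once, messages that the target rejects stay in the source queue, so they pile up there instead.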
In an effort to free up resources I was trying to limit the size of DL queues.
Instead of dead lettering to a quorum queue, could you dead letter to a Stream (possible since https://github.com/rabbitmq/rabbitmq-server/pull/7846)?
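A stream-based dead letter target could be declared with arguments along these lines. `x-queue-type`, `x-max-age` and `x-max-length-bytes` are real stream arguments; the specific retention values here are arbitrary examples:

```python
# Sketch: dead lettering into a stream instead of a quorum queue.
# Streams are size-capped via retention settings rather than reject-publish.
# The retention values below are illustrative, not recommendations.
stream_dlq_args = {
    "x-queue-type": "stream",
    "x-max-age": "7D",                  # discard dead letters older than 7 days
    "x-max-length-bytes": 500_000_000,  # and/or cap the total size in bytes
}
print(stream_dlq_args["x-queue-type"])  # stream
```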
I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish. In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message.
Re: streams, I won't be able to justify the extra development work for our almost-non-existent dead lettering use case, but I'll look into it when we begin overhauling this part of the system. Basically, I'm the only one fighting to keep any dead lettering in the current implementation :P
Dead lettering does not really follow most of the code paths related to publishing, so this can be one of the examples where it's a mere oversight.
I added your feature request to https://github.com/rabbitmq/rabbitmq-server/issues/8261#issuecomment-1580886519
> I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish.
You're right, we should document that specific limitation.
> In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message
Yes, good point 🙂 Although this still applies to dead letter target classic queues.
I'm just trying to help you with an immediate workaround. You could use `drop-head` instead of `reject-publish`, or dead letter to a stream and cap its size with `x-max-age` or `x-max-length-bytes`.

Thanks @ansd - I was going to go with `drop-head` and leave it at that for the current increment. For our use case it's a nice-to-have, but since it seemed like a bug or a gap in the docs, I decided to report it. Depending on how long it takes until we start redesigning the system, I might switch DL to classic and try again - those queues, the way we use them, don't need to be quorum, but it's also a sizeable piece of work to just recreate them as classic, due to how our code is layered and how per-environment change management happens, and the 3.11 upgrade reduced the pain points, so it's harder to justify further changes :D
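For the `drop-head` variant discussed above, the policy definition could look like this. The JSON shape is what `rabbitmqctl set_policy` accepts as its definition argument; the limit value is just an example:

```python
import json

# Sketch of the drop-head workaround as a policy definition.
# Omitting "overflow" entirely also works, since drop-head is the default.
policy = {
    "max-length": 10,
    "overflow": "drop-head",  # discard the *oldest* messages past the limit
}

# The policy name and queue pattern would be supplied on the command line,
# e.g. rabbitmqctl set_policy <name> <pattern> '<this JSON>'.
print(json.dumps(policy))
```

One trade-off noted later in this thread: `drop-head` discards from the head, i.e. the oldest messages, so it keeps the most recent dead letters rather than the earliest failures.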
Many thanks for reporting this issue @leadenmoth! To be honest, until you opened this issue, I wasn't aware that quorum queues behave like that.
Describe the bug
RabbitMQ 3.11.11, Erlang 25.3, deployed in AKS via cluster-operator 2.2.0
Setting a `max-length: <int>` and `overflow: reject-publish` policy on a quorum dead letter queue leads to the queue continuously filling beyond `max-length` (I stopped the test at 25000 messages with `max-length: 10`). According to the docs, my understanding is that pushing a dead message from the main queue to the dead letter exchange and then to the bound dead letter queue is done entirely on the server side?.. If so, my best guess is that whatever does that publishing (some internal publisher?..) ignores the reject-publish notification. Setting `drop-head` (well, unsetting `reject-publish` and letting it default to `drop-head`) instead works perfectly.

Reproduction steps
1. Create the normal and dead letter queues as type `quorum`, set queue-exchange bindings, set `x-dead-letter-exchange`, `x-dead-letter-routing-key` and `x-delivery-limit` on the normal queue
2. Set a `max-length` and `reject-publish` policy on the DL queue
3. Publish more messages than `max-length` to the normal queue
4. Reject deliveries until `x-delivery-limit` is reached and the messages start moving to the DL queue
5. Watch the DL queue fill up beyond `max-length`
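The expected semantics can be illustrated with a toy model of the two overflow modes on a length-limited queue (pure Python, not RabbitMQ internals; it shows what the reporter expected `reject-publish` to do for internally published dead letters, which this issue says does not happen):

```python
from collections import deque

def offer(queue, msg, max_length, overflow):
    """Toy model of the two overflow modes on a length-limited queue."""
    if len(queue) < max_length:
        queue.append(msg)
        return True
    if overflow == "drop-head":
        queue.popleft()    # discard the oldest message...
        queue.append(msg)  # ...and accept the new one
        return True
    # reject-publish: refuse the new message; the queue stays at max_length
    return False

dlq = deque()
accepted = [offer(dlq, i, max_length=10, overflow="reject-publish")
            for i in range(25000)]
print(len(dlq))              # 10 -- the queue never grows past max-length
print(accepted.count(True))  # 10 -- everything after that is rejected
```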
Expected behavior

Messages to-be-dead-lettered above the `max-length` limit should be discarded when `reject-publish` is set.

Additional context
Context: we have a legacy system prone to causing message pile-ups and going above the high RAM watermark for good (less so after 3.8->3.11 upgrade, due to all the improvements to quorum behavior). Most queues are quorum and have DL counterparts (also quorum) with a single dead letter exchange and routing-key -> binding routing. Message pileups eventually end up in corresponding dead letter queues, still occupying RAM (or HDD, since the 3.11 upgrade). In an effort to free up resources I was trying to limit the size of DL queues. Since there's no process for dealing with DL messages at all (again, legacy system, someone decided having DL would be good, but never got around to using it), the knee-jerk reaction would be to just kill the DL queues. I thought I'd instead keep them for human review with just a few messages, but, crucially, the earliest messages that failed. Sure, failure could be logged elsewhere (and it is), but I thought having all the RabbitMQ fields available for investigation would be nice. My worry was that reject-publish on a DL queue would cause some weird requeueing loop. Instead, it's just ignored.