onebeyond / rascal

A config driven wrapper for amqp.node supporting multi-host connections, automatic error recovery, redelivery flood protection, transparent encryption / decryption and channel pooling.
MIT License

FEATURE: Improve the republish strategy with immediateNack and dead-letter queue #237

Closed · apehead closed this 5 months ago

apehead commented 6 months ago

Rascal's republish strategy with immediate nack adds useful error details to messages sent to a dead letter queue. However, removing the immediateNack header before resending messages back to the original queue can be a challenge in some scenarios.

Detailed Description

We configured Rascal to use the republish strategy with immediate nack to ensure that error details are included in messages sent to the dead letter queue. In our specific scenario, we don't intend to republish the message, but we need the error details.

However, as mentioned in the documentation, before sending a message from a dead letter queue back to the original queue, it's necessary to remove the immediateNack header.
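For context, the recovery chain in question looks roughly like the following sketch, based on the Rascal documentation (the exact options and values are illustrative, not a copy of our configuration):

```javascript
// A sketch of a recovery chain using republish with immediateNack,
// passed to ackOrNack when a message handler fails. Values are
// illustrative.
const recovery = [
  // Republish the failed message with error details added to its
  // headers; with immediateNack set, the redelivered message is
  // nacked straight away, dead-lettering it with those details.
  { strategy: 'republish', immediateNack: true },
  // Fall back to a plain nack if republishing itself fails.
  { strategy: 'nack' },
];
```

Inside a subscription handler this would be used as `ackOrNack(err, recovery)`.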

In our use case, we need to handle messages in the failure queue manually and make decisions accordingly. Therefore, if we need to resend them to the original queue, removing this header via the RabbitMQ web UI is a challenge.

Is it feasible to add an option to remove the immediateNack header before nacking and routing the message to the dead letter queue? Perhaps there's a solution we've overlooked.

I want to emphasize that in this scenario, we intend to manually move the messages from the RabbitMQ web UI.

Context

By enhancing the republish strategy and/or the error details logic for the scenario described above, we could manage message flow and error handling within our system as expected, matching the implementations we already have in other languages.

Moreover, I believe this enhancement could benefit other users facing similar scenarios as it simplifies moving messages between queues.

Possible Implementation

I don't have a specific solution, but after looking at this piece of code in the SubscriberError, combined with the check for the immediateNack in the Subscription, perhaps it could be modified, based on a new configuration option (propagated through headers if necessary), to remove the immediateNack header from the message properties once the maximum attempts have been reached and before finally nacking it.

Other ideas floating in my mind include concepts not currently available in Rascal, like user middlewares/afterwares. These could let us dive into the raw message and tweak it as needed.

I'm open to collaborating on potential solutions.

cressie176 commented 6 months ago

Hi @apehead,

Thanks for the feature request. Unfortunately, when you nack a message you can't modify it, so it isn't possible to remove the immediateNack header. I'll leave the issue open and have a think about whether there's an alternative approach that might work.

cressie176 commented 6 months ago

One option might be for Rascal to optionally ignore the immediateNack header based on a configurable timeout. This would allow the message to be republished until the attempts expired, then immediately nacked. Provided it sits on the DLQ for longer than the timeout, the message could be moved back to the work queue without removing the immediateNack headers. It's not perfect, but it could still be useful.

cressie176 commented 6 months ago

A similar idea to explore is whether the x-death header could be used. E.g. immediateNack only works when the current number of x-death records is the same as when the message was republished with the immediateNack header.

apehead commented 6 months ago

@cressie176 thanks a ton for jumping in on this so quickly and offering up these ideas. Truly appreciated!

I've thought about the options you mentioned and they seem like good ideas. I'm unsure which one offers greater control over message handling with less complexity for Rascal, so I trust your judgment.

I'm all in if you need a hand with testing or contributions.

cressie176 commented 6 months ago

Hi @apehead,

I took a look at this over the weekend. I think the x-death header is the way to go...

The x-death header is updated automatically when a message is nacked. It holds an array of records similar to the following...

[
  {
    count: 3,
    exchange: '',
    queue: 'work_queue',
    reason: 'rejected',
    'routing-keys': [ 'work_queue' ],
    time: { '!': 'timestamp', value: 1715451949 }
  },
  {
    count: 1,
    exchange: '',
    queue: 'other_queue',
    reason: 'rejected',
    'routing-keys': [ 'other_queue' ],
    time: { '!': 'timestamp', value: 1715452131 }
  }  
]

The records in the array are keyed using the queue and reason fields, so repeated nacks for the same queue and reason will increment the count, rather than inserting a new record. When rascal rejects with an immediate nack, the reason is always "rejected".
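That keying can be sketched as a small lookup. A minimal, hypothetical helper (not part of Rascal's API) that finds the rejected count for a given queue might look like this:

```javascript
// Hypothetical helper: look up the 'rejected' count for a queue in an
// x-death header array of the shape shown above. Returns 0 when the
// header is missing or has no matching record.
function getRejectedCount(xDeath, queue) {
  const record = (xDeath || []).find(
    (entry) => entry.queue === queue && entry.reason === 'rejected'
  );
  return record ? record.count : 0;
}
```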

This feature can be implemented by:

  1. copying the relevant count into a new header rascal.recovery.${queue}.rejected.count (or defaulting it to zero) at the point where rascal sets the immediateNack header. See here.
  2. ignoring and deleting the immediateNack header when the message's relevant x-death count is greater than the recovery count stored in (1). See here.
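The two steps above can be sketched roughly as follows. The helper names are hypothetical and the logic is inferred from the description in this thread, not lifted from Rascal's actual implementation:

```javascript
// Find the 'rejected' count for a queue in an x-death header array of
// the shape shown earlier; 0 when missing.
function getRejectedCount(xDeath, queue) {
  const record = (xDeath || []).find(
    (entry) => entry.queue === queue && entry.reason === 'rejected'
  );
  return record ? record.count : 0;
}

// Step 1: when rascal sets the immediateNack header, also record the
// current rejected count for the work queue.
function recordRecoveryCount(headers, queue) {
  headers[`rascal.recovery.${queue}.rejected.count`] =
    getRejectedCount(headers['x-death'], queue);
}

// Step 2: honour immediateNack only while the x-death count has not
// grown past the recorded count. Once the message has been rejected
// again (e.g. moved back from the DLQ and re-consumed), the header
// should be ignored and deleted so normal handling resumes.
function shouldHonourImmediateNack(headers, queue) {
  const recorded = headers[`rascal.recovery.${queue}.rejected.count`] || 0;
  return getRejectedCount(headers['x-death'], queue) <= recorded;
}
```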

I think the implementation will be straightforward, but the tests will be slightly cumbersome. I would expect coverage for:

  1. No x-death header
  2. x-death header count less than the recovery count
  3. x-death header count equal to the recovery count
  4. x-death header count greater than the recovery count
  5. Ignores an irrelevant x-death header

Each added for both the callback and promise tests.

I can't see a downside to this feature, so I think it is worth changing the behaviour of the existing immediateNack option rather than adding a new one. However, the documentation will also need to be updated.

If you want to have a go, please do, otherwise I can probably get to it in the week.

cressie176 commented 5 months ago

Available in Rascal@v20.1.0

apehead commented 5 months ago

Thanks for the quick fix @cressie176! I really appreciate your dedication. Sorry I couldn't contribute this time, but if there's another chance in the future, I'd love to help.