noxdafox / rabbitmq-message-deduplication

RabbitMQ Plugin for filtering message duplicates
Mozilla Public License 2.0

[Feature] Mode for de-duplication that replaces existing messages #57

Closed: e2-robert closed this issue 4 years ago

e2-robert commented 4 years ago

Is

A queue declared with the x-message-deduplication parameter enabled will filter message duplicates before they are published within. Each message containing the x-deduplication-header header will not be enqueued if another message with the same header is already present within the queue.

Wish

A queue declared with the x-message-deduplication parameter enabled will filter message duplicates before they are published within. Each message containing the x-deduplication-header header will replace any other message with the same header within the queue.

This means that, if an update of the same message is enqueued, it will:

  1. skip processing the previously scheduled message, as its value is outdated anyway;
  2. take over the spot of the previous message, so that the update is not pushed to the back of the queue and possibly never processed (if message updates are enqueued frequently and the queue is very long).

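To make the requested "replace" semantics concrete, here is a minimal in-memory sketch in Python. It is not how RabbitMQ or this plugin store messages: the class `ConflatingQueue` and its methods are hypothetical, and the `key` argument stands in for the `x-deduplication-header` value. An `OrderedDict` is used because assigning to an existing key overwrites the value without changing the key's position, which gives both properties from the list above: the outdated payload is never delivered, and the update inherits the original message's spot.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class ConflatingQueue:
    """Hypothetical in-memory queue with "last one wins" semantics.

    A message whose key is already queued replaces the queued payload
    in place, so the newer payload takes over the older message's spot.
    """

    def __init__(self) -> None:
        # The insertion order of keys is the delivery order.
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def publish(self, key: str, payload: bytes) -> None:
        # Assigning to an existing key keeps its position in the
        # OrderedDict but overwrites the outdated payload.
        self._items[key] = payload

    def consume(self) -> Optional[Tuple[str, bytes]]:
        # Deliver the key that first arrived earliest (FIFO by first arrival).
        if not self._items:
            return None
        return self._items.popitem(last=False)

    def __len__(self) -> int:
        return len(self._items)
```

For example, publishing `("price:42", b"9.99")`, then `("price:7", b"1.50")`, then `("price:42", b"10.49")` leaves two messages queued, and the first `consume()` returns `("price:42", b"10.49")`. The hash index from key to entry is exactly what makes the replacement cheap here, and it is what a sequentially accessed queue lacks.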
CharlieReitzel commented 4 years ago

+1 for this feature. It's a pretty common use case, e.g. product price updates and various "something was updated" notifications. Supported by Kafka OOTB, btw.

Sometimes the "last one wins" policy is called message conflation. It might make sense to use a header name like x-conflation-key to carry the unique message identifier used to find and replace any previous, still unprocessed messages on a queue.

For pub/sub, this gets complicated because delivery tracking is required per consumer. So a bigger ask, I think, in that context.

noxdafox commented 4 years ago

Hello,

my apologies for the late reply. I am afraid the feature you are asking for is unfeasible in this plugin.

RabbitMQ implements its queues as FIFO structures with sequential access. Therefore, each replacement of an existing message would have a linear cost, which would be unsustainable. To make matters even more complicated, RabbitMQ stores messages on disk by packing them together, which it can do based on the assumption that messages are accessed sequentially. In our case, however, we would need to unpack an entire batch of messages each time we needed to change a single message payload.
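The cost argument can be illustrated with a small Python sketch. It models only the access pattern, not RabbitMQ's actual queue or on-disk format: a `deque` of `(key, payload)` tuples stands in for a sequentially accessed FIFO, and `append` and `replace_in_place` are hypothetical helpers. Publishing touches only the tail, while replacing a message by key has no index to consult and must scan from the head, so each replacement inspects up to every queued message.

```python
from collections import deque
from typing import Deque, Tuple

# Hypothetical message shape: (deduplication key, payload).
Message = Tuple[str, bytes]


def append(queue: Deque[Message], key: str, payload: bytes) -> None:
    """Plain publish: O(1), touches only the tail of the queue."""
    queue.append((key, payload))


def replace_in_place(queue: Deque[Message], key: str, payload: bytes) -> int:
    """Replace the queued message with the same key, keeping its position.

    With sequential access only, there is no index from key to position,
    so the queue is scanned from the head: O(n) per publish. If no message
    matches, the new one is appended. Returns how many messages were inspected.
    """
    inspected = 0
    for position, (queued_key, _) in enumerate(queue):
        inspected += 1
        if queued_key == key:
            # Return right after the write, so iteration does not resume.
            queue[position] = (key, payload)
            return inspected
    queue.append((key, payload))
    return inspected
```

With 1,000 queued messages, replacing the one nearest the tail inspects all 1,000, and an update with no earlier match pays the full scan before being appended. On top of this, a real broker that packs messages into on-disk batches would also have to rewrite the batch holding the replaced message.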

Moreover, this plugin relies on specific callbacks to carry out its operation. When a new message is about to be published to the queue, we simply check the x-deduplication-header against a cache and notify the queue process if the message needs to be discarded as a duplicate. The de-duplication plugin has no access to the lower-level queue itself.
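The division of labour described above can be sketched as follows. This is a Python illustration, not the plugin's Erlang code: `DeduplicationCache`, `on_publish`, and `on_message_removed` are hypothetical names, and forgetting a key once its message leaves the queue is an assumption based on the README wording ("already present within the queue"). The point it shows is that the hook returns only a yes/no decision about the incoming message. Discarding the new message fits that shape, but replacing the queued one would require a handle on a message stored inside the queue, which the cache never has.

```python
from typing import Dict, Optional, Set


class DeduplicationCache:
    """Hypothetical stand-in for the plugin's cache of header values.

    The queue owns the messages; this object only sees header values
    at publish time and (assumed) when a message leaves the queue.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def on_publish(self, headers: Dict[str, str]) -> bool:
        """Return True if the queue should discard the message as a duplicate."""
        key: Optional[str] = headers.get("x-deduplication-header")
        if key is None:
            return False  # messages without the header are never filtered
        if key in self._seen:
            return True  # the only possible action: drop the *new* message
        self._seen.add(key)
        return False

    def on_message_removed(self, headers: Dict[str, str]) -> None:
        """Forget the key once its message is no longer in the queue (assumed)."""
        key = headers.get("x-deduplication-header")
        if key is not None:
            self._seen.discard(key)
```

Nothing in this interface can reach back into the queue to overwrite the earlier payload, which is why the requested mode cannot be added on top of the existing callbacks.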

Supporting your use case would require re-architecting RabbitMQ queues themselves. It's not something which can be tackled as a feature request within this plugin. What I would recommend is to reach out to the community and discuss whether your use case would receive some support. If so, I'd be glad to contribute.

Otherwise, I would suggest adopting other solutions such as Redis Queues or Kafka.

RabbitMQ internals documentation

CharlieReitzel commented 4 years ago

@noxdafox Hi Matteo,

Thanks much for your thoughtful response. It has helped me to better understand how RabbitMQ works under the hood! Also, I can set my expectations appropriately wrt this type of feature, either from you or anyone else, in the RabbitMQ context. Again, very helpful.

cheers, Charlie