noxdafox / rabbitmq-message-deduplication

RabbitMQ Plugin for filtering message duplicates
Mozilla Public License 2.0
281 stars 34 forks source link

Attention to decrease performance #60

Open MasiumDev opened 4 years ago

MasiumDev commented 4 years ago

I enable this plugin and it decreases the performance of other queues. Do you test it on high publish about 20,000/sec?

hmaiga commented 3 years ago

Hi @MasiumDev, Do you have some feedbacks after few months of usage i believe. I would like to use it but i'm a bit worried about the performance of how messages are kept in cache.

VSilantyev commented 2 years ago

I think performance is O(n) from number of unique items in the queue.

Seems like there is linked list for holding cache entities: https://github.com/noxdafox/rabbitmq-message-deduplication/blob/b2227b72f7eb55d23b5ce7fb42ef0b9c7cef52dd/lib/rabbitmq_message_deduplication/cache.ex#L189

List.keyfind: https://hexdocs.pm/elixir/1.13/List.html

noxdafox commented 2 years ago

Mnesia read always returns a list entries as a result. Therefore, we need to find the entry within the list. As the list is either containing no entries (cache miss) or one entry (cache hit), the big-O cost is O(1)(Mnesia table lookup) + O(1)(one-size list lookup).

The reason this issue is still open is to give an example to the community on how not to write a ticket on an open source project. Commenting someone's work as "your code is slow, do you even test it at scale?" is not contributing in any way to improving things.

The OP is not providing any meaningful information such as:

A minimum knowledge in the domain would suggest that implementing a caching layer over a distributed service will indeed incur in computing costs. As multiple publishers might be pushing the same message at the same time onto different RabbitMQ nodes, we need to guarantee a replicated and transactional storage for the deduplication headers. This obviously does not come for free. Hence, a performance impact is to be expected.

The real question is whether the impact caused by this plugins is significantly higher than alternative solutions implemented in similar ways. If there would be a benchmark showing that, for example, a set of consumers relying on a Redis cluster for deduplication would show significantly higher consumption throughput then, and only then, there would be a valid concern to be raised.

VSilantyev commented 2 years ago

@noxdafox Thank you for such detailed reply. Indeed I was wrong about O(n), missed a part about Mnesia.read. My bad, I do not know erlang. It would be great to add your explanation to readme. Should I create a PR?

noxdafox commented 2 years ago

@VSilantyev don't worry I was not addressing you in the answer but rather highlighting the issue with the overall ticket.

I am not sure these architectural details are much of help for the reader of a README. In there, usually a reader expects to find:

The "how this thing works" and "when you should not use this thing" are usually more on the technical documentation/wiki side. I have few ideas on providing usage examples and more in depth wikis but my time is currently a bit limited.