Closed: luos closed this issue 2 years ago.
I only see two options:
Binding recovery will happen on node boot regardless of what we do. Beyond the bindings themselves, exchange state is a rare edge case that RabbitMQ does not really handle.
Ideas and PRs would be appreciated; our team doesn't have the capacity to redesign this plugin any time soon.
I think it would be fine not to apply the binding if we already have "weight" entries in the ring, though the fact that you can bind the same queue multiple times makes this harder.
Maybe, as a quick fix, just correcting the counter that goes out of sync with the buckets would be enough, so that at least every message ends up in some queue, even if the distribution is not consistent.
We can ignore the fact that you can bind a queue multiple times as that's clearly something the binding recovery process and such a stateful exchange type cannot handle (we don't know what other bindings are going to be recovered down the line).
For this exchange type, multiple bindings are likely the result of a mistake. We can assume that if the ring already has enough entries ("weight") for the recovered binding, we can do nothing.
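As a rough illustration of that idea (a minimal sketch with made-up names, not the plugin's actual code): skip the ring update when the destination queue already owns "weight" buckets.

entries_for(Queue, Bucket2Queue) ->
    %% count how many buckets in the ring already point at this queue
    maps:fold(fun(_Bucket, Q, Acc) when Q =:= Queue -> Acc + 1;
                 (_Bucket, _Q, Acc) -> Acc
              end, 0, Bucket2Queue).

add_buckets(Queue, Weight, Bucket2Queue) ->
    %% simplified: assumes buckets are contiguous 0..N-1, as in a healthy ring
    Next = maps:size(Bucket2Queue),
    maps:merge(Bucket2Queue,
               maps:from_list([{Next + I, Queue} || I <- lists:seq(0, Weight - 1)])).

maybe_add_binding(Queue, Weight, Bucket2Queue) ->
    case entries_for(Queue, Bucket2Queue) >= Weight of
        true  -> Bucket2Queue;                        %% recovered binding is already in the ring
        false -> add_buckets(Queue, Weight, Bucket2Queue)
    end.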
Very interested in a resolution here! I'm seeing the same problem, where node restarts cause the exchange to become unusable, though I'm not seeing the binding count increase.
Turns out I didn't really resolve this in https://github.com/rabbitmq/rabbitmq-server/pull/3594... When bindings are copied over as a new node comes up, they are no longer sorted, so remove_binding doesn't adjust the ring properly.
An example after node removal:
I have no name!@rabbitmq-6:/$ rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
{chx_hash_ring,{resource,<<"/">>,exchange,<<"order-group-action">>},
#{0 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-pzn4k">>},
1 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-kt7v5">>},
2 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-kt7v5">>},
3 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-pzn4k">>},
4 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-kt7v5">>},
5 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-pzn4k">>},
6 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-jkw6b">>}},
7},
Which means the ring breaks when one of those bindings is removed:
I have no name!@rabbitmq-6:/$ rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
{chx_hash_ring,{resource,<<"/">>,exchange,<<"order-group-action">>},
#{3 =>
{resource,<<"/">>,queue,
<<"order-group-action-panda-668d997c5-jkw6b">>}},
4},
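To illustrate why that broken state stops routing (this is not the plugin's routing code, just the failure mode implied by the record above): the trailing counter says there are 4 buckets, but only bucket 3 exists, so any key that hashes to 0, 1 or 2 finds no queue.

Ring = #{3 => <<"order-group-action-panda-668d997c5-jkw6b">>},
N = 4,                                         %% the trailing counter from the dump above
Bucket = erlang:phash2(<<"example-routing-key">>, N),
case maps:find(Bucket, Ring) of
    {ok, Queue} -> {route_to, Queue};
    error       -> {bucket_not_found, Bucket}  %% no queue for this bucket, the message cannot be routed
end.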
This can also happen by doing the following:
I imagine this can be fixed with some sorting in remove_binding, but as per "For this exchange type, multiple bindings are likely the result of a mistake", I wonder if it would be better to just reject new bindings with the same name? Any suggestions @michaelklishin?
I think that maybe rebinding should just ensure that there are as many buckets as the new binding specifies, and drop the old binding. What do you think?
I'd say rejecting a binding may be unexpected for some applications, though for probably 99% of users it does not matter (and it's broken right now anyway).
Hi all! I experienced the same issue last week ("Bucket not found" in the log, no routing out of the exchange) and have been looking around for a fix. #3594 was merged into 3.8.24 (we run a 3.8.3 cluster at the moment), so I may update to get a partial fix.
What about adding new nodes, which is the main way to scale a cluster? Is that unsafe for 'x-consistent-hash' exchanges?
Hi all! Are there any updates?
Regards,
Heyang
Hi,
From my side, I'd prefer the plugin to reject rebinding; that would simplify it, but it definitely needs some more thinking and implementation. :) I think a node restart followed by deleting a binding still breaks it. I have not tested it, but it's possibly OK when you do not delete bindings, keeping in mind that restarting will probably reshuffle the exchange anyway.
Hi,
I ran some tests and here are my findings:
I tested this on 3.9.13.
As far as I am aware, the consistent hash exchange is the only option in RabbitMQ for maintaining the order of dependent messages while scaling up the number of consumers. This plugin is also offered in Amazon MQ, but with this particular bug it ends up unusable. I would say it is quite a critical bug for anyone considering this plugin, since a node restart within a cluster is a standard operation. Is there any way this can be prioritized in the roadmap @michaelklishin ?
@FalconerTC is there any chance of this issue being fixed? Or are there any known workarounds aside from a full rebinding? We're stuck with this issue in our application as well.
@nunoguerreirorosa this is open source software, so if you want something "prioritized", dive in, provide a set of steps to reproduce it reasonably reliably, then feel free to look into the root cause. Pressing others to "prioritize" something in a piece of software you get for free is not how open source software evolves.
Consistent hashing exchange, unlike every other "built-in" (or tier 1) exchange type, is stateful. Booting nodes recover ring state while applications may already be modifying the same state by binding or unbinding. Short of delaying client connection listener startup until after all plugins are enabled and some time passes, there is no way to avoid this that would not result in refused client operations.
@luos ring state recovery does not involve creating any bindings. The plugin uses a list of durable bindings to recompute the (transient) state and then adapts it as bindings are added or removed.
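Conceptually, that recomputation can be pictured like this (a sketch only, not the plugin's code; the input here is a list of {QueueName, Weight} pairs, where the weight comes from the binding's routing key for this exchange type):

recover_ring(DurableBindings) ->
    lists:foldl(
      fun({QueueName, Weight}, Ring) ->
              %% simplified: append Weight contiguous buckets for each binding
              Next = maps:size(Ring),
              Entries = [{Next + I, QueueName} || I <- lists:seq(0, Weight - 1)],
              maps:merge(Ring, maps:from_list(Entries))
      end,
      #{},                    %% start from an empty bucket -> queue map
      DurableBindings).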
@stgolem we don't understand what the issue is or how to reproduce it. You are welcome to spend some time on that.
I now see some investigations above 👍
Making binding addition idempotent sounds like the right thing to do (removal, of course, already is).
This is a behavior change so we either have to try to squeeze it into 3.10 or anger the SemVer lovers and ship it in a patch release (even though it would count as a bug fix to many).
Other team members suggest that we cannot delay 3.10 over this, so it would have to go into a patch release at some point.
I do not think that's a problem. It is a bug, and right now the plugin is quite broken; I'd say it is not recommended for use in a clustered environment, so whenever it gets fixed, it will count as a bug fix.
Thanks for your time. We have exactly the same issue as others with this plugin in a clustered environment with node restarts. We will patiently wait for this issue to be resolved in an upcoming release. As a user, I classify this as a bug.
Greetings! I have a way to reproduce the "bucket not found" error.
RabbitMQ runs as a cluster of three nodes. One exchange and four queues have been created; queues 1 & 3 are located on node 2, queues 2 & 4 on node 1. Four bindings with routing key "1" have also been created.
Check the ring status:
$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test3'
Ring index: 3, queue: 'test4'
Send a message with routing key 101 to the exchange. The message is routed to queue 'test1'.
Next, stop node 1 with the rabbitmqctl stop_app command and check the ring status:
rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test3'
Next, start node 1 with the rabbitmqctl start_app command and check the ring status:
rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test3'
Ring index: 2, queue: 'test1'
Ring index: 3, queue: 'test3'
Ring index: 4, queue: 'test2'
Ring index: 5, queue: 'test4'
Next, stop node 2 and check the ring status:
$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test4'
Next, start node 2 and check the ring status:
$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test4'
Ring index: 3, queue: 'test2'
Ring index: 4, queue: 'test4'
Ring index: 5, queue: 'test1'
Ring index: 6, queue: 'test3'
Finally, send a message with routing key 101 to the exchange.
Result: the message is published but not routed. Logs: 2022-04-20 13:20:04.078869+00:00 [warn] <0.24750.1> Bucket 0 not found
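The underlying record can be inspected the same way as earlier in this thread; fewer populated buckets than the trailing counter confirms the broken state that produces the "Bucket not found" warning (this assumes the chx_hash_ring record shape shown above; 'test-hash' and the '/' vhost are the names used in this reproduction):

rabbitmqctl eval '
    [{chx_hash_ring, _X, Ring, Next}] =
        ets:lookup(rabbit_exchange_type_consistent_hash_ring_state,
                   rabbit_misc:r(<<"/">>, exchange, <<"test-hash">>)),
    {maps:size(Ring), Next}.'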
cc: @luos @michaelklishin
Is this planned to be fixed in both 3.9.X and 3.10.X? Are there any recommendations on a quicker way to resolve this after container recreation other than simply removing and recreating the exchange?
I have been looking at how we can make binding addition and deletion idempotent for this exchange type. I see only one simple way: keep track of which (destination, type, routing key) tuples have been added and, if they are already known, make the function do nothing.
This has one limitation. Code that repeatedly binds the same exchange and queue with different routing keys/weights is no longer going to work:
# in pseudocode
bind(chx, q1, weight: 1)
bind(chx, q1, weight: 3)
will lead to a single binding with "weight = 1". I find this acceptable but there may be some code that's going to break as a result, or at least the routing distribution will be affected. Quite frankly, I don't think the weights idea is worth the complexity anyway, and this may be our first step towards getting rid of it entirely for 3.11 or 4.0.
Trying to make hash ring updates idempotent is going to be too complex for me to agree to.
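As a rough sketch of the tracking idea above (names are made up, not the plugin's actual code), Erlang's ets:insert_new/2 gives a natural no-op for duplicates:

%% assumes a named ETS set created once, e.g.
%% ets:new(seen_bindings, [named_table, set, public])
maybe_add_binding(Source, Destination, RoutingKey, AddFun) ->
    Key = {Source, Destination, RoutingKey},
    case ets:insert_new(seen_bindings, {Key}) of
        true  -> AddFun();   %% first time this binding is seen: update the ring
        false -> ok          %% duplicate (e.g. re-added during recovery): do nothing
    end.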
If it is the case that the effect of weights is indeterminate in the face of node restarts, then adopting a first wins approach does not seem unreasonable to me.
The hash ring state record changes would require a schema database migration, which seems like overkill here. I will try to make the function idempotent using the bindings/routes tables instead, which are very stable.
My first attempt at relying on binding data to make hash ring state management idempotent failed: because RabbitMQ will create the bindings anyway, it would only create confusion.
A longer term solution would be to make this the first plugin that adopts Khepri, our Raft-based next generation schema data store.
In the short term I'll look into using the state of the ring itself and/or the good old distributed locking module in Erlang.
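If "the good old distributed locking module" means Erlang's global module, serialising ring updates per exchange could look roughly like this (a sketch, not a committed design):

with_ring_lock(ExchangeName, Fun) ->
    %% take a cluster-wide lock named after the exchange before touching ring state
    LockId = {{chx_ring_update, ExchangeName}, self()},
    global:trans(LockId, Fun).   %% Fun runs only while the lock is held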
Thanks @SteamUpdate for the excellent reproduction steps in https://github.com/rabbitmq/rabbitmq-server/issues/3386#issuecomment-1103929292.
https://github.com/rabbitmq/rabbitmq-server/pull/5121 makes adding bindings idempotent and will therefore fix this issue.
As @michaelklishin already mentioned this comes with a slight breaking change for applications that previously relied upon adding duplicate bindings (i.e. bindings with the same source exchange and same destination queue but possibly different routing key).
Hi,
When a single node in a cluster is restarted, the consistent hash exchange will add the existing bindings into the ring again.
Reproduction:
Output:
After this, cleaning up the bindings and recreating them breaks the exchange: