rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/
Other
12.11k stars 3.9k forks source link

`duplicate_child_name` with non-unique strings in `federation-upstream-set`. #2547

Open jemc opened 9 years ago

jemc commented 9 years ago

I'm trying to create an environment for testing (an application using) the federation plugin in docker containers. Specifically, I am working with exchange federation in a setup that will let me publish some jobs on a local direct exchange (with bindings to the local jobs queues), a 'foreign' exchange that federates messages to all other brokers' local exchanges, and a global exchange that federates messages to all local exchanges.

When using federation-upstream-set: all, everything works as desired and I get proper working federation links between exchanges. However, I want to avoid using the all set in case I want to add other distinct federation-upstream-sets later. When I use the federation-upstream-set: federated-jobs (my specific upstream set that I configured), I get an erlang process crash report on each broker that looks like the following (and none of the federation links start properly, so I have no working federation):

rabbitmq2 | =CRASH REPORT==== 25-Jun-2015::16:58:23 ===
rabbitmq2 |   crasher:
rabbitmq2 |     initial call: supervisor2:init/1
rabbitmq2 |     pid: <0.528.0>
rabbitmq2 |     registered_name: []
rabbitmq2 |     exception exit: {start_spec,
rabbitmq2 |                         {duplicate_child_name,
rabbitmq2 |                             {upstream,
rabbitmq2 |                                 [<<"amqp://jobsuser:jobspass@172.17.0.106/jobs">>],
rabbitmq2 |                                 <<"global">>,<<"local">>,1000,1,1,none,none,
rabbitmq2 |                                 false,'on-confirm',none,
rabbitmq2 |                                 <<"rabbitmq0-global-jobs">>}}}
rabbitmq2 |       in function  gen_server:init_it/6 (gen_server.erl, line 322)
rabbitmq2 |     ancestors: [<0.278.0>,rabbit_federation_exchange_link_sup_sup,
rabbitmq2 |                   rabbit_federation_sup,rabbit_sup,<0.84.0>]
rabbitmq2 |     messages: []
rabbitmq2 |     links: [<0.278.0>]
rabbitmq2 |     dictionary: []
rabbitmq2 |     trap_exit: true
rabbitmq2 |     status: running
rabbitmq2 |     heap_size: 987
rabbitmq2 |     stack_size: 27
rabbitmq2 |     reductions: 1214
rabbitmq2 |   neighbours:

Please see this gist for my two repro scripts and the full interleaved log output from the three brokers: https://gist.github.com/jemc/6005b6d6fcfee9815240

To reproduce, run the two scripts one after the other in separate terminals (one creates and runs the broker docker containers, the other connects to and configures the running brokers). The third file is the interleaved log output as seen from the terminal running the three brokers.

If there's anything else I can do or information I can provide to help troubleshoot this issue, please let me know.

michaelklishin commented 9 years ago

Thank you. This means there's a clash of identifiers used by exchange federation internally. I'll take a closer look at what exactly is used to calculate the identifier (and if we need to force them being unique).

jemc commented 9 years ago

Okay, it looks like this is mostly a bug on our end. The script had a copy-paste-typo causing duplicate strings to be set in the federation-upstream-set set_parameter call. The federation plugin assumes that the strings are unique because with proper usage they should be, but does not verify this, causing supervised children with the same "name" tuple to be spawned, and all of them fail.

Perhaps rabbit_federation_upstream:from_set_contents could be amended to filter out non-unique names as it collects? Or perhaps it could be made to fail faster with a clearer error?

Either way, sorry to waste your time with a bug that was mostly on our end. Feel free to close this ticket if you don't think the code base should be changed to continue gracefully or fail faster.

michaelklishin commented 9 years ago

@jemc failing early sounds like the right thing to do.