rabbitmq / rabbitmq-federation

RabbitMQ Federation plugin
https://www.rabbitmq.com/

Link on federated exchange is missing #55

Closed cvuillemez closed 7 years ago

cvuillemez commented 7 years ago

Hi, we have an issue on production machines. The downstream exchange is federated to a local upstream broker with these settings:

Federation Upstream: upstream-pfs_engen-service_tram in virtual host pfs_engen
URI     amqp://user1:XXXXXX@localhost/service_tram
Trust User-ID : yes
Exchange    tram.publish.content

Policy is created for the downstream exchange:

Policy: pfs_engen-mirror-and-federate-to-service_tram in virtual host pfs_engen
Overview
Pattern     modservices.tram.publish.document
Apply to    exchanges
Definition  
ha-mode:    exactly
ha-params:  2
ha-sync-mode:   automatic
federation-upstream:    upstream-pfs_engen-service_tram
Priority    0

It is correctly applied to the downstream exchange "modservices.tram.publish.document":

Type    topic
durable:    true
Policy  pfs_engen-mirror-and-federate-to-service_tram

But for the past 2 days the federation link has been missing, so messages are stuck in the upstream exchange/queue:

# rabbitmqctl eval 'rabbit_federation_status:status().' |grep -w exchange
#

Can you please help? Is it a bug? Which tool or command can help with the investigation? Thanks!

michaelklishin commented 7 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. This assumes we have a certain amount of information to work with. Questions, investigations, root cause analysis, discussions for potential features are all considered to be mailing list material by our team. When/if we have enough details and evidence we'd be happy to file a new issue.

Please post this to rabbitmq-users. Thank you.

michaelklishin commented 7 years ago

Something similar was discussed on rabbitmq-users in the last few months.

michaelklishin commented 7 years ago

Logs can help with investigation. This belongs to rabbitmq-users.

The only thing that stands out from your definition is that you have ha-mode and ha-params in a policy that only applies to exchanges.
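
To illustrate that point: `ha-mode`, `ha-params` and `ha-sync-mode` configure queue mirroring, so in a policy that applies only to exchanges they have nothing to act on. The following is a minimal Python sketch, not part of RabbitMQ or the federation plugin, that splits a policy definition into the keys an exchange policy can use and the queue-only ones. The function name and the `ha-` prefix rule are assumptions made for illustration.

```python
# Hypothetical helper, not part of RabbitMQ: separates queue-mirroring keys
# from the rest of a policy definition.
QUEUE_ONLY_PREFIX = "ha-"  # ha-mode, ha-params, ha-sync-mode configure queue mirroring


def split_policy_definition(definition):
    """Return (exchange_keys, queue_only_keys) for a policy definition dict."""
    exchange_keys = {}
    queue_only_keys = {}
    for key, value in definition.items():
        if key.startswith(QUEUE_ONLY_PREFIX):
            queue_only_keys[key] = value
        else:
            exchange_keys[key] = value
    return exchange_keys, queue_only_keys


# The definition from the policy quoted in this issue.
definition = {
    "ha-mode": "exactly",
    "ha-params": 2,
    "ha-sync-mode": "automatic",
    "federation-upstream": "upstream-pfs_engen-service_tram",
}
exchange_keys, queue_only_keys = split_policy_definition(definition)
print(exchange_keys)
print(sorted(queue_only_keys))
```

Keeping the federation key in an exchange policy and the mirroring keys in a separate queue policy would make each policy's intent easier to read. Whether that has any bearing on the missing link is not established here.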

michaelklishin commented 7 years ago

Here's the longest of 2 or so threads about this in the last few months. There is no evidence either way as to whether what you are seeing is related.

cvuillemez commented 7 years ago

yes that's right:

http://rabbitmq.1065348.n5.nabble.com/Messages-not-getting-moved-to-the-Upstream-Federation-only-empty-queue-is-getting-created-td28726.html

cvuillemez commented 7 years ago

After deleting and re-creating the same policy, everything works fine.

michaelklishin commented 7 years ago

Well, yes, of course. Re-creating a policy re-creates the links. The question in the thread above is under what scenario the links do not recover on their own. There are a few cases where they voluntarily stop (should be covered on the list), and in 3.6.7 we've improved logging for those cases and added federation management API extensions that let one monitor and restart links in the "down" state.

What else can be improved is an open-ended question.
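
As a rough illustration of what monitoring links could look like, here is a minimal Python sketch that takes link status records (for example, decoded from the federation management plugin's link listing) and reports which expected exchange links are missing entirely and which exist but are not running. The field names `type`, `vhost`, `exchange` and `status` are assumed to mirror the output of `rabbit_federation_status:status()` and should be checked against a real response. The function name is hypothetical, and the sketch assumes one upstream per exchange.

```python
# Hypothetical monitoring helper; field names are assumptions (see lead-in).
def find_unhealthy_links(links, expected):
    """Compare reported federation links against the ones that should exist.

    links: list of dicts, one per reported federation link.
    expected: list of (vhost, exchange) pairs that should each have a link.
    Returns (missing, not_running), both lists of (vhost, exchange) pairs.
    """
    seen = {}
    for link in links:
        if link.get("type") != "exchange":
            continue  # skip federated queues
        seen[(link.get("vhost"), link.get("exchange"))] = link.get("status")
    missing = [pair for pair in expected if pair not in seen]
    not_running = [
        pair for pair in expected if pair in seen and seen[pair] != "running"
    ]
    return missing, not_running


expected = [("pfs_engen", "modservices.tram.publish.document")]

# The situation reported in this issue: the status listing is empty.
missing, not_running = find_unhealthy_links([], expected)
print(missing, not_running)
```

An empty status listing, as in the original report, shows up here as a missing link rather than a link in a "down" state. That distinction matters because a link that is absent cannot be found by filtering on status alone.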

cvuillemez commented 7 years ago

OK, so 3.6.7 seems interesting :) In my case the issue occurs on a network outage, the same scenario covered in the thread:

=INFO REPORT==== 14-Mar-2017::22:46:15 ===
node 'rabbit@node02-vl3464.prod.msgq.b0.p.fti.net' down: net_tick_timeout

=ERROR REPORT==== 14-Mar-2017::22:46:15 ===
Partial partition detected:
 * We saw DOWN from rabbit@node02-vl3464.prod.msgq.b0.p.fti.net
 * We can still see rabbit@node03-vl3464.prod.msgq.b0.p.fti.net which can see rabbit@node02-vl3464.prod.msgq.b0.p.fti.net
We will therefore intentionally disconnect from rabbit@node03-vl3464.prod.msgq.b0.p.fti.net

=ERROR REPORT==== 14-Mar-2017::22:46:15 ===
Mnesia('rabbit@node01-vl3464.prod.msgq.b0.p.fti.net'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node02-vl3464.prod.msgq.b0.p.fti.net'}
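
The log excerpt above contains three markers that point to a network partition: `net_tick_timeout`, `Partial partition detected` and `inconsistent_database`. A minimal Python sketch, assuming the `=LEVEL REPORT==== timestamp ===` header format shown in the excerpt, that scans a log for those markers so partition events can be lined up against the time a federation link disappeared. The function name and the marker list are assumptions for illustration, and the node names in the sample are shortened placeholders.

```python
import re

# Report header as it appears in the excerpt; the leading "=" is optional here
# so that copies of the log that lost it still match.
REPORT_HEADER = re.compile(
    r"^=?(INFO|WARNING|ERROR) REPORT==== (\S+) ===$", re.MULTILINE
)
# Strings taken from the excerpt; this list is an assumption, not exhaustive.
PARTITION_MARKERS = (
    "net_tick_timeout",
    "Partial partition detected",
    "inconsistent_database",
)


def partition_events(log_text):
    """Return (timestamp, level, marker) for each marker found in a report."""
    events = []
    headers = list(REPORT_HEADER.finditer(log_text))
    for i, header in enumerate(headers):
        # A report body runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        body = log_text[header.end():end]
        for marker in PARTITION_MARKERS:
            if marker in body:
                events.append((header.group(2), header.group(1), marker))
    return events


sample = """=INFO REPORT==== 14-Mar-2017::22:46:15 ===
node 'rabbit@node02' down: net_tick_timeout

=ERROR REPORT==== 14-Mar-2017::22:46:15 ===
Partial partition detected:
 * We saw DOWN from rabbit@node02

=ERROR REPORT==== 14-Mar-2017::22:46:15 ===
Mnesia('rabbit@node01'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node02'}
"""

events = partition_events(sample)
for event in events:
    print(event)
```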