rabbitmq / rabbitmq-peer-discovery-aws

AWS-based peer discovery backend for RabbitMQ 3.7.0+

node failed to join the cluster #11

Closed youssefNM closed 6 years ago

youssefNM commented 6 years ago

Hi guys,

I'm having issues getting RabbitMQ to work as a cluster. My cluster has 2 nodes, but it seems the second node failed to join the cluster. The part of the logs where I'm seeing the errors is here:

2018-01-06 21:59:42.293 [info] <0.189.0> Configured peer discovery backend: rabbit_peer_discovery_aws
2018-01-06 21:59:42.293 [info] <0.189.0> Will try to lock with peer discovery backend rabbit_peer_discovery_aws
2018-01-06 21:59:42.294 [info] <0.189.0> Peer discovery backend rabbit_peer_discovery_aws does not support registration, skipping randomized startup delay.
2018-01-06 21:59:42.294 [info] <0.33.0> Application rabbitmq_aws started on node 'rabbit@ip-10-141-111-159'
2018-01-06 21:59:42.540 [info] <0.189.0> All discovered existing cluster peers: rabbit@10.141.111.159, rabbit@10.141.110.213
2018-01-06 21:59:42.540 [info] <0.189.0> Peer nodes we can cluster with: rabbit@10.141.111.159, rabbit@10.141.110.213
2018-01-06 21:59:42.540 [error] <0.218.0> ** System NOT running to use fully qualified hostnames **
** Hostname 10.141.111.159 is illegal **
2018-01-06 21:59:42.540 [warning] <0.189.0> Could not auto-cluster with node rabbit@10.141.111.159: {badrpc,nodedown}
2018-01-06 21:59:42.541 [error] <0.219.0> ** System NOT running to use fully qualified hostnames **
** Hostname 10.141.110.213 is illegal **
2018-01-06 21:59:42.541 [warning] <0.189.0> Could not auto-cluster with node rabbit@10.141.110.213: {badrpc,nodedown}
2018-01-06 21:59:42.541 [warning] <0.189.0> Could not successfully contact any node of: rabbit@10.141.111.159,rabbit@10.141.110.213 (as in Erlang distribution). Starting as a blank standalone node...

I'm running RabbitMQ as a Docker container, and this is how I'm running the container:

docker run -d --name rabbitmq --hostname $HOSTNAME -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 25672:25672 -e RABBITMQ_ERLANG_COOKIE='${secret_cookie}' -e RABBITMQ_USE_LONGNAME=false -v /root/data:/var/lib/rabbitmq -v /root/conf/:/etc/rabbitmq -v /root/bin:/tmp/bin rabbitmq:3-management

And here is the content of my rabbitmq.conf:

    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws
    cluster_formation.aws.region = ${region}
    cluster_formation.aws.use_autoscaling_group = true

It seems to be an issue with the EC2 instance hostname, but I couldn't figure it out. I already tried launching the RabbitMQ container with -e RABBITMQ_USE_LONGNAME=true and also tried what was suggested in similar cases, but that didn't help.
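For readers following along, the "Hostname ... is illegal" lines in the log above are consistent with a node running with short names (RABBITMQ_USE_LONGNAME=false) being asked to contact peers whose host part contains dots. The sketch below is a simplified approximation of that rule, not Erlang's actual check: it assumes short names expect a host part without dots and long names expect one with a dot. The function names `host_part` and `fits_name_mode` are hypothetical.

```python
def host_part(node_name: str) -> str:
    """Return the part after '@' in a node name such as 'rabbit@host'."""
    name, sep, host = node_name.partition("@")
    if not sep or not name or not host:
        raise ValueError(f"not a node name: {node_name!r}")
    return host


def fits_name_mode(node_name: str, use_longname: bool) -> bool:
    """Simplified assumption: long names need a dot in the host part,
    short names must not contain one."""
    has_dot = "." in host_part(node_name)
    return has_dot if use_longname else not has_dot


# Peer names taken from the log output in this issue.
peers = [
    "rabbit@10.141.111.159",
    "rabbit@10.141.110.213",
    "rabbit@ip-10-141-111-159",
]

for peer in peers:
    # Evaluate each discovered peer against short-name mode.
    print(peer, fits_name_mode(peer, use_longname=False))
```

Under this simplification, the two IP-style peer names do not fit short-name mode, while the local node name from the log (rabbit@ip-10-141-111-159) does.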

Any help with this is really appreciated 🙏

michaelklishin commented 6 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. This assumes two things:

  1. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team)
  2. We have a certain amount of information to work with

We get at least a dozen questions through various venues every single day, often quite light on details. At that rate GitHub issues can very quickly turn into something impossible to navigate and make sense of even for our team. Because of that, questions, investigations, root cause analysis, and discussions of potential features are all considered to be mailing list material by our team. Please post this to rabbitmq-users.

Getting all the details necessary to reproduce an issue, make a conclusion or even form a hypothesis about what's happening can take a fair amount of time. Our team is multiple orders of magnitude smaller than the RabbitMQ community. Please help others help you by providing a way to reproduce the behavior you're observing, or at least sharing as much relevant information as possible on the list.

Feel free to edit out hostnames and other potentially sensitive information.

When/if we have enough details and evidence we'd be happy to file a new issue.

Thank you.

michaelklishin commented 6 years ago

2018-01-06 21:59:42.541 [warning] <0.189.0> Could not auto-cluster with node rabbit@10.141.110.213: {badrpc,nodedown}
2018-01-06 21:59:42.541 [warning] <0.189.0> Could not successfully contact any node of: rabbit@10.141.111.159,rabbit@10.141.110.213 (as in Erlang distribution). Starting as a blank standalone node

are the lines you are looking for. See the list of ports that must be open and how nodes authenticate to each other.
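As a rough way to act on the ports advice, the sketch below checks whether TCP connections can be opened to a peer on the two clustering ports published in the docker command above, 4369 (epmd) and 25672 (inter-node communication). It is only a reachability probe under the assumption that those default ports are in use; it says nothing about the Erlang cookie or hostname resolution. The helper names `port_open` and `check_peer` are hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peer(host: str, ports=(4369, 25672), timeout: float = 2.0) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `check_peer("10.141.110.213")` from the other node would return a dictionary keyed by port. A `False` entry points at a security group, firewall, or container port mapping to look at first.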