rabbitmq / rabbitmq-peer-discovery-aws

AWS-based peer discovery backend for RabbitMQ 3.7.0+

Failed to join cluster: EC2 API 503 error #30

dimmy-timmy closed this issue 5 years ago

dimmy-timmy commented 5 years ago

Hi, I set up a cluster in an EC2 Auto Scaling group. Sometimes, when I scale the cluster up, I get the following error:

2019-09-25 10:30:10.443 [info] <0.316.0> Peer discovery backend rabbit_peer_discovery_aws supports registration.
2019-09-25 10:30:10.443 [info] <0.316.0> Will wait for 17234 milliseconds before proceeding with registration...
2019-09-25 10:30:27.834 [error] <0.316.0> Error fetching node list via EC2 API, request path: /?Action=DescribeInstances&InstanceId.3=i-03a4d5738d9e0318f&InstanceId.4=i-02f5e1f6cd32cba6f&InstanceId.5=i-012c8a00ced2729e3&Version=2015-10-01, error: "Service Unavailable"
2019-09-25 10:30:27.834 [error] <0.316.0> Cannot discover any nodes: DescribeInstances API call failed.
2019-09-25 10:30:27.834 [info] <0.316.0> All discovered existing cluster peers:
2019-09-25 10:30:27.834 [info] <0.316.0> Discovered no peer nodes to cluster with
10:30:27.838 [info] Application mnesia exited: :stopped
2019-09-25 10:30:27.839 [info] <0.43.0> Application mnesia exited with reason: stopped
2019-09-25 10:30:27.982 [info] <0.316.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-25 10:30:28.022 [info] <0.316.0> Feature flag `drop_unroutable_metric`: supported, attempt to enable...

When this error occurs, the node starts up as a new, standalone node instead of joining the existing cluster peers.

Would it be possible to make the RabbitMQ process fail on such an error, or to add retries to the AWS API call? We are in the cloud, so retrying is a best practice, right?
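To make the two options in the question concrete, here is a minimal sketch in Python (not the plugin's Erlang code, and not how the plugin actually behaves) of retrying a transient failure such as the 503 above with capped exponential backoff, then failing fast if every attempt fails instead of starting as a standalone node. All names (`APIError`, `call_with_retries`, `discover_peers_or_abort`, the `describe_instances` callable standing in for a DescribeInstances request) and the set of retryable status codes are hypothetical assumptions for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class APIError(Exception):
    """Hypothetical error carrying the HTTP status of a failed API call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


# Assumption for this sketch: server-side errors are treated as transient.
RETRYABLE_STATUSES = {500, 502, 503, 504}


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable APIErrors with capped exponential backoff.

    Non-retryable errors propagate immediately; after the last attempt the
    final error is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except APIError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Delays grow 0.5s, 1s, 2s, ... but never exceed max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable: the loop always returns or raises")


def discover_peers_or_abort(
    describe_instances: Callable[[], List[str]],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fail fast: abort startup rather than continue with no discovered peers."""
    try:
        return call_with_retries(describe_instances, sleep=sleep)
    except APIError as err:
        raise SystemExit(f"peer discovery failed, refusing to start: {err}") from err
```

Injecting `sleep` keeps the backoff testable without real delays. The design choice the issue raises is visible in `discover_peers_or_abort`: a discovery failure stops the process, so an orchestrator such as the Auto Scaling group can replace the node, rather than letting it boot without any peers.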

RabbitMQ 3.8.0-rc.1, Erlang 22.0.7

dimmy-timmy commented 5 years ago

Link to the rabbitmq-users topic: https://groups.google.com/forum/#!topic/rabbitmq-users/7xcssx5pXqU

michaelklishin commented 5 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team).

We get at least a dozen questions through various venues every single day, often light on details. At that rate GitHub issues can very quickly turn into something impossible to navigate and make sense of, even for our team. Because GitHub is a tool our team uses heavily nearly every day, the signal-to-noise ratio of issues is something we care about a lot.

Please post this to rabbitmq-users.

Thank you.

michaelklishin commented 5 years ago
2019-09-25 10:30:27.834 [error] <0.316.0> Error fetching node list via EC2 API, request path: /?Action=DescribeInstances&InstanceId.3=i-03a4d5738d9e0318f&InstanceId.4=i-02f5e1f6cd32cba6f&InstanceId.5=i-012c8a00ced2729e3&Version=2015-10-01, error: "Service Unavailable"
2019-09-25 10:30:27.834 [error] <0.316.0> Cannot discover any nodes: DescribeInstances API call failed.

are the key lines.

michaelklishin commented 5 years ago

I misinterpreted what this issue is about. We have discussed a specific improvement on the mailing list. I will file a new issue for it.