rabbitmq / rabbitmq-peer-discovery-aws

AWS-based peer discovery backend for RabbitMQ 3.7.0+
Other
24 stars 11 forks source link

return an error when an AWS API fails so that peer discovery will be retried #39

Closed stefanmoser closed 4 years ago

stefanmoser commented 4 years ago

Proposed Changes

Discussion thread: https://groups.google.com/g/rabbitmq-users/c/FN1hHZS7BPs

Currently, when an AWS API call fails, the plugin swallows the error and acts as if there are no peers to cluster with. This results in nodes erroneously creating multiple clusters. This behaviour is contrary to the current documentation that states that "with the AWS mechanism, EC2 API requests are retried".

Returning an error here means that rabbitmq-server will retry with the configured values and brings the behaviour inline with what the documentation says should happen.

Types of Changes

What types of changes does your code introduce to this project? Put an x in the boxes that apply

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

Further Comments

This seems like the smallest change I could make that would have the desired effect and I hope it stimulates a conversation.

pivotal-issuemaster commented 4 years ago

@stefanmoser Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

michaelklishin commented 4 years ago

This is on the right track, I'll have to check what value we should return to get more sensible logging.

stefanmoser commented 4 years ago

From my testing, there is still full logging within the AWS peer discovery plugin of what happened (why the API call failed), along with logging in core that retries will happen. Do other peer discovery plugins provide additional logging or error messages back to core?

michaelklishin commented 4 years ago

@stefanmoser the returned value is returned from rabbitmq_peer_discovery_aws:list_nodes/0, which is used by rabbit_mnesia:run_peer_discovery_with_retries/2, and that function expects errors to be reported as {error, term()}.

michaelklishin commented 4 years ago

@stefanmoser can you please try #40 out and see if it works as expected in your tests?