rabbitmq / rabbitmq-peer-discovery-aws

AWS-based peer discovery backend for RabbitMQ 3.7.0+

Startup race condition? #14

Closed gdw2 closed 6 years ago

gdw2 commented 6 years ago

I've been deploying RMQ repeatedly and have noticed that the nodes don't always cluster. I suspect that it's simply due to the race condition described in the docs:

Race Conditions During Initial Cluster Formation

Consider a deployment where the entire cluster is provisioned at once and all nodes start in parallel. In this case there's a natural race condition between node registration and more than one node can become "first to register" (discovers no existing peers and thus starts as standalone).

Different peer discovery backends use different approaches to minimize the probability of such scenario. Some use locking (etcd, Consul), others use a technique known as randomized startup delay. With randomized startup delay nodes will delay their startup for a randomly picked value (between 5 and 60 seconds by default).

Some backends (config file, DNS) rely on a pre-configured set of peers and avoid the issue that way.

Effective delay interval, if used, is logged on node boot.

Interestingly, it doesn't mention what strategy the AWS plugin uses.

When I look at my logs, I see this:

2018-01-19 19:20:31.835 [info] <0.189.0> Peer discovery backend rabbit_peer_discovery_aws does not support registration, skipping randomized startup delay.

I don't know exactly what is meant by "registration", but I would think the lack of support for it would warrant a startup delay? I can of course inject my own startup delay, but am mostly writing this ticket to ask if the AWS plugin makes any attempts to mitigate startup race conditions?
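For illustration, a self-injected delay of the kind mentioned above could look roughly like the sketch below. It follows the idea in the quoted docs (a random wait between 5 and 60 seconds before boot) but is not how RabbitMQ or the AWS plugin implements it. The names `pick_startup_delay`, `delayed_start` and the `start_node` callback are hypothetical, and it assumes the node can be started from a wrapper script.

```python
import random
import time


def pick_startup_delay(min_seconds=5, max_seconds=60, rng=random):
    """Return a random delay in seconds within [min_seconds, max_seconds].

    The 5 and 60 second defaults mirror the range quoted from the docs;
    everything else here is an assumption made for illustration.
    """
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    return rng.uniform(min_seconds, max_seconds)


def delayed_start(start_node, min_seconds=5, max_seconds=60, sleep=time.sleep):
    """Wait for a random interval, then call the hypothetical start_node callback."""
    delay = pick_startup_delay(min_seconds, max_seconds)
    # Log the effective delay so it can be found later, as the docs suggest.
    print(f"delaying startup by {delay:.1f}s")
    sleep(delay)
    return start_node()
```

Because each node picks its own delay independently, nodes provisioned at the same moment are less likely to all look for peers at the same instant. This lowers the probability of the race but does not eliminate it.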

gdw2 commented 6 years ago

Looking through other issues, I see that questions like this are better discussed on the mailing list: https://github.com/rabbitmq/rabbitmq-peer-discovery-aws/issues/8#issuecomment-355834363

If so, feel free to close, and I apologize.

michaelklishin commented 6 years ago

The Race Conditions During Initial Cluster Formation section in the docs answers this exact question.

michaelklishin commented 6 years ago

I tried clarifying what peer discovery backend registration is. Let me know if it's good enough in your opinion.

gdw2 commented 6 years ago

Looks good, thx.