Needs more documentation or information please!

Jinkxed commented 6 years ago

I'm setting up a new cluster on AWS and trying to use the aws auto discovery module in 3.7.2.

I've setup the rabbitmq.conf with the information from the documentation:

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

cluster_formation.aws.region = us-east-1
cluster_formation.aws.access_key_id = someid
cluster_formation.aws.secret_key = somekey

cluster_formation.aws.use_autoscaling_group = true

I can see the backend starting in the logs:

Application rabbitmq_peer_discovery_aws started on node

Past that everything is a black box. None of my nodes seem to be joining each other and I have no logs describing any kind of behavior.

Are there commands I can run to test the cluster configuration? Is there a specific cluster log command like there was for autocluster?

Just needs more clarity on "how" to ensure the plugin is doing what it's supposed to.

Thanks!

michaelklishin commented 6 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. This assumes two things:

GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team)
We have a certain amount of information to work with

We get at least a dozen of questions through various venues every single day, often quite light on details. At that rate GitHub issues can very quickly turn into a something impossible to navigate and make sense of even for our team. Because of that questions, investigations, root cause analysis, discussions of potential features are all considered to be mailing list material by our team. Please post this to rabbitmq-users.

Getting all the details necessary to reproduce an issue, make a conclusion or even form a hypothesis about what's happening can take a fair amount of time. Our team is multiple orders of magnitude smaller than the RabbitMQ community. Please help others help you by providing a way to reproduce the behavior you're observing, or at least sharing as much relevant information as possible on the list:

Server, client library and plugin (if applicable) versions used
Server logs
A code example or terminal transcript that can be used to reproduce
Full exception stack traces (not a single line message)
rabbitmqctl status (and, if possible, rabbitmqctl environment output)
Other relevant things about the environment and workload, e.g. a traffic capture

Feel free to edit out hostnames and other potentially sensitive information.

When/if we have enough details and evidence we'd be happy to file a new issue.

Thank you.

michaelklishin commented 6 years ago

The plugin is covered in the Cluster Formation and Peer Discovery guide. If something isn't clear there, please post your feedback to the mailing list.

This plugin logs a reasonable amount of information at the info log level and a lot more (including HTTP requests it performs to EC2 APIs) at the debug level.

michaelklishin commented 6 years ago

This plugin is not meaningfully different from the AWS part of rabbitmq-autocluster. Just like rabbitmq-autocluster, it is not a replacement for understanding of how RabbitMQ clusters are formed (e.g. how nodes authenticate to each other or how to inspect cluster state).

One important different from the original version of rabbitmq-autocluster is that this plugin will not reset the node on boot. This means that a node that previously started as a standalone one will not attempt to perform discovery, which is documented. If you see absolutely no log entries related to peer discovery at the info level, that's probably what's happening.

See Clustering and CLI tools guides for related topics.

michaelklishin commented 6 years ago

I added a rudimentary troubleshooting section to the cluster formation guide and some specific examples to the logging guide. The plan is to add some specific examples of what various log messages around peer discovery mean once I get a bit more time.

Jinkxed commented 6 years ago

Hey @michaelklishin I very much appreciate it.

The issues I'm having is I have logging set to debug and do not see any ec2 api requests or anything other than

2018-01-09 19:02:28.606 [debug] <0.578.0> Peer discovery: checking for partitioned nodes to clean up.
2018-01-09 19:02:28.606 [debug] <0.578.0> Peer discovery: all known cluster nodes are up.

And only those showed up once I turned on:

cluster_formation.node_cleanup.only_log_warning = false

Example of my rabbitmq.conf

# ======================================
# RabbbitMQ broker section
# ======================================

listeners.tcp.default = 5672

listeners.tcp.local   = 0.0.0.0:5672

vm_memory_high_watermark.relative = 0.8

vm_memory_high_watermark_paging_ratio = 0.75

disk_free_limit.relative = 2.0

mirroring_sync_batch_size = 4096

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_aws

# ignore the format of the variable - it's the actual key/secret in the file
cluster_formation.aws.access_key_id = <%= @aws_access_key -%>

cluster_formation.aws.secret_key = <%= @aws_secret_key -%>

cluster_formation.aws.region = us-east-1

cluster_formation.aws.use_autoscaling_group = true

cluster_formation.aws.use_private_ip = true

cluster_formation.node_cleanup.only_log_warning = false

cluster_keepalive_interval = 10000

proxy_protocol = true

management.listener.port = 15672
management.listener.ip   = 0.0.0.0

log.file = true
log.file.level = debug

# I've tried removing all of these below and still don't see the requests or anything AWS related
log.connection.level = debug
log.channel.level = info
log.queue.level = info

DNS lookups work between nodes. I can manually:

rabbitmqctl stop_app on node2
rabbitmqctl join_cluster rabbit@node1

And it works fine. But I can't seem to get the AWS automatic discovery to work.

Are there any specific tags that need to be added to the autoscaling group? Are there any specific environment variables that need to be set?

Thanks again!

michaelklishin commented 6 years ago

That means the node is not a new one and it doesn't perform peer discovery. It considers itself to be a member of a cluster, confirming my hypothesis above. From the docs:

If peer discovery isn't configured, or it fails, or no peers are reachable,
a node that wasn't a cluster member in the past will initialise from scratch
and proceed as a standalone node.

If a node previously was a cluster member, it will try to contact
its "last seen" peer for a period of time. In this case, no peer discovery will be performed.

This is not a support forum, please move this to the list.

rabbitmq / rabbitmq-peer-discovery-aws

Needs more documentation or information please! #12