rabbitmq / rabbitmq-peer-discovery-k8s

Kubernetes-based peer discovery mechanism for RabbitMQ

Peers discovered but filtered out as non-eligible: k8s endpoint listing returned nodes not yet ready #52

Closed TwitchChen closed 5 years ago

TwitchChen commented 5 years ago

I'm trying to build a two-node RabbitMQ cluster using rabbitmq-peer-discovery-k8s, but both RabbitMQ nodes end up running standalone.

rabbitmq-0's log

2019-09-29 09:47:22.685 [info] <0.8.0> Feature flags: list of feature flags found:
2019-09-29 09:47:22.686 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2019-09-29 09:47:22.742 [info] <0.234.0> 
 Starting RabbitMQ 3.7.18 on Erlang 22.1
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.18. Copyright (C) 2007-2019 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2019-09-29 09:47:22.743 [info] <0.234.0> 
 node           : rabbit@rabbitmq-0
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0
2019-09-29 09:47:22.764 [info] <0.234.0> Running boot step pre_boot defined by app rabbit
2019-09-29 09:47:22.764 [info] <0.234.0> Running boot step rabbit_core_metrics defined by app rabbit
2019-09-29 09:47:22.764 [info] <0.234.0> Running boot step rabbit_alarm defined by app rabbit
2019-09-29 09:47:22.776 [info] <0.240.0> Memory high watermark set to 1907 MiB (2000000000 bytes) of 3790 MiB (3974164480 bytes) total
2019-09-29 09:47:22.804 [info] <0.242.0> Enabling free disk space monitoring
2019-09-29 09:47:22.804 [info] <0.242.0> Disk free limit set to 4000MB
2019-09-29 09:47:22.809 [info] <0.234.0> Running boot step code_server_cache defined by app rabbit
2019-09-29 09:47:22.809 [info] <0.234.0> Running boot step file_handle_cache defined by app rabbit
2019-09-29 09:47:22.809 [info] <0.245.0> Limiting to approx 65436 file handles (58890 sockets)
2019-09-29 09:47:22.810 [info] <0.246.0> FHC read buffering:  OFF
2019-09-29 09:47:22.810 [info] <0.246.0> FHC write buffering: ON
2019-09-29 09:47:22.812 [info] <0.234.0> Running boot step worker_pool defined by app rabbit
2019-09-29 09:47:22.812 [info] <0.235.0> Will use 2 processes for default worker pool
2019-09-29 09:47:22.812 [info] <0.235.0> Starting worker pool 'worker_pool' with 2 processes in it
2019-09-29 09:47:22.812 [info] <0.234.0> Running boot step database defined by app rabbit
2019-09-29 09:47:22.813 [info] <0.234.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2019-09-29 09:47:22.813 [info] <0.234.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2019-09-29 09:47:22.813 [info] <0.234.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-09-29 09:47:22.813 [info] <0.234.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-09-29 09:47:22.813 [info] <0.234.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-09-29 09:47:22.846 [info] <0.234.0> k8s endpoint listing returned nodes not yet ready: rabbitmq-0
2019-09-29 09:47:22.846 [info] <0.234.0> All discovered existing cluster peers: 
2019-09-29 09:47:22.846 [info] <0.234.0> Discovered no peer nodes to cluster with
2019-09-29 09:47:22.850 [info] <0.43.0> Application mnesia exited with reason: stopped
2019-09-29 09:47:23.063 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 09:47:23.104 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 09:47:23.154 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 09:47:23.154 [info] <0.234.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping registration.
2019-09-29 09:47:23.154 [info] <0.234.0> Running boot step database_sync defined by app rabbit
2019-09-29 09:47:23.155 [info] <0.234.0> Running boot step feature_flags defined by app rabbit
2019-09-29 09:47:23.155 [info] <0.234.0> Running boot step codec_correctness_check defined by app rabbit
2019-09-29 09:47:23.155 [info] <0.234.0> Running boot step external_infrastructure defined by app rabbit
2019-09-29 09:47:23.155 [info] <0.234.0> Running boot step rabbit_registry defined by app rabbit
2019-09-29 09:47:23.156 [info] <0.234.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2019-09-29 09:47:23.156 [info] <0.234.0> Running boot step rabbit_queue_location_random defined by app rabbit
2019-09-29 09:47:23.156 [info] <0.234.0> Running boot step rabbit_event defined by app rabbit
2019-09-29 09:47:23.156 [info] <0.234.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2019-09-29 09:47:23.156 [info] <0.234.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2019-09-29 09:47:23.157 [info] <0.234.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2019-09-29 09:47:23.157 [info] <0.234.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2019-09-29 09:47:23.157 [info] <0.234.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2019-09-29 09:47:23.157 [info] <0.234.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2019-09-29 09:47:23.158 [info] <0.234.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2019-09-29 09:47:23.158 [info] <0.234.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2019-09-29 09:47:23.158 [info] <0.234.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2019-09-29 09:47:23.158 [info] <0.234.0> Running boot step rabbit_priority_queue defined by app rabbit
2019-09-29 09:47:23.158 [info] <0.234.0> Priority queues enabled, real BQ is rabbit_variable_queue
2019-09-29 09:47:23.158 [info] <0.234.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2019-09-29 09:47:23.158 [info] <0.234.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2019-09-29 09:47:23.159 [info] <0.234.0> Running boot step kernel_ready defined by app rabbit
2019-09-29 09:47:23.159 [info] <0.234.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2019-09-29 09:47:23.159 [info] <0.234.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2019-09-29 09:47:23.161 [info] <0.234.0> Running boot step guid_generator defined by app rabbit
2019-09-29 09:47:23.166 [info] <0.234.0> Running boot step rabbit_node_monitor defined by app rabbit
2019-09-29 09:47:23.167 [info] <0.421.0> Starting rabbit_node_monitor
2019-09-29 09:47:23.167 [info] <0.234.0> Running boot step delegate_sup defined by app rabbit
2019-09-29 09:47:23.168 [info] <0.234.0> Running boot step rabbit_memory_monitor defined by app rabbit
2019-09-29 09:47:23.168 [info] <0.234.0> Running boot step core_initialized defined by app rabbit
2019-09-29 09:47:23.168 [info] <0.234.0> Running boot step upgrade_queues defined by app rabbit
2019-09-29 09:47:23.205 [info] <0.234.0> message_store upgrades: 1 to apply
2019-09-29 09:47:23.205 [info] <0.234.0> message_store upgrades: Applying rabbit_variable_queue:move_messages_to_vhost_store
2019-09-29 09:47:23.205 [info] <0.234.0> message_store upgrades: No durable queues found. Skipping message store migration
2019-09-29 09:47:23.205 [info] <0.234.0> message_store upgrades: Removing the old message store data
2019-09-29 09:47:23.206 [info] <0.234.0> message_store upgrades: All upgrades applied successfully
2019-09-29 09:47:23.245 [info] <0.234.0> Running boot step rabbit_connection_tracking defined by app rabbit
2019-09-29 09:47:23.245 [info] <0.234.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2019-09-29 09:47:23.245 [info] <0.234.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2019-09-29 09:47:23.245 [info] <0.234.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2019-09-29 09:47:23.246 [info] <0.234.0> Running boot step rabbit_policies defined by app rabbit
2019-09-29 09:47:23.247 [info] <0.234.0> Running boot step rabbit_policy defined by app rabbit
2019-09-29 09:47:23.247 [info] <0.234.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2019-09-29 09:47:23.247 [info] <0.234.0> Running boot step rabbit_vhost_limit defined by app rabbit
2019-09-29 09:47:23.247 [info] <0.234.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2019-09-29 09:47:23.247 [info] <0.234.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2019-09-29 09:47:23.247 [info] <0.234.0> Management plugin: using rates mode 'basic'
2019-09-29 09:47:23.248 [info] <0.234.0> Running boot step recovery defined by app rabbit
2019-09-29 09:47:23.249 [info] <0.234.0> Running boot step load_definitions defined by app rabbitmq_management
2019-09-29 09:47:23.249 [info] <0.234.0> Running boot step empty_db_check defined by app rabbit
2019-09-29 09:47:23.249 [info] <0.234.0> Adding vhost '/'
2019-09-29 09:47:23.294 [info] <0.462.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2019-09-29 09:47:23.301 [info] <0.462.0> Starting message stores for vhost '/'
2019-09-29 09:47:23.302 [info] <0.466.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2019-09-29 09:47:23.304 [info] <0.462.0> Started message store of type transient for vhost '/'
2019-09-29 09:47:23.304 [info] <0.469.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2019-09-29 09:47:23.305 [warning] <0.469.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2019-09-29 09:47:23.306 [info] <0.462.0> Started message store of type persistent for vhost '/'
2019-09-29 09:47:23.308 [info] <0.234.0> Creating user 'guest'
2019-09-29 09:47:23.313 [info] <0.234.0> Setting user tags for user 'guest' to [administrator]
2019-09-29 09:47:23.317 [info] <0.234.0> Setting permissions for 'guest' in '/' to '.*', '.*', '.*'
2019-09-29 09:47:23.322 [info] <0.234.0> Running boot step rabbit_looking_glass defined by app rabbit
2019-09-29 09:47:23.322 [info] <0.234.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2019-09-29 09:47:23.322 [info] <0.234.0> Running boot step background_gc defined by app rabbit
2019-09-29 09:47:23.323 [info] <0.234.0> Running boot step connection_tracking defined by app rabbit
2019-09-29 09:47:23.331 [info] <0.234.0> Setting up a table for connection tracking on this node: 'tracked_connection_on_node_rabbit@rabbitmq-0'
2019-09-29 09:47:23.338 [info] <0.234.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_per_vhost_on_node_rabbit@rabbitmq-0'
2019-09-29 09:47:23.338 [info] <0.234.0> Running boot step routing_ready defined by app rabbit
2019-09-29 09:47:23.338 [info] <0.234.0> Running boot step pre_flight defined by app rabbit
2019-09-29 09:47:23.338 [info] <0.234.0> Running boot step notify_cluster defined by app rabbit
2019-09-29 09:47:23.338 [info] <0.234.0> Running boot step networking defined by app rabbit
2019-09-29 09:47:23.341 [info] <0.515.0> started TCP listener on [::]:5672
2019-09-29 09:47:23.342 [info] <0.234.0> Running boot step direct_client defined by app rabbit
2019-09-29 09:47:23.342 [info] <0.521.0> Peer discovery: enabling node cleanup (will only log warnings). Check interval: 30 seconds.
2019-09-29 09:47:23.386 [info] <0.575.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2019-09-29 09:47:23.386 [info] <0.681.0> Statistics database started.
2019-09-29 09:47:23.386 [info] <0.680.0> Starting worker pool 'management_worker_pool' with 3 processes in it
2019-09-29 09:47:23.602 [info] <0.8.0> Server startup complete; 5 plugins started.
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * rabbitmq_peer_discovery_k8s
 * rabbitmq_peer_discovery_common
 completed with 5 plugins.

rabbitmq-1's log

2019-09-29 09:48:26.925 [info] <0.8.0> Feature flags: list of feature flags found:
2019-09-29 09:48:26.925 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2019-09-29 09:48:26.974 [info] <0.234.0> 
 Starting RabbitMQ 3.7.18 on Erlang 22.1
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.18. Copyright (C) 2007-2019 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2019-09-29 09:48:26.975 [info] <0.234.0> 
 node           : rabbit@rabbitmq-1
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1
2019-09-29 09:48:27.000 [info] <0.234.0> Running boot step pre_boot defined by app rabbit
2019-09-29 09:48:27.000 [info] <0.234.0> Running boot step rabbit_core_metrics defined by app rabbit
2019-09-29 09:48:27.001 [info] <0.234.0> Running boot step rabbit_alarm defined by app rabbit
2019-09-29 09:48:27.008 [info] <0.240.0> Memory high watermark set to 1907 MiB (2000000000 bytes) of 3790 MiB (3974975488 bytes) total
2019-09-29 09:48:27.015 [info] <0.242.0> Enabling free disk space monitoring
2019-09-29 09:48:27.015 [info] <0.242.0> Disk free limit set to 4000MB
2019-09-29 09:48:27.019 [info] <0.234.0> Running boot step code_server_cache defined by app rabbit
2019-09-29 09:48:27.019 [info] <0.234.0> Running boot step file_handle_cache defined by app rabbit
2019-09-29 09:48:27.020 [info] <0.245.0> Limiting to approx 65436 file handles (58890 sockets)
2019-09-29 09:48:27.020 [info] <0.246.0> FHC read buffering:  OFF
2019-09-29 09:48:27.020 [info] <0.246.0> FHC write buffering: ON
2019-09-29 09:48:27.020 [info] <0.234.0> Running boot step worker_pool defined by app rabbit
2019-09-29 09:48:27.021 [info] <0.235.0> Will use 2 processes for default worker pool
2019-09-29 09:48:27.021 [info] <0.235.0> Starting worker pool 'worker_pool' with 2 processes in it
2019-09-29 09:48:27.021 [info] <0.234.0> Running boot step database defined by app rabbit
2019-09-29 09:48:27.021 [info] <0.234.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2019-09-29 09:48:27.021 [info] <0.234.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2019-09-29 09:48:27.022 [info] <0.234.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-09-29 09:48:27.022 [info] <0.234.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-09-29 09:48:27.022 [info] <0.234.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-09-29 09:48:27.051 [info] <0.234.0> k8s endpoint listing returned nodes not yet ready: rabbitmq-1
2019-09-29 09:48:27.052 [info] <0.234.0> All discovered existing cluster peers: rabbit@rabbitmq-0
2019-09-29 09:48:27.052 [info] <0.234.0> Peer nodes we can cluster with: rabbit@rabbitmq-0
2019-09-29 09:48:33.069 [warning] <0.234.0> Could not auto-cluster with node rabbit@rabbitmq-0: {badrpc,nodedown}
2019-09-29 09:48:33.069 [warning] <0.234.0> Could not successfully contact any node of: rabbit@rabbitmq-0 (as in Erlang distribution). Starting as a blank standalone node...
2019-09-29 09:48:33.077 [info] <0.43.0> Application mnesia exited with reason: stopped
2019-09-29 09:48:33.206 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 09:48:33.255 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 09:48:33.303 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 09:48:33.304 [info] <0.234.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping registration.
2019-09-29 09:48:33.304 [info] <0.234.0> Running boot step database_sync defined by app rabbit
2019-09-29 09:48:33.304 [info] <0.234.0> Running boot step feature_flags defined by app rabbit
2019-09-29 09:48:33.304 [info] <0.234.0> Running boot step codec_correctness_check defined by app rabbit
2019-09-29 09:48:33.304 [info] <0.234.0> Running boot step external_infrastructure defined by app rabbit
2019-09-29 09:48:33.304 [info] <0.234.0> Running boot step rabbit_registry defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_queue_location_random defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_event defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2019-09-29 09:48:33.305 [info] <0.234.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2019-09-29 09:48:33.306 [info] <0.234.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2019-09-29 09:48:33.306 [info] <0.234.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2019-09-29 09:48:33.306 [info] <0.234.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2019-09-29 09:48:33.306 [info] <0.234.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2019-09-29 09:48:33.306 [info] <0.234.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2019-09-29 09:48:33.306 [info] <0.234.0> Running boot step rabbit_priority_queue defined by app rabbit
2019-09-29 09:48:33.307 [info] <0.234.0> Priority queues enabled, real BQ is rabbit_variable_queue
2019-09-29 09:48:33.307 [info] <0.234.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2019-09-29 09:48:33.307 [info] <0.234.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2019-09-29 09:48:33.307 [info] <0.234.0> Running boot step kernel_ready defined by app rabbit
2019-09-29 09:48:33.307 [info] <0.234.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2019-09-29 09:48:33.307 [info] <0.234.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2019-09-29 09:48:33.308 [info] <0.234.0> Running boot step guid_generator defined by app rabbit
2019-09-29 09:48:33.313 [info] <0.234.0> Running boot step rabbit_node_monitor defined by app rabbit
2019-09-29 09:48:33.313 [info] <0.421.0> Starting rabbit_node_monitor
2019-09-29 09:48:33.313 [info] <0.234.0> Running boot step delegate_sup defined by app rabbit
2019-09-29 09:48:33.314 [info] <0.234.0> Running boot step rabbit_memory_monitor defined by app rabbit
2019-09-29 09:48:33.314 [info] <0.234.0> Running boot step core_initialized defined by app rabbit
2019-09-29 09:48:33.314 [info] <0.234.0> Running boot step upgrade_queues defined by app rabbit
2019-09-29 09:48:33.355 [info] <0.234.0> message_store upgrades: 1 to apply
2019-09-29 09:48:33.355 [info] <0.234.0> message_store upgrades: Applying rabbit_variable_queue:move_messages_to_vhost_store
2019-09-29 09:48:33.356 [info] <0.234.0> message_store upgrades: No durable queues found. Skipping message store migration
2019-09-29 09:48:33.356 [info] <0.234.0> message_store upgrades: Removing the old message store data
2019-09-29 09:48:33.356 [info] <0.234.0> message_store upgrades: All upgrades applied successfully
2019-09-29 09:48:33.402 [info] <0.234.0> Running boot step rabbit_connection_tracking defined by app rabbit
2019-09-29 09:48:33.402 [info] <0.234.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2019-09-29 09:48:33.402 [info] <0.234.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2019-09-29 09:48:33.403 [info] <0.234.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2019-09-29 09:48:33.403 [info] <0.234.0> Running boot step rabbit_policies defined by app rabbit
2019-09-29 09:48:33.404 [info] <0.234.0> Running boot step rabbit_policy defined by app rabbit
2019-09-29 09:48:33.405 [info] <0.234.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2019-09-29 09:48:33.405 [info] <0.234.0> Running boot step rabbit_vhost_limit defined by app rabbit
2019-09-29 09:48:33.405 [info] <0.234.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2019-09-29 09:48:33.405 [info] <0.234.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2019-09-29 09:48:33.405 [info] <0.234.0> Management plugin: using rates mode 'basic'
2019-09-29 09:48:33.405 [info] <0.234.0> Running boot step recovery defined by app rabbit
2019-09-29 09:48:33.407 [info] <0.234.0> Running boot step load_definitions defined by app rabbitmq_management
2019-09-29 09:48:33.407 [info] <0.234.0> Running boot step empty_db_check defined by app rabbit
2019-09-29 09:48:33.407 [info] <0.234.0> Adding vhost '/'
2019-09-29 09:48:33.433 [info] <0.462.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2019-09-29 09:48:33.439 [info] <0.462.0> Starting message stores for vhost '/'
2019-09-29 09:48:33.440 [info] <0.466.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2019-09-29 09:48:33.441 [info] <0.462.0> Started message store of type transient for vhost '/'
2019-09-29 09:48:33.441 [info] <0.469.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2019-09-29 09:48:33.442 [warning] <0.469.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2019-09-29 09:48:33.443 [info] <0.462.0> Started message store of type persistent for vhost '/'
2019-09-29 09:48:33.445 [info] <0.234.0> Creating user 'guest'
2019-09-29 09:48:33.448 [info] <0.234.0> Setting user tags for user 'guest' to [administrator]
2019-09-29 09:48:33.452 [info] <0.234.0> Setting permissions for 'guest' in '/' to '.*', '.*', '.*'
2019-09-29 09:48:33.455 [info] <0.234.0> Running boot step rabbit_looking_glass defined by app rabbit
2019-09-29 09:48:33.455 [info] <0.234.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2019-09-29 09:48:33.456 [info] <0.234.0> Running boot step background_gc defined by app rabbit
2019-09-29 09:48:33.456 [info] <0.234.0> Running boot step connection_tracking defined by app rabbit
2019-09-29 09:48:33.461 [info] <0.234.0> Setting up a table for connection tracking on this node: 'tracked_connection_on_node_rabbit@rabbitmq-1'
2019-09-29 09:48:33.465 [info] <0.234.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_per_vhost_on_node_rabbit@rabbitmq-1'
2019-09-29 09:48:33.465 [info] <0.234.0> Running boot step routing_ready defined by app rabbit
2019-09-29 09:48:33.466 [info] <0.234.0> Running boot step pre_flight defined by app rabbit
2019-09-29 09:48:33.466 [info] <0.234.0> Running boot step notify_cluster defined by app rabbit
2019-09-29 09:48:33.466 [info] <0.234.0> Running boot step networking defined by app rabbit
2019-09-29 09:48:33.468 [info] <0.515.0> started TCP listener on [::]:5672
2019-09-29 09:48:33.468 [info] <0.234.0> Running boot step direct_client defined by app rabbit
2019-09-29 09:48:33.469 [info] <0.521.0> Peer discovery: enabling node cleanup (will only log warnings). Check interval: 30 seconds.
2019-09-29 09:48:33.520 [info] <0.575.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2019-09-29 09:48:33.521 [info] <0.681.0> Statistics database started.
2019-09-29 09:48:33.521 [info] <0.680.0> Starting worker pool 'management_worker_pool' with 3 processes in it
 completed with 5 plugins.
2019-09-29 09:48:33.791 [info] <0.8.0> Server startup complete; 5 plugins started.
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * rabbitmq_peer_discovery_k8s
 * rabbitmq_peer_discovery_common

rabbitmq cluster_status

rabbitmq-0
root@rabbitmq-0:/# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbitmq-0 ...
[{nodes,[{disc,['rabbit@rabbitmq-0']}]},
 {running_nodes,['rabbit@rabbitmq-0']},
 {cluster_name,<<"rabbit@rabbitmq-0.rabbitmq-headless-srv.default.svc.cluster.local.">>},
 {partitions,[]},
 {alarms,[{'rabbit@rabbitmq-0',[]}]}]

rabbitmq-1
root@rabbitmq-1:/# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbitmq-1 ...
[{nodes,[{disc,['rabbit@rabbitmq-1']}]},
 {running_nodes,['rabbit@rabbitmq-1']},
 {cluster_name,<<"rabbit@rabbitmq-1.rabbitmq-headless-srv.default.svc.cluster.local.">>},
 {partitions,[]},
 {alarms,[{'rabbit@rabbitmq-1',[]}]}]

rabbitmq_configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: default
data:
  enabled_plugins: |
      [rabbitmq_management,rabbitmq_peer_discovery_k8s].

  rabbitmq.conf: |
      ## Cluster formation. See https://www.rabbitmq.com/cluster-formation.html to learn more.
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      #cluster_formation.k8s.host = 10.254.0.1
      cluster_formation.k8s.port = 443
      cluster_formation.k8s.scheme = https
      cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
      cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace
      cluster_formation.randomized_startup_delay_range.min = 0
      cluster_formation.randomized_startup_delay_range.max = 2
      # service_name must be set, otherwise the Pod cannot start properly. Once it is set here, the K8S_SERVICE_NAME variable in the StatefulSet env can be omitted.
      cluster_formation.k8s.service_name = rabbitmq-headless-srv
      # hostname_suffix must be set, otherwise the nodes cannot form a cluster.
      cluster_formation.k8s.hostname_suffix = .rabbitmq-headless-srv.default.svc.cluster.local
      ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
      ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
      ## Set to "hostname" to use pod hostnames.
      ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
      ## environment variable.
      cluster_formation.k8s.address_type = hostname
      ## How often should node cleanup checks run?
      cluster_formation.node_cleanup.interval = 30
      ## Set to false if automatic removal of unknown/absent nodes
      ## is desired. This can be dangerous, see
      ##  * https://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
      ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      ## See https://www.rabbitmq.com/ha.html#master-migration-data-locality
      queue_master_locator=min-masters
      ## See https://www.rabbitmq.com/access-control.html#loopback-users
      loopback_users.guest = false
      #the memory limit
      vm_memory_high_watermark.absolute = 2GB
      #the disk limit
      disk_free_limit.absolute = 4GB

rabbitmq_statefulsets.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: default
spec:
  selector:
    matchLabels:
      app: rabbitmq
  serviceName: rabbitmq-headless-srv
  replicas: 2
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
      - name: rabbitmq
        image: rabbitmq:k8s-318
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 0.5
            memory: 1Gi
        volumeMounts:
          - name: config-volume
            mountPath: /etc/rabbitmq
          - name: rabbitmq-pvc
            mountPath: /var/lib/rabbitmq/mnesia
        ports:
          - name: http
            protocol: TCP
            containerPort: 15672
          - name: amqp
            protocol: TCP
            containerPort: 5672
        livenessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 60
          # See https://www.rabbitmq.com/monitoring.html for monitoring frequency recommendations.
          periodSeconds: 60
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 20
          periodSeconds: 60
          timeoutSeconds: 10
        imagePullPolicy: IfNotPresent
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: RABBITMQ_USE_LONGNAME
            value: "false"
          - name: K8S_SERVICE_NAME
            value: "rabbitmq-headless-srv"
          - name: RABBITMQ_NODENAME
            #value: rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
            value: rabbit@$(MY_POD_NAME)
          - name: K8S_HOSTNAME_SUFFIX
            value: ".$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"
          - name: RABBITMQ_ERLANG_COOKIE
            value: "mycookie"
      volumes:
      - name: config-volume
        configMap:
          name: rabbitmq-config
          items:
          - key: rabbitmq.conf
            path: rabbitmq.conf
          - key: enabled_plugins
            path: enabled_plugins
      - name: rabbitmq-pvc
        hostPath: 
          path: /pacloud/k8s/rabbitmq

michaelklishin commented 5 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team).

We get at least a dozen questions through various venues every single day, often light on details. At that rate GitHub issues can very quickly turn into something impossible to navigate and make sense of even for our team. Because GitHub is a tool our team uses heavily nearly every day, the signal/noise ratio of issues is something we care about a lot.

Please post this to rabbitmq-users.

Thank you.

michaelklishin commented 5 years ago

See How to Troubleshoot Peer Discovery (hint: most decisions are logged at debug level), How Does Peer Discovery Work (and when it is not performed), and finally, some Kubernetes-specific prerequisites.

michaelklishin commented 5 years ago

While not really applicable to Kubernetes, as the log message says, the range of values in cluster_formation.randomized_startup_delay_range used in your config is very narrow and unlikely to address the problem it was designed to address. The default range is [5, 60]. With [0, 2] for the range, both nodes will effectively boot in parallel.
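To put a rough number on the range comparison above, here is a minimal sketch. It assumes each node draws its delay independently and uniformly from the configured range, and it treats two nodes as "booting in parallel" when their delays differ by less than some window. Both the continuous-uniform model and the 2-second window are illustrative assumptions, not values taken from RabbitMQ.

```python
def overlap_probability(range_min: float, range_max: float, window: float) -> float:
    """P(|X - Y| < window) for X, Y independent and uniform on [range_min, range_max]."""
    width = range_max - range_min
    # A zero-width range, or a window at least as wide as the range,
    # means the two delays always land within the window.
    if width <= 0 or window >= width:
        return 1.0
    return 1.0 - (1.0 - window / width) ** 2


# Hypothetical 2-second window, chosen only for illustration.
for low, high in ((0, 2), (5, 60)):
    print(f"range [{low}, {high}]: {overlap_probability(low, high, 2.0):.3f}")
```

Under these assumptions the [0, 2] range always puts both delays inside the window, while the default [5, 60] range does so only about 7% of the time.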

On an unrelated note, two node clusters are highly discouraged because computing a majority of nodes in case of connectivity loss is impossible. Some features in 3.8 will require a 3+ node cluster.
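The majority argument can be checked with simple arithmetic. This is a sketch of the general "strictly more than half" rule, not of any specific RabbitMQ code path; the function names are made up for illustration.

```python
def majority(cluster_size: int) -> int:
    """Smallest node count that is strictly more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Number of nodes that can be unreachable while a majority remains."""
    return cluster_size - majority(cluster_size)


for n in (2, 3, 5):
    print(f"{n} nodes: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

With two nodes the majority is two, so a 1/1 split leaves neither side with a majority, whereas three nodes can lose one and still have two in agreement.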

Remote access for user guest is highly discouraged.

michaelklishin commented 5 years ago

One of the log files contains the following clue:

2019-09-29 09:47:22.846 [info] <0.234.0> k8s endpoint listing returned nodes not yet ready: rabbitmq-0
2019-09-29 09:47:22.846 [info] <0.234.0> All discovered existing cluster peers:

According to the Kubernetes API, the pod of the discovered peer is not yet initialised. This is the case when the pods are booting in parallel. See this rabbitmq-users thread, for example. The docs now explicitly warn about this:

Stateless sets are also prone to the natural race condition during initial cluster formation, unlike stateful sets that initialise pods one by one.

Peer discovery mechanism will filter out nodes whose pods are not yet ready (initialised) according to the Kubernetes API. For example, if pod management policy of a stateful set is set to Parallel, some nodes can be discovered but will not be joined.

It is therefore necessary to use OrderedReady pod management policy for the sets used by RabbitMQ nodes. This policy is used by default by Kubernetes.
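
As a rough illustration, the policy can be stated explicitly in the StatefulSet spec. The sketch below reuses the names from the manifest posted in this thread (rabbitmq, rabbitmq-headless-srv) and omits every unrelated field, so it is not a complete manifest.

```yaml
# Fragment of the StatefulSet from this thread; unrelated fields are omitted.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq-headless-srv
  # OrderedReady is the Kubernetes default; stating it explicitly guards
  # against an accidental "Parallel" setting.
  podManagementPolicy: OrderedReady
  replicas: 2
```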

TwitchChen commented 5 years ago

@michaelklishin Thank you for your reply. I tried again without "cluster_formation.k8s.address_type = hostname"; the other settings, such as "cluster_formation.randomized_startup_delay_range", are unchanged.

rabbitmq-0's log

2019-09-29 02:29:33.786 [info] <0.219.0> 
 Starting RabbitMQ 3.7.16 on Erlang 22.0.7
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.16. Copyright (C) 2007-2019 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2019-09-29 02:29:33.794 [info] <0.219.0> 
 node           : rabbit@172.31.92.92
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@172.31.92.92
2019-09-29 02:29:36.066 [info] <0.219.0> Running boot step pre_boot defined by app rabbit
2019-09-29 02:29:36.066 [info] <0.219.0> Running boot step rabbit_core_metrics defined by app rabbit
2019-09-29 02:29:36.069 [info] <0.219.0> Running boot step rabbit_alarm defined by app rabbit
2019-09-29 02:29:36.078 [info] <0.227.0> Memory high watermark set to 1907 MiB (2000000000 bytes) of 515124 MiB (540147003392 bytes) total
2019-09-29 02:29:36.086 [info] <0.229.0> Enabling free disk space monitoring
2019-09-29 02:29:36.086 [info] <0.229.0> Disk free limit set to 4000MB
2019-09-29 02:29:36.094 [info] <0.219.0> Running boot step code_server_cache defined by app rabbit
2019-09-29 02:29:36.094 [info] <0.219.0> Running boot step file_handle_cache defined by app rabbit
2019-09-29 02:29:36.095 [info] <0.232.0> Limiting to approx 65436 file handles (58890 sockets)
2019-09-29 02:29:36.095 [info] <0.233.0> FHC read buffering:  OFF
2019-09-29 02:29:36.095 [info] <0.233.0> FHC write buffering: ON
2019-09-29 02:29:36.095 [info] <0.219.0> Running boot step worker_pool defined by app rabbit
2019-09-29 02:29:36.095 [info] <0.220.0> Will use 48 processes for default worker pool
2019-09-29 02:29:36.095 [info] <0.220.0> Starting worker pool 'worker_pool' with 48 processes in it
2019-09-29 02:29:36.098 [info] <0.219.0> Running boot step database defined by app rabbit
2019-09-29 02:29:36.098 [info] <0.219.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@172.31.92.92 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2019-09-29 02:29:36.098 [info] <0.219.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2019-09-29 02:29:36.098 [info] <0.219.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-09-29 02:29:36.098 [info] <0.219.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-09-29 02:29:36.099 [info] <0.219.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-09-29 02:29:36.129 [info] <0.219.0> All discovered existing cluster peers: rabbit@172.31.92.92
2019-09-29 02:29:36.129 [info] <0.219.0> Discovered no peer nodes to cluster with
2019-09-29 02:29:36.134 [info] <0.43.0> Application mnesia exited with reason: stopped
2019-09-29 02:29:36.179 [info] <0.219.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 02:29:36.221 [info] <0.219.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 02:29:36.261 [info] <0.219.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-09-29 02:29:36.262 [info] <0.219.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping registration.
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step database_sync defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step codec_correctness_check defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step external_infrastructure defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step rabbit_registry defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step rabbit_queue_location_random defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step rabbit_event defined by app rabbit
2019-09-29 02:29:36.262 [info] <0.219.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2019-09-29 02:29:36.263 [info] <0.219.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step rabbit_priority_queue defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Priority queues enabled, real BQ is rabbit_variable_queue
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step kernel_ready defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2019-09-29 02:29:36.264 [info] <0.219.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2019-09-29 02:29:36.266 [info] <0.219.0> Running boot step guid_generator defined by app rabbit
2019-09-29 02:29:36.266 [info] <0.219.0> Running boot step rabbit_node_monitor defined by app rabbit
2019-09-29 02:29:36.267 [info] <0.452.0> Starting rabbit_node_monitor
2019-09-29 02:29:36.267 [info] <0.219.0> Running boot step delegate_sup defined by app rabbit
2019-09-29 02:29:36.267 [info] <0.219.0> Running boot step rabbit_memory_monitor defined by app rabbit
2019-09-29 02:29:36.268 [info] <0.219.0> Running boot step core_initialized defined by app rabbit
2019-09-29 02:29:36.268 [info] <0.219.0> Running boot step upgrade_queues defined by app rabbit
2019-09-29 02:29:36.305 [info] <0.219.0> message_store upgrades: 1 to apply
2019-09-29 02:29:36.305 [info] <0.219.0> message_store upgrades: Applying rabbit_variable_queue:move_messages_to_vhost_store
2019-09-29 02:29:36.306 [info] <0.219.0> message_store upgrades: No durable queues found. Skipping message store migration
2019-09-29 02:29:36.306 [info] <0.219.0> message_store upgrades: Removing the old message store data
2019-09-29 02:29:36.306 [info] <0.219.0> message_store upgrades: All upgrades applied successfully
2019-09-29 02:29:36.345 [info] <0.219.0> Running boot step rabbit_connection_tracking defined by app rabbit
2019-09-29 02:29:36.345 [info] <0.219.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2019-09-29 02:29:36.345 [info] <0.219.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2019-09-29 02:29:36.345 [info] <0.219.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2019-09-29 02:29:36.346 [info] <0.219.0> Running boot step rabbit_policies defined by app rabbit
2019-09-29 02:29:36.347 [info] <0.219.0> Running boot step rabbit_policy defined by app rabbit
2019-09-29 02:29:36.347 [info] <0.219.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2019-09-29 02:29:36.347 [info] <0.219.0> Running boot step rabbit_vhost_limit defined by app rabbit
2019-09-29 02:29:36.347 [info] <0.219.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2019-09-29 02:29:36.347 [info] <0.219.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2019-09-29 02:29:36.347 [info] <0.219.0> Management plugin: using rates mode 'basic'
2019-09-29 02:29:36.347 [info] <0.219.0> Running boot step recovery defined by app rabbit
2019-09-29 02:29:36.348 [info] <0.219.0> Running boot step load_definitions defined by app rabbitmq_management
2019-09-29 02:29:36.348 [info] <0.219.0> Running boot step empty_db_check defined by app rabbit
2019-09-29 02:29:36.348 [info] <0.219.0> Adding vhost '/'
2019-09-29 02:29:36.354 [info] <0.493.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@172.31.92.92/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2019-09-29 02:29:36.361 [info] <0.493.0> Starting message stores for vhost '/'
2019-09-29 02:29:36.361 [info] <0.497.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2019-09-29 02:29:36.362 [info] <0.493.0> Started message store of type transient for vhost '/'
2019-09-29 02:29:36.362 [info] <0.500.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2019-09-29 02:29:36.363 [warning] <0.500.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2019-09-29 02:29:36.370 [info] <0.493.0> Started message store of type persistent for vhost '/'
2019-09-29 02:29:36.371 [info] <0.219.0> Creating user 'guest'
2019-09-29 02:29:36.372 [info] <0.219.0> Setting user tags for user 'guest' to [administrator]
2019-09-29 02:29:36.373 [info] <0.219.0> Setting permissions for 'guest' in '/' to '.*', '.*', '.*'
2019-09-29 02:29:36.373 [info] <0.219.0> Running boot step rabbit_looking_glass defined by app rabbit
2019-09-29 02:29:36.373 [info] <0.219.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2019-09-29 02:29:36.373 [info] <0.219.0> Running boot step background_gc defined by app rabbit
2019-09-29 02:29:36.374 [info] <0.219.0> Running boot step connection_tracking defined by app rabbit
2019-09-29 02:29:36.375 [info] <0.219.0> Setting up a table for connection tracking on this node: 'tracked_connection_on_node_rabbit@172.31.92.92'
2019-09-29 02:29:36.377 [info] <0.219.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_per_vhost_on_node_rabbit@172.31.92.92'
2019-09-29 02:29:36.377 [info] <0.219.0> Running boot step routing_ready defined by app rabbit
2019-09-29 02:29:36.377 [info] <0.219.0> Running boot step pre_flight defined by app rabbit
2019-09-29 02:29:36.377 [info] <0.219.0> Running boot step notify_cluster defined by app rabbit
2019-09-29 02:29:36.377 [info] <0.219.0> Running boot step networking defined by app rabbit
2019-09-29 02:29:36.381 [warning] <0.532.0> Setting Ranch options together with socket options is deprecated. Please use the new map syntax that allows specifying socket options separately from other options.
2019-09-29 02:29:36.383 [info] <0.546.0> started TCP listener on [::]:5672
2019-09-29 02:29:36.383 [info] <0.219.0> Running boot step direct_client defined by app rabbit
2019-09-29 02:29:36.384 [info] <0.552.0> Peer discovery: enabling node cleanup (will only log warnings). Check interval: 30 seconds.
2019-09-29 02:29:36.462 [info] <0.610.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2019-09-29 02:29:36.462 [info] <0.716.0> Statistics database started.
2019-09-29 02:29:36.462 [info] <0.715.0> Starting worker pool 'management_worker_pool' with 3 processes in it
 completed with 5 plugins.
2019-09-29 02:29:36.687 [info] <0.8.0> Server startup complete; 5 plugins started.
 * rabbitmq_peer_discovery_k8s
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * rabbitmq_management_agent
 * rabbitmq_peer_discovery_common
2019-09-29 02:30:14.075 [info] <0.452.0> node 'rabbit@172.31.92.123' up
2019-09-29 02:30:14.424 [info] <0.452.0> rabbit on node 'rabbit@172.31.92.123' up

rabbitmq_configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: default
data:
  enabled_plugins: |
      [rabbitmq_management,rabbitmq_peer_discovery_k8s].

  rabbitmq.conf: |
      ## Cluster formation. See https://www.rabbitmq.com/cluster-formation.html to learn more.
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      #cluster_formation.k8s.host = 10.254.0.1
      cluster_formation.k8s.port = 443
      cluster_formation.k8s.scheme = https
      cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
      cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace
      cluster_formation.randomized_startup_delay_range.min = 0
      cluster_formation.randomized_startup_delay_range.max = 2
      # service_name must be set, otherwise the Pod cannot start properly. Once it is set here, the K8S_SERVICE_NAME variable in the StatefulSet env does not need to be set.
      cluster_formation.k8s.service_name = rabbitmq-headless-srv
      # hostname_suffix must be set, otherwise the nodes cannot form a cluster
      #cluster_formation.k8s.hostname_suffix = .rabbitmq-headless-srv.default.svc.cluster.local
      ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
      ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
      ## Set to "hostname" to use pod hostnames.
      ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
      ## environment variable.
      #cluster_formation.k8s.address_type = hostname
      ## How often should node cleanup checks run?
      cluster_formation.node_cleanup.interval = 30
      ## Set to false if automatic removal of unknown/absent nodes
      ## is desired. This can be dangerous, see
      ##  * https://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
      ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      ## See https://www.rabbitmq.com/ha.html#master-migration-data-locality
      queue_master_locator=min-masters
      ## See https://www.rabbitmq.com/access-control.html#loopback-users
      loopback_users.guest = false
      #the memory limit
      vm_memory_high_watermark.absolute = 2GB
      #the disk limit
      disk_free_limit.absolute = 4GB

rabbitmq_statefulsets.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: default
spec:
  selector:
    matchLabels:
      app: rabbitmq
  serviceName: rabbitmq-headless-srv
  replicas: 2
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      imagePullSecrets:
      - name: default
      containers:
      - name: rabbitmq
        image: rabbitmq:k8s
        resources:
          limits:
            cpu: 2
            memory: 3Gi
          requests:
            cpu: 0.5
            memory: 1Gi
        volumeMounts:
          - name: config-volume
            mountPath: /etc/rabbitmq
          - name: rabbitmq-pvc
            mountPath: /var/lib/rabbitmq/mnesia
        ports:
          - name: http
            protocol: TCP
            containerPort: 15672
          - name: amqp
            protocol: TCP
            containerPort: 5672
        livenessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 60
          # See https://www.rabbitmq.com/monitoring.html for monitoring frequency recommendations.
          periodSeconds: 60
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 20
          periodSeconds: 60
          timeoutSeconds: 10
        imagePullPolicy: IfNotPresent
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: K8S_SERVICE_NAME
            value: "rabbitmq-headless-srv"
          - name: RABBITMQ_NODENAME
            value: rabbit@$(MY_POD_NAME)
          #- name: K8S_HOSTNAME_SUFFIX
          #  value: ".$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"
          - name: RABBITMQ_ERLANG_COOKIE
            value: "mycookie"
      volumes:
      - name: config-volume
        configMap:
          name: rabbitmq-config
          items:
          - key: rabbitmq.conf
            path: rabbitmq.conf
          - key: enabled_plugins
            path: enabled_plugins
      - name: rabbitmq-pvc
        hostPath: 
          path: /pacloud/k8s/rabbitmq 
michaelklishin commented 5 years ago

I'm not sure what the question is.

At the end of the log a peer joins the cluster:

2019-09-29 02:30:14.075 [info] <0.452.0> node 'rabbit@172.31.92.123' up
2019-09-29 02:30:14.424 [info] <0.452.0> rabbit on node 'rabbit@172.31.92.123' up

If you want to see what the Kubernetes API endpoint responses return, set the log level to debug. Previously initialised nodes (that is, nodes with an existing data directory) must be reset between clustering attempts, or they will behave as "rejoining nodes", which the docs cover.
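
A minimal sketch of enabling debug logging, written in the same ConfigMap layout as the config posted above. It assumes the node logs to the console (stdout), as the logs in this thread indicate; a node that logs to a file would need the corresponding file-level setting instead.

```yaml
# Fragment of the rabbitmq-config ConfigMap from this thread; only logging is shown.
data:
  rabbitmq.conf: |
      # Assumption: logs go to the console (stdout), as in the logs posted above
      log.console = true
      log.console.level = debug
```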

For our team GitHub is not a support forum => I am locking this issue. Please use the mailing list in the future.