rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

k8s peer discovery: RabbitMQ pods boot as standalone nodes in an IPv6 environment #11312

Closed balajijagtap closed 5 months ago

balajijagtap commented 5 months ago

Describe the bug

In an IPv6-only environment, RabbitMQ nodes fail to discover their peers and, as a result, boot as standalone nodes. RabbitMQ version: 3.13.0. Erlang version: 26.1.1.

With RabbitMQ 3.12.7, the nodes are able to form a cluster.

Below is a snippet from the logs of a RabbitMQ pod.

2024-05-24 10:12:11.819221+00:00 [debug] <0.249.0> Peer discovery: backend returned the following configuration:
2024-05-24 10:12:11.819221+00:00 [debug] <0.249.0>   {ok,{['rabbit@rabbit-crmq-0','rabbit@rabbit-crmq-2','rabbit@rabbit-crmq-1'],
2024-05-24 10:12:11.819221+00:00 [debug] <0.249.0>        disc}}
2024-05-24 10:12:11.819312+00:00 [debug] <0.249.0> Peer discovery: peer node arguments: #{args =>
2024-05-24 10:12:11.819312+00:00 [debug] <0.249.0>                                            ["-boot","start_sasl","-hidden"],
2024-05-24 10:12:11.819312+00:00 [debug] <0.249.0>                                        name => "rabbit-4803-32"}
2024-05-24 10:12:12.061147+00:00 [debug] <0.249.0> Peer discovery: using temporary hidden node 'rabbit-4803-32@rabbit-crmq-2' to query discovered peers properties
2024-05-24 10:12:12.070067+00:00 [debug] <37467.100.0> Peer discovery: failed to query cluster members of node 'rabbit@rabbit-crmq-0': {error,{erpc,noconnection}}
2024-05-24 10:12:12.070067+00:00 [debug] <37467.100.0> Peer discovery: node 'rabbit@rabbit-crmq-0' excluded from the discovered nodes
2024-05-24 10:12:12.070085+00:00 [debug] <37467.100.0> Peer discovery: failed to query cluster members of node 'rabbit@rabbit-crmq-1': {error,{erpc,noconnection}}
2024-05-24 10:12:12.070085+00:00 [debug] <37467.100.0> Peer discovery: node 'rabbit@rabbit-crmq-1' excluded from the discovered nodes
2024-05-24 10:12:12.070679+00:00 [debug] <37467.100.0> Peer discovery: sorted list of nodes and their properties considered to create/sync the cluster:
2024-05-24 10:12:12.070679+00:00 [debug] <37467.100.0>   - {'rabbit@rabbit-crmq-2',['rabbit@rabbit-crmq-2'],1716545486149096,true}
2024-05-24 10:12:12.071459+00:00 [debug] <0.249.0> Peer discovery: not satisfyied with discovered peers: the list should contain at least two nodes with a configured cluster size hint of 3 nodes
2024-05-24 10:12:12.071506+00:00 [error] <0.249.0> Peer discovery: could not discover and join another node; proceeding as a standalone node
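
For a longer log, the excluded peers and the reason reported for each can be pulled out with a small filter. This is only a sketch: `failed_peers` is a hypothetical helper name, and it assumes debug logging is enabled and that the message wording matches the lines above.

```shell
# Hypothetical helper: print "<node> <error>" for every peer that peer
# discovery failed to query. Reads RabbitMQ log lines on stdin.
failed_peers() {
  # Matches lines such as:
  #   Peer discovery: failed to query cluster members of node 'rabbit@host': {error,...}
  sed -n "s/.*failed to query cluster members of node '\([^']*\)': \(.*\)/\1 \2/p"
}
```

It could be fed from the pod log, for example `kubectl logs rabbit-crmq-2 | failed_peers`, assuming the pod is named after the hostname seen in the log.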

Reproduction steps

  1. Configure RabbitMQ to use IPv6 for inter-node communication:

    # these flags will be used by RabbitMQ nodes
    RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
    # these flags will be used by CLI tools
    RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"

     Include the following line in /etc/rabbitmq/erl_inetrc:

    {inet6,true}.

  2. Deploy RabbitMQ as a StatefulSet with 3 or more replicas.
  3. Check the cluster status after exec'ing into any pod:

bash-4.4$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit-crmq-0 ...

Listeners

Node: rabbit@rabbit-crmq-0, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@rabbit-crmq-0, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit-crmq-0, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
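
Before deploying, the IPv6 settings from step 1 can be sanity-checked inside a pod. The following is a minimal sketch, not part of RabbitMQ: `check_ipv6_dist_config` is a hypothetical helper that only checks that the inetrc file contains `{inet6,true}.` and that both environment variables from step 1 carry `-proto_dist inet6_tcp`. It assumes the flags are passed through those two environment variables and not by some other mechanism.

```shell
# Hypothetical pre-flight check for the IPv6 distribution settings.
# Usage: check_ipv6_dist_config [path-to-erl_inetrc]
# The body runs in a subshell, so its variables do not leak into the caller.
check_ipv6_dist_config() (
  inetrc="${1:-/etc/rabbitmq/erl_inetrc}"
  status=0

  # The inetrc file is expected to contain the line {inet6,true}.
  if ! grep -Eq '^[[:space:]]*\{inet6,[[:space:]]*true\}\.' "$inetrc" 2>/dev/null; then
    echo "missing {inet6,true}. in $inetrc" >&2
    status=1
  fi

  # Both the node flags and the CLI tool flags should select inet6_tcp.
  for var in RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS RABBITMQ_CTL_ERL_ARGS; do
    eval "value=\${$var:-}"
    case "$value" in
      *"-proto_dist inet6_tcp"*) ;;
      *) echo "$var does not set -proto_dist inet6_tcp" >&2; status=1 ;;
    esac
  done

  exit "$status"
)
```

A zero exit status means both checks passed. It says nothing about whether the peers are actually reachable over IPv6, which was the failing part in this report.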



Expected behavior

RabbitMQ pods should form a cluster, and a cluster status check from any RabbitMQ node should list all peer nodes under Listeners.

Additional context

No response

mkuratczyk commented 5 months ago

https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.13.2

mkuratczyk commented 5 months ago

Let us know if 3.13.2 works for you; it should. Also, I'd love to know why you picked 3.13.0 when 3.13.2 is available. :)

balajijagtap commented 5 months ago

Hi @mkuratczyk, 3.13.2 worked for me. Sorry, I ran into this issue before the 3.13.2 release and missed checking the bug fixes in it. Thank you for the prompt response!

michaelklishin commented 5 months ago

FTR, there are some pending (not yet released) changes for 3.13.3 related to peer discovery, too.