Closed by lukebakken 2 months ago
I think the expected behavior should be "the operation is retried N times" :)
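As a rough illustration of "retried N times", here is a minimal caller-side sketch in POSIX shell. The `retry` helper is hypothetical and not part of RabbitMQ or its CLI tools; it only shows the shape of a bounded retry with a fixed delay, not how the server would retry the `erpc` call internally.

```shell
# Hypothetical helper, not part of RabbitMQ: run a command up to N times,
# sleeping DELAY seconds between failed attempts.
# The body runs in a subshell so its variables do not leak into the caller.
retry() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop at the first successful attempt.
    if "$@"; then
      exit 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "giving up after ${attempts} attempts: $*" >&2
  exit 1
)
```

It could be used as, for example, `retry 5 2 rabbitmqctl join_cluster rabbit@rabbit1`, where the node name, attempt count and delay are placeholders.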
We stumbled over this through user error in #10100 and, as requested, here are the steps to reproduce the same error message. Bear in mind that this happened to me only because I forgot the "rabbit@" prefix when calling `join_cluster`:
$ docker network create test_network
1947438e01b9cced503ba3044be1afb1f5a6225fb64d265257b3547b947cad64
$ docker run -d --network test_network --name rabbit1 --privileged -v $(pwd)/cookie:/var/lib/rabbitmq/.erlang.cookie pivotalrabbitmq/rabbitmq:main-otp-max-bazel
b29a66ec3350cb7ee60975d3a1b8c0bd7918313f30833be76a113d0ea0c78590
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b29a66ec3350 pivotalrabbitmq/rabbitmq:main-otp-max-bazel "docker-entrypoint.s…" 38 seconds ago Up 36 seconds 1883/tcp, 4369/tcp, 5551-5552/tcp, 5671-5672/tcp, 8883/tcp, 15670-15676/tcp, 15691-15692/tcp, 25672/tcp, 61613-61614/tcp rabbit1
$ docker exec -it b2 /bin/bash
root@b29a66ec3350:/# rabbitmqctl join_cluster this_node_does_not_exist
Clustering node rabbit@b29a66ec3350 with this_node_does_not_exist
13:03:53.487 [error] Feature flags: error while running:
Feature flags: rabbit_ff_controller:running_nodes[]
Feature flags: on node `this_node_does_not_exist@b29a66ec3350`:
Feature flags: exception error: {erpc,noconnection}
Feature flags: in function erpc:call/5 (erpc.erl, line 710)
Feature flags: in call from rabbit_ff_controller:rpc_call/5 (rabbit_ff_controller.erl, line 1377)
Feature flags: in call from rabbit_ff_controller:list_nodes_clustered_with/1 (rabbit_ff_controller.erl, line 477)
Feature flags: in call from rabbit_ff_controller:check_node_compatibility_task/2 (rabbit_ff_controller.erl, line 389)
Feature flags: in call from rabbit_db_cluster:can_join/1 (rabbit_db_cluster.erl, line 65)
Feature flags: in call from rabbit_db_cluster:join/2 (rabbit_db_cluster.erl, line 97)
Feature flags: in call from erpc:execute_call/4 (erpc.erl, line 589)
Error:
{:aborted_feature_flags_compat_check, {:error, {:erpc, :noconnection}}}
root@b29a66ec3350:/#
It's not clear to me from this log what exactly emits this message: the node, or the shell where `rabbitmqctl join_cluster this_node_does_not_exist` is executed?
In any case, `join_cluster` should bail early if it cannot contact the node it is supposed to join.
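In the session above, the bare name was completed with the local hostname (the log shows `this_node_does_not_exist@b29a66ec3350`), so the typo only surfaced as an `erpc` error. A wrapper script could catch that particular mistake before calling `join_cluster` at all. The sketch below assumes you always want to pass a full `name@host` target; `has_host_part` is a hypothetical helper, not an existing CLI feature, and it does not check whether the node is reachable.

```shell
# Hypothetical guard: succeed only when the argument looks like "name@host",
# i.e. has at least one character on each side of an "@".
has_host_part() {
  case "$1" in
    ?*@?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `has_host_part "$target" && rabbitmqctl join_cluster "$target"`, with `$target` set by the calling script.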
I don't know if you checked the log on the node that is running when you try to connect, but it's worth checking.
What may be wrong is your `/var/lib/rabbitmq/.erlang.cookie`: it has to be the same (with the same value) on all nodes in the cluster.
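A quick way to rule this out is to compare the cookie files byte for byte. The sketch below assumes the two cookies have already been copied to local files; `cookies_match` is a hypothetical helper name. In the reproduction above the cookie is bind-mounted from `$(pwd)/cookie`, so containers started that way would share it anyway.

```shell
# Hypothetical helper: succeed only when two cookie files are byte-identical.
# cmp -s compares silently and reports the result through its exit status.
cookies_match() {
  cmp -s "$1" "$2"
}
```

The files could be fetched with `docker cp rabbit1:/var/lib/rabbitmq/.erlang.cookie ./cookie1` and likewise for a second container (its name is up to you), then checked with `cookies_match ./cookie1 ./cookie2 && echo "cookies match"`.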
@CarvalhoRod thank you for chiming in but this is RabbitMQ 101 and @lukebakken is a core team engineer. You can be sure such basics were accounted for.
That said, with https://github.com/rabbitmq/rabbitmq-server/pull/8411 this can probably be closed. If we get more details/observe more specific failure scenarios that are specific to the code and not the setup, we can always file a new issue.
Setting the milestone to `3.13.7` because that's the most recent `3.13.x` release at the time of writing.
Note that the relevant PR was reverted in https://github.com/rabbitmq/rabbitmq-server/pull/11507, so I will unset the milestone to reduce confusion.
Describe the bug
Logs
```
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: on node `rabbit@rabbit2`:
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: exception error: {erpc,noconnection}
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in function erpc:call/5 (erpc.erl, line 710)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from rabbit_ff_controller:rpc_call/5 (rabbit_ff_controller.erl, line 1123)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from lists:foreach_1/2 (lists.erl, line 1442)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from rabbit_feature_flags:check_node_compatibility_v1/2 (rabbit_feature_flags.erl, line 1599)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from rabbit_mnesia:check_rabbit_consistency/2 (rabbit_mnesia.erl, line 1017)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from rabbit_mnesia:check_consistency/5 (rabbit_mnesia.erl, line 948)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from rabbit_mnesia:check_cluster_consistency/2 (rabbit_mnesia.erl, line 746)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: in call from lists:foldl/3 (lists.erl, line 1350)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0>
2023-05-24 01:39:55.243345-07:00 [error] <0.277.0> Mnesia(rabbit@rabbit3): ** ERROR ** Mnesia on rabbit@rabbit3 could not connect to node(s) [rabbit@rabbit2]
```
Reproduction steps
See above.
Expected behavior
No `erpc` error - either it is retried, or it is not tried until disterl is definitely up and running.
Additional context
Observed in the following situations: