rabbitmq / rabbitmq-autocluster

RabbitMQ peer discovery and cluster formation plugin, supports RabbitMQ 3.6.x
BSD 3-Clause "New" or "Revised" License
242 stars 54 forks source link

autocluster: Step maybe_cluster failed with failure: inconsistent_cluster #62

Closed rfancn closed 6 years ago

rfancn commented 6 years ago

while I doing load testing against 3 node rabbitmq cluster in kubernetes env, one rabbitmq pod(10.244.1.40) got restarted, and failed to rejoin the cluster.

Below is the logs it reported, which complained that "Node 'rabbit@10.244.2.37' thinks it's clustered with node 'rabbit@10.244.1.40', but 'rabbit@10.244.1.40' disagrees"

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Running step find_best_node_to_join
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: GET https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/wxgigo/endpoints/rabbitmq
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
                             [{"date","Wed, 15 Nov 2017 15:16:55 GMT"},
                              {"content-length","1024"},
                              {"content-type","application/json"}],
                             "{\"kind\":\"Endpoints\",\"apiVersion\":\"v1\",\"metadata\":{\"name\":\"rabbitmq\",\"namespace\":\"wxgigo\",\"selfLink\":\"/api/v1/namespaces/wxgigo/endpoints/rabbitmq\",\"uid\":\"211b434a-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"196162\",\"creationTimestamp\":\"2017-11-15T14:49:06Z\",\"labels\":{\"app\":\"rabbitmq\"}},\"subsets\":[{\"addresses\":[{\"ip\":\"10.244.2.37\",\"hostname\":\"rabbitmq-0\",\"nodeName\":\"kube-node1\",\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"wxgigo\",\"name\":\"rabbitmq-0\",\"uid\":\"21508476-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"193957\"}},{\"ip\":\"10.244.2.38\",\"hostname\":\"rabbitmq-2\",\"nodeName\":\"kube-node1\",\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"wxgigo\",\"name\":\"rabbitmq-2\",\"uid\":\"3a7c7d39-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"194047\"}}],\"notReadyAddresses\":[{\"ip\":\"10.244.1.40\",\"hostname\":\"rabbitmq-1\",\"nodeName\":\"kube-node2\",\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"wxgigo\",\"name\":\"rabbitmq-1\",\"uid\":\"2dd9f567-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"196160\"}}],\"ports\":[{\"name\":\"amqp\",\"port\":5672,\"protocol\":\"TCP\"}]}]}\n"}}]
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: k8s endpoint listing returned nodes not yet ready: 10.244.1.40
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: List of registered nodes retrieved from the backend: ['rabbit@10.244.2.38',
                                                                   'rabbit@10.244.2.37']

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Fetching node details. Unreachable nodes (or nodes that responded with an error): []
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Fetching node details. Responses: [{candidate_seed_node,
                                                 'rabbit@10.244.2.38',1620526,
                                                 true,
                                                 ['rabbit@10.244.2.38',
                                                  'rabbit@10.244.2.37',
                                                  'rabbit@10.244.1.40'],
                                                 ['rabbit@10.244.2.37',
                                                  'rabbit@10.244.2.38'],
                                                 [],[]},
                                                {candidate_seed_node,
                                                 'rabbit@10.244.2.37',1663139,
                                                 true,
                                                 ['rabbit@10.244.2.38',
                                                  'rabbit@10.244.2.37',
                                                  'rabbit@10.244.1.40'],
                                                 ['rabbit@10.244.2.38',
                                                  'rabbit@10.244.2.37'],
                                                 [],[]}]

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Asked to choose preferred node from the list of: [{candidate_seed_node,
                                                                'rabbit@10.244.2.38',
                                                                1620526,true,
                                                                ['rabbit@10.244.2.38',
                                                                 'rabbit@10.244.2.37',
                                                                 'rabbit@10.244.1.40'],
                                                                ['rabbit@10.244.2.37',
                                                                 'rabbit@10.244.2.38'],
                                                                [],[]},
                                                               {candidate_seed_node,
                                                                'rabbit@10.244.2.37',
                                                                1663139,true,
                                                                ['rabbit@10.244.2.38',
                                                                 'rabbit@10.244.2.37',
                                                                 'rabbit@10.244.1.40'],
                                                                ['rabbit@10.244.2.38',
                                                                 'rabbit@10.244.2.37'],
                                                                [],[]}]
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Filtered node list (does not include us and non-running/reachable nodes): [{candidate_seed_node,
                                                                                         'rabbit@10.244.2.37',
                                                                                         1663139,
                                                                                         true,
                                                                                         ['rabbit@10.244.2.38',
                                                                                          'rabbit@10.244.2.37',
                                                                                          'rabbit@10.244.1.40'],
                                                                                         ['rabbit@10.244.2.38',
                                                                                          'rabbit@10.244.2.37'],
                                                                                         [],
                                                                                         []},
                                                                                        {candidate_seed_node,
                                                                                         'rabbit@10.244.2.38',
                                                                                         1620526,
                                                                                         true,
                                                                                         ['rabbit@10.244.2.38',
                                                                                          'rabbit@10.244.2.37',
                                                                                          'rabbit@10.244.1.40'],
                                                                                         ['rabbit@10.244.2.37',
                                                                                          'rabbit@10.244.2.38'],
                                                                                         [],
                                                                                         []}]

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Picked node as the preferred choice for joining: 'rabbit@10.244.2.37'
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Running step maybe_cluster
=ERROR REPORT==== 15-Nov-2017::15:16:55 ===
Node 'rabbit@10.244.2.37' thinks it's clustered with node 'rabbit@10.244.1.40', but 'rabbit@10.244.1.40' disagrees
=ERROR REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Step maybe_cluster failed, will conitnue nevertheless. Failure reason: Failed to cluster with rabbit@10.244.2.37: {inconsistent_cluster,[78,111,100,101,32,39,114,97,98,98,105,116,64,49,48,46,50,52,52,46,50,46,51,55,39,32,116,104,105,110,107,115,32,105,116,39,115,32,99,108,117,115,116,101,114,101,100,32,119,105,116,104,32,110,111,100,101,32,39,114,97,98,98,105,116,64,49,48,46,50,52,52,46,49,46,52,48,39,44,32,98,117,116,32,39,114,97,98,98,105,116,64,49,48,46,50,52,52,46,49,46,52,48,39,32,100,105,115,97,103,114,101,101,115]}.

Later, I cluster 10.244.1.40 with 10.244.2.37 successfully. So what's could be the possible reason? Is it possible rabbitmq cluster still not kick off the bad node(CLEANUP) from it's cluster info when the node try to rejoin again? Is it reduce the value of CLEANUP_INTERVAL helpful?

michaelklishin commented 6 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. This assumes two things:

  1. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team)
  2. We have a certain amount of information to work with

We get at least a dozen of questions through various venues every single day, often quite light on details. At that rate GitHub issues can very quickly turn into a something impossible to navigate and make sense of even for our team. Because of that questions, investigations, root cause analysis, discussions of potential features are all considered to be mailing list material by our team. Please post this to rabbitmq-users.

Getting all the details necessary to reproduce an issue, make a conclusion or even form a hypothesis about what's happening can take a fair amount of time. Our team is multiple orders of magnitude smaller than the RabbitMQ community. Please help others help you by providing a way to reproduce the behavior you're observing, or at least sharing as much relevant information as possible on the list:

Feel free to edit out hostnames and other potentially sensitive information.

When/if we have enough details and evidence we'd be happy to file a new issue.

Thank you.

michaelklishin commented 6 years ago

This generally means that node A tried to cluster with B and that failed. Two specific cases which return this error is:

michaelklishin commented 6 years ago

The error is formatted as a list of ASCII numbers but it says

Node 'rabbit@10.244.2.37' thinks it's clustered with node 'rabbit@10.244.1.40', but 'rabbit@10.244.1.40' disagrees

This can happen when you cluster A and B, then stop them, reset B, start both them back up (or similar).

rfancn commented 6 years ago

It uses the latest 3.6.14 version, 3 pods are the replicated one, so there should not be caused by mismatched version issue. I think the problem comes from cleanup logic. When bad node exhaust system resources during high load, it (10.244.1.40) failed or got slow communication with other nodes, somehow the cleanup logic triggered on this bad node, which may execute forget peer node action on bad node before it restart. And at this time, other nodes still think bad node in cluster disc。This looks like the inconsistent cluster example in rabbitmq official cluster document. I will disable AUTOCLUSTER_CLEANUP feature and test again to verify that tomorrow. Anyway, thanks for your comments.

Gsantomaggio commented 6 years ago

@rfancn Can I ask if you are using one of the examples or another script?

rfancn commented 6 years ago

@Gsantomaggio I use the statefulset example

rfancn commented 6 years ago

I did test again, and found following logs which may verify my conclusion.

=ERROR REPORT==== 16-Nov-2017::05:32:24 ===
** Node 'rabbit@10.244.1.48' not responding **
** Removing (timedout) connection **
=INFO REPORT==== 16-Nov-2017::05:32:24 ===
rabbit on node 'rabbit@10.244.1.48' down
=INFO REPORT==== 16-Nov-2017::05:32:24 ===
node 'rabbit@10.244.1.48' down: net_tick_timeout
=INFO REPORT==== 16-Nov-2017::05:32:36 ===
autocluster: (cleanup) checking cluster
=INFO REPORT==== 16-Nov-2017::05:32:36 ===
autocluster: (cleanup) Checking for partitioned nodes.
=INFO REPORT==== 16-Nov-2017::05:32:36 ===
autocluster: (cleanup) Unreachable RabbitMQ nodes ['rabbit@10.244.1.48']
=ERROR REPORT==== 16-Nov-2017::05:32:39 ===
autocluster: (cleanup) removing 'rabbit@10.244.1.48' from cluster
=INFO REPORT==== 16-Nov-2017::05:32:36 ===
autocluster: GET https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/wxgigo/endpoint
...

EVENT SEQUENCE

WORKAROUND

set AUTOCLUSTER_FAILURE to be "stop" when autocluster failed to join the cluster, which will cause the k8s liveness probing failure, and in turn restart the pod, hopefully, it can join the cluster now because cleanup logic with 60s interval already removed the crashed node from cluster info in other nodes, if this is not the case, the node will continue restart again and again, util it can join the cluster.

NEW ISSUE

In the load testing, I simulate sending tens of thousands of messages to rabbitmq server, and triggered the crash as above. But when the rabbitmq pod restarted, it will cause the hang of the node it running on, and it will report errors as below:

=INFO REPORT==== 16-Nov-2017::05:32:36 ===
autocluster: GET https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/wxgigo/endpoint                                                                                                                                          s/rabbitmq
=INFO REPORT==== 16-Nov-2017::05:32:39 ===
autocluster: Response: [{error,
                            {failed_connect,
                                [{to_address,
                                     {"kubernetes.default.svc.cluster.local",
                                      443}},
                                 {inet,[inet],econnrefused}]}}]
=INFO REPORT==== 16-Nov-2017::05:32:39 ===
autocluster: HTTP Error {failed_connect,
                            [{to_address,
                                 {"kubernetes.default.svc.cluster.local",443}},
                             {inet,[inet],econnrefused}]}
Error in log handler
====================
Event: {info_msg,<0.136.0>,
           {<0.139.0>,"autocluster: Failed to get nodes from k8s - ~s~n",
            [{failed_connect,
                 [{to_address,{"kubernetes.default.svc.cluster.local",443}},
                  {inet,[inet],econnrefused}]}]}}
Error: badarg
Stack trace: [{io_lib,format,
                  ["autocluster: Failed to get nodes from k8s - ~s~n",
                   [{failed_connect,
                        [{to_address,
                             {"kubernetes.default.svc.cluster.local",443}},
                         {inet,[inet],econnrefused}]}]],
                  [{file,"io_lib.erl"},{line,168}]},
              {rabbit_error_logger,publish1,4,
                  [{file,"src/rabbit_error_logger.erl"},{line,108}]},
              {rabbit_error_logger,handle_event0,2,
                  [{file,"src/rabbit_error_logger.erl"},{line,80}]},
              {rabbit_error_logger_file_h,safe_handle_event,3,
                  [{file,"src/rabbit_error_logger_file_h.erl"},{line,121}]},
              {gen_event,server_update,4,[{file,"gen_event.erl"},{line,533}]},
              {gen_event,server_notify,4,[{file,"gen_event.erl"},{line,515}]},
              {gen_event,handle_msg,5,[{file,"gen_event.erl"},{line,256}]},
              {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]
=INFO REPORT==== 16-Nov-2017::05:32:39 ===
ERROR: "autocluster: Failed to get nodes from k8s - ~s~n" - [{failed_connect,
                                                              [{to_address,
                                                                {"kubernetes.default.svc.cluster.lo                                                                                                                                          cal",
                                                                 443}},
                                                               {inet,
                                                                [inet],
                                                                econnrefused}]}]
=INFO REPORT==== 16-Nov-2017::05:32:39 ===
autocluster: (cleanup) autocluster_k8s returned error {failed_connect,
                                                       [{to_address,
                                                         {"kubernetes.default.svc.cluster.local",
                                                          443}},
                                                        {inet,
                                                         [inet],
                                                         econnrefused}]}
=INFO REPORT==== 16-Nov-2017::06:09:36 ===
ERROR: "autocluster: Failed to get nodes from k8s - ~s~n" - [{failed_connect,
                                                             [{to_address,
                                                               {"kubernetes.default.svc.cluster.local",
                                                                443}},
                                                              {inet,
                                                               [inet],
                                                               nxdomain}]}]
=ERROR REPORT==== 16-Nov-2017::06:09:36 ===
autocluster: Step find_best_node_to_join failed, halting startup. Failure reason: Failed to fetch list of nodes from the backend: {failed_connect,
                                                [{to_address,
                                                  {"kubernetes.default.svc.cluster.local",
                                                   443}},
                                                 {inet,[inet],nxdomain}]}.
=CRASH REPORT==== 16-Nov-2017::06:09:36 ===
 crasher:
   initial call: application_master:init/4
   pid: <0.4429.0>
   registered_name: []
   exception exit: {bad_return,
                       {{rabbit,start,[normal,[]]},
                        {'EXIT',
                            {error,
                                "Failed to fetch list of nodes from the backend: {failed_connect,\n                                                 [{to_address,\n                                                   {\"kubernetes.default.svc.cluster.local\",\n                                                    443}},\n                                                  {inet,[inet],nxdomain}]}"}}}}
     in function  application_master:init/4 (application_master.erl, line 134)
   ancestors: [<0.4428.0>]
   messages: [{'EXIT',<0.4430.0>,normal}]
   links: [<0.4428.0>,<0.31.0>]
   dictionary: []
   trap_exit: true
   status: running
   heap_size: 2586
   stack_size: 27
   reductions: 98
 neighbours:
=INFO REPORT==== 16-Nov-2017::06:09:36 ===
   application: rabbit
   exited: {bad_return,
               {{rabbit,start,[normal,[]]},
                {'EXIT',
                    {error,
                        "Failed to fetch list of nodes from the backend: {failed_connect,\n                                                 [{to_address,\n                                                   {\"kubernetes.default.svc.cluster.local\",\n                                                    443}},\n                                                  {inet,[inet],nxdomain}]}"}}}}
   type: transient
<0.4314.0>#015
{badarg,[{ets,lookup,[,{application_controller,get_env,2,[{file,"application_controller.erl"},{line,332}]},{rabbit_log,catlevel,1,[{file,"src/rabbit_log.erl"},{line,68}]},{rabbit_log,log,4,[{file,"src/rabbit_log.erl"},{line,37}]}]}#015
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{error,\"Failed to fetch list of nodes from the backend: {failed_connect,\n                                                 [{to_address,\n                                                   {\\"kubernetes.default.svc.cluster.local\\",\n                                                    443}},\n                                                  {inet,[inet],nxdomain}]}\"}}}}}"}#015
Crash dump is being written to: erl_crash.dump...
Crash dump is being written to: erl_crash.dump...

From above logs, when autocluster failed to get k8s rabbitmq endpoints for some reason, which in fact failed to fetch list of nodes from the backend, it will crash the rabbit application. I assume this is the expected behavior when set AUTOCLUSTER_FAILURE to be "stop", but crash dump will cause long time to be finished also seems consume much of system resource while dumping, so is there any consideration to add switch to control generating dump or not, because previous debug message is enough to identify the issue, no dump file needed in this case.