rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

Metrics GC server terminating #1167

Closed adfinlay closed 7 years ago

adfinlay commented 7 years ago

We are running a single-node RabbitMQ instance on Ubuntu 16.04 with Erlang 18.3, and everything ran perfectly on v3.6.6. We upgraded to 3.6.8 today and started seeing terminations for reasons we couldn't identify. We then upgraded to 3.6.9 and saw the same issue. Downgrading to 3.6.6 made the issue go away again.

The only errors in the log file are along the following lines:

=ERROR REPORT==== 30-Mar-2017::15:24:25 ===
** Generic server rabbit_core_metrics_gc terminating
** Last message in was start_gc
** When Server state == {state,#Ref<0.0.524289.63738>,120000}
** Reason for termination ==
** {badarg,[{erlang,node,[none],[]},
            {rabbit_misc,is_process_alive,1,
                         [{file,"src/rabbit_misc.erl"},{line,872}]},
            {rabbit_core_metrics_gc,gc_process,3,
                                    [{file,"src/rabbit_core_metrics_gc.erl"},
                                     {line,105}]},
            {lists,foldl,3,[{file,"lists.erl"},{line,1262}]},
            {ets,do_foldl,4,[{file,"ets.erl"},{line,585}]},
            {ets,foldl,3,[{file,"ets.erl"},{line,574}]},
            {rabbit_core_metrics_gc,gc_connections,0,
                                    [{file,"src/rabbit_core_metrics_gc.erl"},
                                     {line,63}]},
            {rabbit_core_metrics_gc,handle_info,2,
                                    [{file,"src/rabbit_core_metrics_gc.erl"},
                                     {line,42}]}]}

=ERROR REPORT==== 30-Mar-2017::15:24:25 ===
** Generic server rabbit_mgmt_gc terminating
** Last message in was start_gc
** When Server state == {state,#Ref<0.0.524289.63741>,120000}
** Reason for termination ==
** {badarg,[{erlang,node,[none],[]},
            {rabbit_misc,is_process_alive,1,
                         [{file,"src/rabbit_misc.erl"},{line,872}]},
            {rabbit_mgmt_gc,gc_process,3,
                            [{file,"src/rabbit_mgmt_gc.erl"},{line,136}]},
            {lists,foldl,3,[{file,"lists.erl"},{line,1262}]},
            {ets,do_foldl,4,[{file,"ets.erl"},{line,585}]},
            {ets,foldl,3,[{file,"ets.erl"},{line,574}]},
            {rabbit_mgmt_gc,handle_info,2,
                            [{file,"src/rabbit_mgmt_gc.erl"},{line,44}]},
            {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,615}]}]}

michaelklishin commented 7 years ago

There is no evidence that RabbitMQ itself terminates. What terminates in these logs is a single metrics GC (removal) process. This was reported earlier in https://github.com/rabbitmq/rabbitmq-management-agent/issues/42. Please provide your config there.

michaelklishin commented 7 years ago

Starting with 3.6.7, RabbitMQ has a completely different metrics storage and management plugin.

adfinlay commented 7 years ago

The config file is pretty bare:

[
  {rabbit, [
    {vm_memory_high_watermark, 0.65}
  ]},
  {rabbitmq_web_stomp, [{port, 15674}]}
].

When these errors appear, RabbitMQ stops accepting connections and existing connections are closed. I assumed the errors were related because they are the only thing logged at the time of the issue. The management interface is also unreachable when this happens.
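
For anyone reading the traces above: both fail at the same frame, `{erlang,node,[none],[]}`, meaning `erlang:node/1` was called with the atom `none` instead of a pid. The sketch below reproduces that `badarg` in isolation and shows one way a liveness check could guard against non-pid input. The module name `node_badarg_demo` and both functions are hypothetical illustrations and are not the code in `rabbit_misc` or the actual fix.

```erlang
-module(node_badarg_demo).
-export([node_of/1, is_alive_guarded/1]).

%% erlang:node/1 accepts a pid, port, or reference. Any other term,
%% such as the atom 'none' seen in the traces, raises badarg.
node_of(Term) ->
    try erlang:node(Term) of
        Node -> {ok, Node}
    catch
        error:badarg -> {error, badarg}
    end.

%% Hypothetical guarded liveness check (an assumption, not RabbitMQ code).
%% Simplification: only local pids are checked; remote pids and
%% non-pid terms are reported as not alive instead of crashing.
is_alive_guarded(Pid) when is_pid(Pid), node(Pid) =:= node() ->
    erlang:is_process_alive(Pid);
is_alive_guarded(_Other) ->
    false.
```

In an Erlang shell, `node_badarg_demo:node_of(none)` returns `{error, badarg}`, matching the reason for termination in the logs, while `node_badarg_demo:is_alive_guarded(none)` returns `false` rather than crashing the caller.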