FlyersWeb closed this issue 4 years ago
Unfortunately, I cannot reproduce this issue, but I have a suspicion.
My guess is that, for some reason, your broker does not report some of the limit values in the node statistics.
I am attaching an experimental build. Could you please test it and confirm whether it solves your issue?
Hi, unfortunately I tried the experimental build but I still have the same issue.
FYI when I remove the node metric I get the following error :
2019-05-16 07:44:43.002 [error] <0.613.0> Supervisor {<0.613.0>,'Elixir.Supervisor.Default'} had child 'Elixir.RabbitMQ.CloudWatchExporter.Exporter' started with 'Elixir.RabbitMQ.CloudWatchExporter.Exporter':start_link([]) at <0.614.0> exit with reason #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.13.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.14.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>77d25b18-77ae-11e...">>} in 'Elixir.ExAws':'request!'/2 line 66 in context child_terminated
And it seems that when I remove the exchange metric there is no more issue.
Can you please try with the node metric and without the exchange one? I want to first make sure I have pinpointed the actual issue, then I will propagate the fix to all metrics.
It does work with the node metric and without the exchange metric.
Then my suspicions are correct. I will ensure all the dimensions are forwarded only if available.
Just as a curiosity, what version of RabbitMQ and Erlang are you using?
$ rabbitmqctl status
Status of node rabbitmq@localhost ...
[{pid,12820},
{running_applications,
[{rabbitmq_cloudwatch_exporter,"rabbitmq_cloudwatch_exporter","0.1.0"},
{hackney,"simple HTTP client","1.15.1"},
{certifi,"CA bundle adapted from Mozilla by https://certifi.io","2.5.1"},
{poison,"An incredibly fast, pure Elixir JSON library","3.1.0"},
{ex_aws_cloudwatch,
"Cloudwatch module for https://github.com/ex-aws/ex_aws","2.0.4"},
{ex_aws,"Generic AWS client","2.1.0"},
{logger,"logger","1.8.1"},
{ssl_verify_fun,"SSL verification functions for Erlang\n","1.1.4"},
{elixir,"elixir","1.8.1"},
{idna,"A pure Erlang IDNA implementation","6.0.0"},
{mimerl,"Library to handle mimetypes","1.2.0"},
{metrics,"A generic interface to different metrics systems in Erlang.",
"1.0.1"},
{unicode_util_compat,
"unicode_util compatibility library for Erlang < 20","0.4.1"},
{rabbitmq_management,"RabbitMQ Management Console","3.7.8"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.7.8"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.7.8"},
{rabbit,"RabbitMQ","3.7.8"},
{cowboy,"Small, fast, modern HTTP server.","2.2.2"},
{amqp_client,"RabbitMQ AMQP Client","3.7.8"},
{rabbit_common,
"Modules shared by rabbitmq-server and rabbitmq-erlang-client",
"3.7.8"},
{ranch_proxy_protocol,"Ranch Proxy Protocol Transport","1.5.0"},
{ranch,"Socket acceptor pool for TCP protocols.","1.5.0"},
{ssl,"Erlang/OTP SSL application","9.0"},
{public_key,"Public key infrastructure","1.6"},
{os_mon,"CPO CXC 138 46","2.4.5"},
{asn1,"The Erlang ASN1 compiler version 5.0.6","5.0.6"},
{inets,"INETS CXC 138 49","7.0"},
{xmerl,"XML parser","1.3.17"},
{recon,"Diagnostic tools for production use","2.3.2"},
{cowlib,"Support library for manipulating Web protocols.","2.1.0"},
{crypto,"CRYPTO","4.3"},
{jsx,"a streaming, evented json parsing toolkit","2.8.2"},
{mnesia,"MNESIA CXC 138 12","4.15.4"},
{lager,"Erlang logging framework","3.6.3"},
{goldrush,"Erlang event stream processor","0.1.9"},
{compiler,"ERTS CXC 138 10","7.2"},
{syntax_tools,"Syntax tools","2.1.5"},
{syslog,"An RFC 3164 and RFC 5424 compliant logging framework.","3.4.3"},
{sasl,"SASL CXC 138 11","3.2"},
{stdlib,"ERTS CXC 138 10","3.5"},
{kernel,"ERTS CXC 138 10","6.0"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 21 [erts-10.0] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:64] [hipe]\n"}
...
I tried your Erlang and RabbitMQ versions and yet I could not reproduce the issue.
Does the error message repeat periodically or sporadically? If your collection period is 60 seconds, does it crash every 60 seconds or only sometimes?
What is odd is that the message:
The parameter MetricData.member.13.Dimensions.member.3.Value is required.
indicates that the 13th metric entry of the exchanges list is missing the value of its third dimension, which means the plugin could not read the VHost name of the given exchange. This is quite odd!
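To make the reading of that message concrete, here is a small Python sketch that extracts the metric and dimension positions from such an error text. It is only an illustration and not part of the plugin: the function name `missing_dimension_values` is made up, and the positions are 1-based, as in the `member.N` request parameters.

```python
import re

# Matches parameter paths such as
# "MetricData.member.13.Dimensions.member.3.Value".
PARAM_PATTERN = re.compile(
    r"MetricData\.member\.(\d+)\.Dimensions\.member\.(\d+)\.Value"
)


def missing_dimension_values(message):
    """Return (metric_index, dimension_index) pairs named in the message.

    Hypothetical helper for illustration; both indices are 1-based.
    """
    return [(int(m), int(d)) for m, d in PARAM_PATTERN.findall(message)]


message = (
    "The parameter MetricData.member.13.Dimensions.member.3.Value is required.\n"
    "The parameter MetricData.member.14.Dimensions.member.3.Value is required."
)

for metric_index, dimension_index in missing_dimension_values(message):
    print(f"metric #{metric_index}: dimension #{dimension_index} has no value")
```
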
Nevertheless, I went through all the dimension getters to ensure they return undefined whenever they cannot retrieve the information. This should prevent the collector from crashing, yet I am not fully comfortable with this solution.
Could you please try this new build and tell me if it still has issues?
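The idea of forwarding a dimension only when its value is available can be sketched as follows. This is a Python illustration under assumptions, not the plugin's actual code (the plugin is written in Elixir): `None` stands in for `undefined`, and the helper name and the dimension names are hypothetical. Empty strings are dropped as well, on the assumption that CloudWatch would reject them just like absent values.

```python
def available_dimensions(candidates):
    """Keep only the (name, value) pairs whose value is actually present.

    Hypothetical helper: None plays the role of `undefined`, and empty
    strings are treated as missing too.
    """
    return [
        {"Name": name, "Value": str(value)}
        for name, value in candidates
        if value is not None and str(value) != ""
    ]


# Made-up exchange entry for which the vhost could not be read.
candidates = [
    ("Metric", "Exchange"),
    ("Exchange", "amq.direct"),
    ("VHost", None),
]

dimensions = available_dimensions(candidates)
print(dimensions)
```

With this approach the request carries two dimensions instead of three for the affected entry, so no dimension is sent without a value.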
Hi,
I just had the opportunity to test your fix, but it doesn't seem to work. When I add the exchange metric, I still have the issue:
2019-06-13 12:43:48.765 [error] <0.667.0> CRASH REPORT Process <0.667.0> with 0 neighbours crashed with reason: #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.7.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.8.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>e3e41a38-8dd8-11e9-...">>} in 'Elixir.ExAws':'request!'/2 line 66
I've fixed it by not publishing the exchange metric, as it does not provide any relevant data for me.
Could you please paste me the result of the following command?
curl -u <user>:<password> http://<host>:<port>/api/exchanges
Where <user> and <password> are your broker credentials, while <host> and <port> are the location of your broker management console (localhost and 15672 if you are running it locally).
Note that the command will show your exchanges layout, so if it's a production system make sure you are allowed to share such information.
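For anyone who prefers a script to curl, the same request and a quick check of the result could look roughly like the Python sketch below. It is only a sketch under assumptions: the function names are made up, the sample entries are invented for illustration, and the choice of `name`, `type` and `vhost` as the fields worth checking is a guess about which values might end up as dimensions.

```python
import base64
import json
import urllib.request


def fetch_exchanges(host, port, user, password):
    """GET /api/exchanges from the management API using HTTP basic auth."""
    request = urllib.request.Request(f"http://{host}:{port}/api/exchanges")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def blank_fields(exchanges, fields=("name", "type", "vhost")):
    """List (position, field) pairs where a field is missing or empty."""
    return [
        (position, field)
        for position, exchange in enumerate(exchanges, start=1)
        for field in fields
        if exchange.get(field) in (None, "")
    ]


# Invented sample data; against a real broker one would instead call
# fetch_exchanges("localhost", 15672, user, password).
sample = [
    {"name": "", "type": "direct", "vhost": "/"},
    {"name": "amq.fanout", "type": "fanout", "vhost": "/"},
]
print(blank_fields(sample))
```
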
Here is the output of the curl command (it is not a production environment):
[{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"message_stats":{"publish_in":122,"publish_in_details":{"rate":0.0},"publish_out":122,"publish_out_details":{"rate":0.0}},"name":"","type":"direct","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.direct","type":"direct","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.fanout","type":"fanout","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.headers","type":"headers","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.match","type":"headers","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":true,"name":"amq.rabbitmq.trace","type":"topic","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.topic","type":"topic","user_who_performed_action":"rmq-internal","vhost":"/"}]%
The message above belongs to issue #5 and not this one. Please move it there.
A fix is on the way and I will soon publish a new release.
Any updates? I want to try it, but I am not sure if it works.
Problem resolved in 0.3.1. See issue #7 for details about the resolution.
When I publish my metrics using the following configuration :
I get an error from ExAws :
For the record, I have only one RabbitMQ server. I removed the node, connection and channel metrics, and then it worked.