noxdafox / rabbitmq-cloudwatch-exporter

RabbitMQ Plugin for publishing cluster metrics to AWS CloudWatch
Mozilla Public License 2.0

Issue when publishing node metrics #2

Closed: FlyersWeb closed this issue 4 years ago

FlyersWeb commented 5 years ago

When I publish my metrics using the following configuration:

[{rabbitmq_cloudwatch_exporter, 
  [{aws, [{access_key_id, "***"},
          {secret_access_key, "***"},
          {region, "eu-west-3"}]},
   {metrics, [overview, vhost, node, exchange, queue, connection, channel]}]}].

I get an error from ExAws:

** Started from <0.613.0>
** When function  == fun Elixir.RabbitMQ.CloudWatchExporter.Exporter:run/2
**      arguments == [[{period,60},{collectors,[overview,vhost,node,exchange,queue,connection,channel]},{namespace,<<"RabbitMQ">>}],[{access_key_id,<<"***">>},{secret_access_key,<<"***">>},{region,<<"eu-west-3">>}]]
** Reason for termination ==
** {#{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>MissingParameter</Code>\\n    <Message>The parameter MetricData.member.7.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.8.Dimensions.member.3.Value is required.</Message>\\n  </Error>\\n  <RequestId>c50b609e-76eb-11e9-a3e2-d57f24e0c0f8</RequestId>\\n</ErrorResponse>\\n\", headers: [{\"x-amzn-RequestId\", \"c50b609e-76eb-11e9-a3e2-d57f24e0c0f8\"}, {\"Content-Type\", \"text/xml\"}, {\"Content-Length\", \"399\"}, {\"Date\", \"Wed, 15 May 2019 08:31:00 GMT\"}, {\"Connection\", \"close\"}], status_code: 400}}}\n">>},[{'Elixir.ExAws','request!',2,[{file,"lib/ex_aws.ex"},{line,66}]},{'Elixir.Enum','-map/2-lists^map/1-0-',2,[{file,"lib/enum.ex"},{line,1327}]},{'Elixir.Enum','-map/2-lists^map/1-0-',2,[{file,"lib/enum.ex"},{line,1327}]},{'Elixir.RabbitMQ.CloudWatchExporter.Exporter',run,2,[{file,"lib/exporter.ex"},{line,60}]},{'Elixir.Task.Supervised',invoke_mfa,2,[{file,"lib/task/supervised.ex"},{line,90}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
2019-05-15 08:31:00.691 [error] <0.676.0> CRASH REPORT Process <0.676.0> with 0 neighbours crashed with reason: #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>MissingParameter</Code>\\n    <Message>The parameter MetricData.member.7.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.8.Dimensions.member.3.Value is required.</Message>\\n  </Error>\\n  <RequestId>c50b609e-76eb-11e9-...">>} in 'Elixir.ExAws':'request!'/2 line 66
2019-05-15 08:31:00.692 [error] <0.613.0> Supervisor {<0.613.0>,'Elixir.Supervisor.Default'} had child 'Elixir.RabbitMQ.CloudWatchExporter.Exporter' started with 'Elixir.RabbitMQ.CloudWatchExporter.Exporter':start_link([]) at <0.676.0> exit with reason #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>MissingParameter</Code>\\n    <Message>The parameter MetricData.member.7.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.8.Dimensions.member.3.Value is required.</Message>\\n  </Error>\\n  <RequestId>c50b609e-76eb-11e9-...">>} in 'Elixir.ExAws':'request!'/2 line 66 in context child_terminated
2019-05-15 08:32:00.814 [error] <0.679.0> ** Task <0.679.0> terminating

For the record, I have only one RabbitMQ server. After I removed the node, connection, and channel metrics, it worked.

noxdafox commented 5 years ago

Unfortunately, I cannot reproduce this issue, but I have a suspicion.

My guess is that, for some reason, your broker does not report a few of the limit values in the node statistics.

I am attaching an experimental build. Could you please test it and confirm whether it solves your issue?

rabbitmq_cloudwatch_exporter-0.1.0.zip

FlyersWeb commented 5 years ago

Hi, unfortunately I tried the experimental build but I still have the same issue.

FYI, when I remove the node metric I get the following error:

2019-05-16 07:44:43.002 [error] <0.613.0> Supervisor {<0.613.0>,'Elixir.Supervisor.Default'} had child 'Elixir.RabbitMQ.CloudWatchExporter.Exporter' started with 'Elixir.RabbitMQ.CloudWatchExporter.Exporter':start_link([]) at <0.614.0> exit with reason #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>MissingParameter</Code>\\n    <Message>The parameter MetricData.member.13.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.14.Dimensions.member.3.Value is required.</Message>\\n  </Error>\\n  <RequestId>77d25b18-77ae-11e...">>} in 'Elixir.ExAws':'request!'/2 line 66 in context child_terminated

And it seems that when I remove the exchange metric, the issue goes away.

noxdafox commented 5 years ago

Can you please try with the node metric and without the exchange one? I want to first make sure I pinpointed the actual issue, then I will propagate the fix to all metrics.

FlyersWeb commented 5 years ago

It does work with the node metric and without the exchange metric.

noxdafox commented 5 years ago

Then my suspicions are correct. I will ensure all the dimensions are forwarded only if available.

Just out of curiosity, which versions of RabbitMQ and Erlang are you using?

FlyersWeb commented 5 years ago

$ rabbitmqctl status
Status of node rabbitmq@localhost ...
[{pid,12820},
 {running_applications,
     [{rabbitmq_cloudwatch_exporter,"rabbitmq_cloudwatch_exporter","0.1.0"},
      {hackney,"simple HTTP client","1.15.1"},
      {certifi,"CA bundle adapted from Mozilla by https://certifi.io","2.5.1"},
      {poison,"An incredibly fast, pure Elixir JSON library","3.1.0"},
      {ex_aws_cloudwatch,
          "Cloudwatch module for https://github.com/ex-aws/ex_aws","2.0.4"},
      {ex_aws,"Generic AWS client","2.1.0"},
      {logger,"logger","1.8.1"},
      {ssl_verify_fun,"SSL verification functions for Erlang\n","1.1.4"},
      {elixir,"elixir","1.8.1"},
      {idna,"A pure Erlang IDNA implementation","6.0.0"},
      {mimerl,"Library to handle mimetypes","1.2.0"},
      {metrics,"A generic interface to different metrics systems in Erlang.",
          "1.0.1"},
      {unicode_util_compat,
          "unicode_util compatibility library for Erlang < 20","0.4.1"},
      {rabbitmq_management,"RabbitMQ Management Console","3.7.8"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.7.8"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.7.8"},
      {rabbit,"RabbitMQ","3.7.8"},
      {cowboy,"Small, fast, modern HTTP server.","2.2.2"},
      {amqp_client,"RabbitMQ AMQP Client","3.7.8"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.7.8"},
      {ranch_proxy_protocol,"Ranch Proxy Protocol Transport","1.5.0"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.5.0"},
      {ssl,"Erlang/OTP SSL application","9.0"},
      {public_key,"Public key infrastructure","1.6"},
      {os_mon,"CPO  CXC 138 46","2.4.5"},
      {asn1,"The Erlang ASN1 compiler version 5.0.6","5.0.6"},
      {inets,"INETS  CXC 138 49","7.0"},
      {xmerl,"XML parser","1.3.17"},
      {recon,"Diagnostic tools for production use","2.3.2"},
      {cowlib,"Support library for manipulating Web protocols.","2.1.0"},
      {crypto,"CRYPTO","4.3"},
      {jsx,"a streaming, evented json parsing toolkit","2.8.2"},
      {mnesia,"MNESIA  CXC 138 12","4.15.4"},
      {lager,"Erlang logging framework","3.6.3"},
      {goldrush,"Erlang event stream processor","0.1.9"},
      {compiler,"ERTS  CXC 138 10","7.2"},
      {syntax_tools,"Syntax tools","2.1.5"},
      {syslog,"An RFC 3164 and RFC 5424 compliant logging framework.","3.4.3"},
      {sasl,"SASL  CXC 138 11","3.2"},
      {stdlib,"ERTS  CXC 138 10","3.5"},
      {kernel,"ERTS  CXC 138 10","6.0"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 21 [erts-10.0] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:64] [hipe]\n"}
...

noxdafox commented 5 years ago

I tried your Erlang and RabbitMQ versions, yet I still could not reproduce the issue.

Does the error message repeat periodically or sporadically? If your collection period is 60 seconds, does it crash every 60 seconds or only sometimes?

What is odd is that the message:

The parameter MetricData.member.13.Dimensions.member.3.Value is required.

indicates that the 13th metric entry of the exchanges list is missing the value of its third dimension, which means the plugin could not read the VHost name of the given exchange. This is quite odd!
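To make the reading of that parameter path concrete, here is a minimal Python sketch (not part of the plugin, which is written in Elixir) that extracts the metric and dimension positions from the CloudWatch error text. The function name `missing_dimension_values` is hypothetical. The indices are 1-based, as in the AWS query format, and they count positions within a single request.

```python
import re

# Parameter path that CloudWatch reports in its MissingParameter errors.
PARAM_RE = re.compile(
    r"MetricData\.member\.(\d+)\.Dimensions\.member\.(\d+)\.Value"
)


def missing_dimension_values(message):
    """Return (metric_index, dimension_index) pairs found in the message.

    Hypothetical helper for illustration only; both indices are 1-based.
    """
    return [(int(m), int(d)) for m, d in PARAM_RE.findall(message)]


# Error text as reported earlier in this thread.
message = (
    "The parameter MetricData.member.13.Dimensions.member.3.Value is required.\n"
    "The parameter MetricData.member.14.Dimensions.member.3.Value is required."
)

print(missing_dimension_values(message))  # [(13, 3), (14, 3)]
```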

Nevertheless, I went through all the dimension getters ensuring they return undefined whenever they cannot retrieve the information. This should prevent the collector from crashing, yet I am not fully comfortable with this solution.
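The idea of forwarding a dimension only when its value is available can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's actual Elixir code: `build_dimensions` and the dimension names are hypothetical, and treating an empty string the same as a missing value is an assumption, based on CloudWatch requiring dimension values to be non-empty.

```python
def build_dimensions(candidates):
    """Keep only the (name, value) pairs whose value is usable.

    Hypothetical helper: `None` stands in for Erlang's `undefined`.
    Empty strings are skipped as well (an assumption, see above).
    """
    dimensions = []
    for name, value in candidates:
        if value is None or value == "":
            continue  # skip instead of sending a dimension without a value
        dimensions.append({"Name": name, "Value": str(value)})
    return dimensions


# Hypothetical dimension names, for illustration only.
candidates = [
    ("Metric", "Exchange"),
    ("Type", "direct"),
    ("VHost", None),  # value could not be retrieved
]

print(build_dimensions(candidates))
# [{'Name': 'Metric', 'Value': 'Exchange'}, {'Name': 'Type', 'Value': 'direct'}]
```

Dropping a dimension changes which CloudWatch metric the datapoint is recorded under, which may be one reason to be cautious about this approach.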

Could you please try this new build and tell me if it still has issues?

rabbitmq_cloudwatch_exporter-0.1.0.zip

FlyersWeb commented 5 years ago

Hi,

I just had the opportunity to test your fix, but it doesn't seem to work. When I add the exchange metric, I still have the issue:

2019-06-13 12:43:48.765 [error] <0.667.0> CRASH REPORT Process <0.667.0> with 0 neighbours crashed with reason: #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>MissingParameter</Code>\\n    <Message>The parameter MetricData.member.7.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.8.Dimensions.member.3.Value is required.</Message>\\n  </Error>\\n  <RequestId>e3e41a38-8dd8-11e9-...">>} in 'Elixir.ExAws':'request!'/2 line 66

I've worked around it by not publishing the exchange metric, as it does not provide any relevant data for me.

noxdafox commented 5 years ago

Could you please paste the output of the following command?

curl -u <user>:<password> http://<host>:<port>/api/exchanges

Here <user> and <password> are your broker credentials, while <host> and <port> are the location of your broker management console (localhost and 15672 if you are running it locally).

Note that the command will show your exchange layout, so if it's a production system, make sure you are allowed to share such information.
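Once the JSON output is available, one thing that may be worth checking is whether any field that could end up as a dimension value is missing or empty. The Python sketch below does this for a parsed response. The helper name, the choice of fields, and the sample data are assumptions for illustration, and whether an empty field is actually the cause of this issue is not confirmed here. Note that RabbitMQ's default exchange has an empty name.

```python
import json


def exchanges_with_blank_fields(exchanges, fields=("name", "vhost")):
    """Return (index, field) pairs for entries whose field is missing or empty.

    Hypothetical helper; `exchanges` is the parsed /api/exchanges response.
    """
    problems = []
    for index, exchange in enumerate(exchanges):
        for field in fields:
            if not exchange.get(field):
                problems.append((index, field))
    return problems


# Hypothetical sample shaped like the management API response.
sample = json.loads(
    '[{"name": "", "type": "direct", "vhost": "/"},'
    ' {"name": "amq.topic", "type": "topic", "vhost": "/"}]'
)

print(exchanges_with_blank_fields(sample))  # [(0, 'name')]
```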

FlyersWeb commented 5 years ago

Here is the output (it is not a production environment):

[{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"message_stats":{"publish_in":122,"publish_in_details":{"rate":0.0},"publish_out":122,"publish_out_details":{"rate":0.0}},"name":"","type":"direct","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.direct","type":"direct","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.fanout","type":"fanout","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.headers","type":"headers","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.match","type":"headers","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":true,"name":"amq.rabbitmq.trace","type":"topic","user_who_performed_action":"rmq-internal","vhost":"/"},{"arguments":{},"auto_delete":false,"durable":true,"internal":false,"name":"amq.topic","type":"topic","user_who_performed_action":"rmq-internal","vhost":"/"}]

noxdafox commented 5 years ago

The message above belongs to issue #5 and not this one. Please move it there.

A fix is on the way and I will soon publish a new release.

KursLabIgor commented 4 years ago

Any updates? I want to try it, but I'm not sure if it works.

noxdafox commented 4 years ago

Problem resolved in 0.3.1. See issue #7 for details about the resolution.