Closed curry-lee closed 4 years ago
Hello,
a typical mistake is to forget to configure the region.
CW metrics are published within a region and visibile only in that region. If you did not configure the region, metrics will end up in the default one: us-east-1
.
If you think the region is correct, please provide me the configuration values you used to configure the plugin (remember to blank the secrets).
Hi:
Thanks for reply. We did check out the region for default value "us-east-1", and also no metric appear in console, the RabbitMQ config described as follows:
{rabbitmq_cloudwatch_exporter, [
{metrics, [overview, vhost, node, exchange, queue, connection, channel]}
], [
{aws, [
{access_key_id, "xx"},
{secret_access_key, "xx"},
{region, "ap-northeast-1"}
]}
]}
And also the plugin list as follows:
[ ] rabbitmq_amqp1_0 3.7.16
[ ] rabbitmq_auth_backend_cache 3.7.16
[ ] rabbitmq_auth_backend_http 3.7.16
[ ] rabbitmq_auth_backend_ldap 3.7.16
[ ] rabbitmq_auth_mechanism_ssl 3.7.16
[E*] rabbitmq_cloudwatch_exporter 0.2.1
[ ] rabbitmq_consistent_hash_exchange 3.7.16
[ ] rabbitmq_event_exchange 3.7.16
[ ] rabbitmq_federation 3.7.16
[ ] rabbitmq_federation_management 3.7.16
[ ] rabbitmq_jms_topic_exchange 3.7.16
[E*] rabbitmq_management 3.7.16
[e*] rabbitmq_management_agent 3.7.16
Thanks
The plugin follows RMQ standard logging. If there's any issue, you should be able to see it in the logs.
If you run RMQ under Linux with systemd, you might want to check there as well with the command:
$ sudo journalctl -u rabbitmq-server.service
Hi:
Sorry, it is my fault, I did restart the app but did not restart the rabbitmq-server
service, and it works right, thank you for your help.
However, some error message list as follows:
{#{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.1.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.2.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>e2c5137c-a50c-49f9-9318-9e2eb66b976c</RequestId>\\n</ErrorResponse>\\n\", headers: [{\"x-amzn-RequestId\", \"e2c5137c-a50c-49f9-9318-9e2eb66b976c\"}, {\"Content-Type\", \"text/xml\"}, {\"Content-Length\", \"399\"}, {\"Date\", \"Thu, 01 Aug 2019 09:55:12 GMT\"}, {\"Connection\", \"close\"}], status_code: 400}}}\n">>},[{'Elixir.ExAws','request!',2,[{file,"lib/ex_aws.ex"},{line,66}]},{'Elixir.Enum','-map/2-lists^map/1-0-',2,[{file,"lib/enum.ex"},{line,1327}]},{'Elixir.Enum','-map/2-lists^map/1-0-',2,[{file,"lib/enum.ex"},{line,1327}]},{'Elixir.RabbitMQCloudWatchExporter.Exporter',handle_info,2,[{file,"lib/rabbitmq_cloudwatch_exporter/exporter.ex"},{line,60}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
2019-08-01 09:55:12.873 [error] <0.1009.0> CRASH REPORT Process <0.1009.0> with 1 neighbours crashed with reason: #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.1.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.2.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>e2c5137c-a50c-49f9-...">>} in 'Elixir.ExAws':'request!'/2 line 66
2019-08-01 09:55:12.874 [error] <0.276.0> Supervisor 'Elixir.Singleton.Supervisor' had child 'Elixir.Singleton.Manager' started with 'Elixir.Singleton.Manager':start_link('Elixir.RabbitMQCloudWatchExporter.Exporter', [], rabbitmq_cloudwatch_exporter, 'singleton_0bpF3SXqPZTQc80tAf+lu7E5kqQ') at <0.1008.0> exit with reason #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.1.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.2.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>e2c5137c-a50c-49f9-...">>} in 'Elixir.ExAws':'request!'/2 line 66 in context child_terminated
This is odd, it complains a metric point is missing but the logic should already handle missing data points.
Could you please try this build and show me the output? It will log all metrics before publishing them helping me out understanding which metric gets corrupted and why.
You need Erlang 21 and elixir 1.8. Make sure your log level is set at least to info
. You will see the data point after the Publishing metrics to CW:
line. Please print that output so I can see what's wrong with your broker.
@noxdafox Here's some logs from it breaking. This ONLY happens when it attempts to export exchange metrics.
To reproduce:
2019-12-16 14:40:13.907 [info] <0.1501.0> Publishing metrics to CW: [[{metric_name,<<"MessagesReady">>},{unit,<<"Count">>},{value,101},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"ClusterOverview">>}]}],[{metric_name,<<"MessagesUnacknowledged">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"ClusterOverview">>}]}],[{metric_name,<<"Messages">>},{unit,<<"Count">>},{value,101},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"ClusterOverview">>}]}],[{metric_name,<<"Publish">>},{unit,<<"Count">>},{value,1},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"ClusterOverview">>}]}],[{metric_name,<<"Confirm">>},{unit,<<"Count">>},{value,1},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"ClusterOverview">>}]}],[{metric_name,<<"ReturnUnroutable">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"ClusterOverview">>}]}],[{metric_name,<<"Memory">>},{unit,<<"Bytes">>},{value,36916},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"cw-export">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Consumers">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"cw-export">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"MessagesReady">>},{unit,<<"Count">>},{value,78},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"cw-export">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"MessagesUnacknowledged">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"cw-export">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Messages">>},{unit,<<"Count">>},{value,78},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"cw-export">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Publish">>},{unit,<<"Count">>},{value,1},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"cw-export">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Memory">>},{unit,<<"Bytes">>},{value,13724},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"game_topic">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Consumers">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"game_topic">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"MessagesReady">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"game_topic">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"MessagesUnacknowledged">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"game_topic">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Messages">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"game_topic">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Memory">>},{unit,<<"Bytes">>},{value,13724},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"testqueue">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Consumers">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"testqueue">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"MessagesReady">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"testqueue">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"MessagesUnacknowledged">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit@seth-laptop">>},{<<"Metric">>,<<"Queue">>},{<<"Queue">>,<<"testqueue">>},{<<"VHost">>,<<"/">>}]}],[{metric_name,<<"Messages">>},{unit,<<"Count">>},{value,0},{dimensions,[{<<"Cluster">>,<<"rabbit...">>},...]}],...]
2019-12-16 14:40:14.366 [error] <0.1501.0> ** Generic server rabbitmq_cloudwatch_exporter terminating
** Last message in was export_metrics
** When Server state == [[{period,15},{collectors,[overview,queue,node,exchange]},{namespace,<<"RabbitMQ">>}],[{secret_access_key,<<"">>},{access_key_id,<<"">>},{region,<<"us-east-1">>}]]
** Reason for termination ==
** {#{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.12.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.13.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>65a8c6d4-a5e2-42ea-bafc-3d52d8a6aee3</RequestId>\\n</ErrorResponse>\\n\", headers: [{\"x-amzn-RequestId\", \"65a8c6d4-a5e2-42ea-bafc-3d52d8a6aee3\"}, {\"Content-Type\", \"text/xml\"}, {\"Content-Length\", \"401\"}, {\"Date\", \"Mon, 16 Dec 2019 21:40:13 GMT\"}, {\"Connection\", \"close\"}], status_code: 400}}}\n">>},[{'Elixir.ExAws','request!',2,[{file,"lib/ex_aws.ex"},{line,66}]},{'Elixir.Enum','-map/2-lists^map/1-0-',2,[{file,"lib/enum.ex"},{line,1327}]},{'Elixir.Enum','-map/2-lists^map/1-0-',2,[{file,"lib/enum.ex"},{line,1327}]},{'Elixir.RabbitMQCloudWatchExporter.Exporter',handle_info,2,[{file,"lib/rabbitmq_cloudwatch_exporter/exporter.ex"},{line,61}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
2019-12-16 14:40:14.367 [error] <0.1501.0> CRASH REPORT Process <0.1501.0> with 1 neighbours crashed with reason: #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.12.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.13.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>65a8c6d4-a5e2-42e...">>} in 'Elixir.ExAws':'request!'/2 line 66
2019-12-16 14:40:14.367 [error] <0.343.0> Supervisor 'Elixir.Singleton.Supervisor' had child 'Elixir.Singleton.Manager' started with 'Elixir.Singleton.Manager':start_link('Elixir.RabbitMQCloudWatchExporter.Exporter', [], rabbitmq_cloudwatch_exporter, 'singleton_0bpF3SXqPZTQc80tAf+lu7E5kqQ') at <0.1500.0> exit with reason #{'__exception__' => true,'__struct__' => 'Elixir.ExAws.Error',message => <<"ExAws Request Error!\n\n{:error, {:http_error, 400, %{body: \"<ErrorResponse xmlns=\\\"http://monitoring.amazonaws.com/doc/2010-08-01/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>MissingParameter</Code>\\n <Message>The parameter MetricData.member.12.Dimensions.member.3.Value is required.\\nThe parameter MetricData.member.13.Dimensions.member.3.Value is required.</Message>\\n </Error>\\n <RequestId>65a8c6d4-a5e2-42e...">>} in 'Elixir.ExAws':'request!'/2 line 66 in context child_terminated
Thanks for providing the above. Does it fail consistently on MetricData.member.12.Dimensions.member.3.Value
or is it failing on different values every time?
Looking at the metric itself it seems to miss the value for the queue cw-export
which actually is present in the data.
This kind of suggests me a problem with truncation or internal batching the API might be doing.
Glancing up through the logs I'm only seeing it fail on MetricData.member.13.Dimensions.member.3.Value
One more piece of info I just learned that might help. This only happens when I publish on the AMQP default exchange. Publishing on other exchanges works fine. Perhaps AWS doesn't like the name of that default exchange (I think it's "/") perhaps you need to switch it to %2F
before you send it over
I tested the output you provided and it worked. Issue is it's truncated and is probably missing the offending data points (no exchange related metrics where logged).
I built a new one hopefully avoiding the truncation issue. Could you please try this out?
@noxdafox Sorry for the delay. I'm still getting the error using the build you included in your last comment after publishing a message to a default exchange when I have exchange metrics enabled.
The last build is not trying to address the error but to better print the cause. The previous output you provided me is truncated so I slightly modified the routine to avoid the truncation.
Could you please paste me the new output you see on the log?
You should now see this message:
Publishing CW Metrics chunk: ...
Followed by the error traceback. Could you please provide me the last message above before seeing the error traceback?
Oh gotcha sorry for the misunderstanding. I attached the logs. I noticed there is an exchange key with no value in there. cloudwatch_exporter_logs.txt
There we go! The default exchange has an empty string as name which is invalid as CW metric dimension. This went unnoticed as RMQ reveals exchange metrics only for exchanges which are being in use.
That's the reason why I was struggling to reproduce it. I had to trigger some activity with the default exchange in order to observe the issue.
As a solution I opted for a _
as default exchange name as it seemed the less prone to name collision (default
might have been used as exchange name by somebody).
The new logic also avoids crashing but rather continues publishing the rest of the metrics and logs the problematic group of metrics.
Hi:
I had install and also enable this plugin, and setup credential access of AWS. However, CloudWatch console did not received any metrics from RabbitMQ, and also there are no info or error log output from RabbitMQ logger, do you have any idea about how to process this?
Running two nodes of cluster {rabbitmq_cloudwatch_exporter,"rabbitmq_cloudwatch_exporter","0.2.1"} {rabbit,"RabbitMQ","3.7.16"}, {erlang_version, "Erlang/OTP 21 [erts-10.3.5.1] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:64] [hipe]\n"}
Thanks