sensu-plugins / sensu-plugins-aws

This plugin provides native AWS instrumentation for monitoring and metrics collection, including: health and metrics for various AWS services, such as EC2, RDS, ELB, and more, as well as handlers for EC2, SES, and SNS.
http://sensu-plugins.io
MIT License
81 stars 143 forks source link

Issue with metrics in bin/check-sqs-messages.rb #380

Open yoliinyk opened 4 years ago

yoliinyk commented 4 years ago

Hello.

I'm use bin/check-sqs-messages.rb and I would like monitoring my Q with metric name: ApproximateAgeOfOldestMessage.

I run check-sqs-messages.rb with next options: /opt/sensu/embedded/bin/ruby check-sqs-messages.rb -r us-west-2 -q sqs_test_events_c4-dev_dlq -m ApproximateAgeOfOldestMessage -c 100 and I see output: SQSMsgs OK: all queue(s): ["sqs_test_events_c4-dev_dlq"] are OK , but really I have 350 000 messages in this Q metric ApproximateAgeOfOldestMessage

When I run without option -m ApproximateAgeOfOldestMessage - it is work correct:

/opt/sensu/embedded/bin/ruby check-sqs-messages.rb -r us-west-2 -q sqs_test_events_c4-dev_dlq -c 100 
SQSMsgs CRITICAL: 61223 message(s) in sqs_test_events_c4-dev_dlq

@majormoses can you please help investigate/fix this? Thank you.

yoliinyk commented 4 years ago

Any update @majormoses

yoliinyk commented 4 years ago

Any update @majormoses ?

yoliinyk commented 4 years ago

any update @majormoses ?

majormoses commented 4 years ago

@yoliinyk hey I don't work for sensu, I maintain 500+ repositories across multiple orgs on my "free time" and as such I offer no response SLO/SLA.

To help me investigate I could use a bit more information:

jnmullen commented 4 years ago

I've just been trying to get the very same thing to work and finding it didn't.

It won't ever work as the SDK doesn't return an attribute called ApproximateAgeOfOldestMessage which is what the code will be looking for the in response.

The supported attribute names from the SDK are these :

attribute_names: ["All"], # accepts All, Policy, VisibilityTimeout, MaximumMessageSize, MessageRetentionPeriod, ApproximateNumberOfMessages, ApproximateNumberOfMessagesNotVisible, CreatedTimestamp, LastModifiedTimestamp, QueueArn, ApproximateNumberOfMessagesDelayed, DelaySeconds, ReceiveMessageWaitTimeSeconds, RedrivePolicy, FifoQueue, ContentBasedDeduplication, KmsMasterKeyId, KmsDataKeyReusePeriodSeconds

Which is taken from this page : https://docs.aws.amazon.com/sdk-for-ruby/v2/api/Aws/SQS/Client.html#get_queue_attributes-instance_method

majormoses commented 4 years ago

I've just been trying to get the very same thing to work and finding it didn't.

It won't ever work as the SDK doesn't return an attribute called ApproximateAgeOfOldestMessage which is what the code will be looking for the in response.

The supported attribute names from the SDK are these :

attribute_names: ["All"], # accepts All, Policy, VisibilityTimeout, MaximumMessageSize, MessageRetentionPeriod, ApproximateNumberOfMessages, ApproximateNumberOfMessagesNotVisible, CreatedTimestamp, LastModifiedTimestamp, QueueArn, ApproximateNumberOfMessagesDelayed, DelaySeconds, ReceiveMessageWaitTimeSeconds, RedrivePolicy, FifoQueue, ContentBasedDeduplication, KmsMasterKeyId, KmsDataKeyReusePeriodSeconds

Which is taken from this page : https://docs.aws.amazon.com/sdk-for-ruby/v2/api/Aws/SQS/Client.html#get_queue_attributes-instance_method

Odd I know I wrote, tested, and consumed it; that was a number of years ago so I am trying to fill in the memory and/or knowledge gaps. I will also try to reach out to someone at my old employer and see if they have noticed the same (assuming they are still using this). To rule out other issues can you try going to 4.0.0 when the change was introduced and see the same result? https://github.com/sensu-plugins/sensu-plugins-aws/blob/master/CHANGELOG.md#400---2016-12-27

majormoses commented 4 years ago

I am able to reproduce it, I am going to put in a hotfix that will surface false positives like this. I am still not sure why the api is not returning that value though.

majormoses commented 4 years ago

I have opened #381 to solve the first problem (false positives and useful debug info) and will set us up for figuring out a proper fix for this. I suspect the reason is that the metric was exposed in cloudwatch and the console but not the particular call in question.

majormoses commented 4 years ago

I am gonna see about getting CR on that and if needed I will self merge it.

A bit of an update: I have started some discussions around this inconsistent behavior I got an initial response back from the SDK team but I am pushing back on their assessment of it. Even if we don't get an sdk change I will still push for some documentation improvements as their explanation was inconsistent with some of the already exposed metrics.

yoliinyk commented 4 years ago

Awesome, thank you.

majormoses commented 4 years ago

Sorry this fell off my radar, I am gonna try some time next week and dig through my emails and see what/if I ever got a good response back from aws. Please ping me if I don't post something by mid next week.

yoliinyk commented 4 years ago

@majormoses any update?

yoliinyk commented 4 years ago

@majormoses any update?

majormoses commented 4 years ago

Sorry I was heads down last week, I will reach out to our AWS rep and see if I can get an update on their end. I am not currently using SQS for anything at my org so the part 2 will likely need someone to jump in since there is no current clear path forward.

james-mullen-itv commented 3 years ago

So the change made in https://github.com/sensu-plugins/sensu-plugins-aws/pull/381 breaks all my checks which are checking for number of messages in an SQS DLQ.

To me it looks like this bit of code is wrong :

https://github.com/sensu-plugins/sensu-plugins-aws/blob/aca23830f0498a8f3eb973ccf781f9c0163c4f33/bin/check-sqs-messages.rb#L122

The key method expects the value to be passed in to return the key. I assume what you want to use in this situation is has_key? instead to check is the key within the attributes hash?