Open NeckBeardPrince opened 9 years ago
@NeckBeardPrince We will look into this. This is heavily dependent on your environment and machine. Could you provide some more details?
Thanks!
I have the same issue. It appears to be something with the aws-sdk, but I'm not sure.
@NeckBeardPrince @luisdalves This has been confirmed by others as well. We are open to suggestions, but the aws-sdk seems like a logical starting place, @luisdalves.
Sorry, I just saw this. What kind of details would you like?
BUMP
I'm looking into this now for my own environment. It's a two part problem:
1) Profile the plugin runs to see where the bottleneck is. More than likely this is an issue with the aws-sdk, which we can't do much about.
2) Make sure each plugin can poll multiple resources in the same run, so you don't need multiple plugin runs to perform your checks or metrics polling. I've opened a ticket for this.
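For the profiling step, a rough first cut could time the phases of a plugin run separately, to see whether the time goes into loading the SDK or into the API calls themselves. This is only a sketch: `time_phases` is a hypothetical helper, and the two lambdas are stand-ins so that it runs without the aws-sdk gem installed.

```ruby
require 'benchmark'

# Time each named phase once and return { phase_name => seconds }.
# `phases` is a Hash of name => callable (hypothetical helper, not part of any plugin).
def time_phases(phases)
  phases.each_with_object({}) do |(name, work), timings|
    timings[name] = Benchmark.realtime { work.call }
  end
end

# Stand-in phases so the sketch runs anywhere. In a real run, replace them
# with the plugin's own `require` of the SDK and the API call it makes.
phases = {
  load_sdk: -> { require 'json' }, # stand-in for loading the SDK
  api_call: -> { sleep 0.01 }      # stand-in for the network round trip
}

timings = time_phases(phases)
timings.each { |name, secs| puts format('%-10s %.3fs', name, secs) }
```

If the load phase dominates, polling several resources per run (point 2) should help more than tuning individual API calls.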
The duration for rds and elb checks is more than 10 seconds on average in our setup. We have a check for each instance, which means our process list is full of elb and rds checks. I suppose omitting the instance parameter (-r or -n) would speed up the process. But in that case we would not have a history for each instance and could not assign them to customers/projects.
Maybe we should create a check like check-rds-multi which would check all rds events and send the results to the local socket interface. This would allow us to see individual check results and reduce the amount of queries to aws apis.
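A minimal sketch of what such a check-rds-multi could look like: one run builds a separate result per instance and writes each to the client socket. Everything here is an assumption for illustration: the instance names and statuses are made up, `build_results` and `send_results` are hypothetical helpers, and the socket is assumed to be the Sensu client's default on 127.0.0.1:3030. A real version would fill `statuses` from a single batched AWS query.

```ruby
require 'json'
require 'socket'

# Build one check result per instance so each keeps its own history.
# `statuses` maps an instance identifier to [exit_status, message].
def build_results(statuses, prefix: 'check-rds')
  statuses.map do |instance, (status, message)|
    { name: "#{prefix}-#{instance}", status: status, output: message }
  end
end

# Write each result as JSON to the local client socket.
# Assumption: a Sensu client is listening on host:port.
def send_results(results, host: '127.0.0.1', port: 3030)
  results.each do |result|
    TCPSocket.open(host, port) { |sock| sock.write(result.to_json) }
  end
end

# Hypothetical data standing in for the outcome of one batched AWS query.
statuses = {
  'db-customer-a' => [0, 'OK: available'],
  'db-customer-b' => [2, 'CRITICAL: storage-full']
}

results = build_results(statuses)
puts results.map(&:to_json)
# send_results(results) # uncomment on a host running a Sensu client
```

Each instance then shows up as its own check (`check-rds-db-customer-a`, and so on), so results can still be assigned to customers/projects while only one process talks to AWS.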
Unless someone is going to profile each plugin and the aws-sdk and find where the slowness comes from, I suggest that this be the focus: https://github.com/sensu-plugins/sensu-plugins-aws/issues/27
I can confirm high CPU load with the AWS plugins. I now have about 35 checks against AWS, mostly the cloudwatch plugin, but also several others such as ec2-health and elbv2 monitoring. All checks are run by one agent on a c4.large instance, which constantly runs at ~80% CPU load. That's «a little» annoying, because I need to add about 50 more checks and I don't feel good about that. Here is my CPU graph: https://yadi.sk/i/-xx1omga3LwbPh UPD: Most checks have a 60-second period. The last spike is from adding 19 checks with a 300-second period for a certain cloudwatch metric. It looks like sensu keeps the plugin in a dummy cycle and takes data only at the desired time period, or I don't know what.
Every check on its own will eat up 91%+ of CPU. Even when you pass nothing to it and just run "check-rds.rb", it spikes.