mozilla / identity-ops

Tools and Chef cookbooks used by Mozilla Services Operations to provision and manage Persona

Establish monitoring on AWS "Events" to detect when instances are running on degraded hardware #98

Open gene1wood opened 11 years ago

gene1wood commented 11 years ago

https://console.aws.amazon.com/ec2/home?region=us-west-2#s=Events

Currently, monitoring does not alert us when these events are created. It needs to, so that we can kill the affected machine.

whd commented 11 years ago

A first pass for this check here

gene1wood commented 11 years ago

@whd cool, looks good. I think I may want to tweak what you've got slightly so that the check applies to a specific host: when an instance is on degraded hardware, the instance that alerts is the affected instance. Anyhow, it should be an easy extension. I'll use this caching decorator ( https://github.com/mozilla/identity-ops/blob/master/chef/cookbooks/persona-monitor/files/default/usr/local/nagios/libexec/check_instance_elb_membership#L15-L38 ) so that running the check on every host doesn't increase the number of AWS API calls by orders of magnitude.
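For illustration, the idea behind such a decorator can be sketched as below. This is not the code at the linked lines; it is a minimal stand-in that assumes the wrapped function takes no arguments and returns JSON-serializable data. The names `cached`, `ttl_seconds`, and `cache_dir` are hypothetical. The cache lives on disk because each Nagios check run is a separate process, so an in-memory cache would not survive between runs.

```python
import functools
import json
import os
import tempfile
import time


def cached(ttl_seconds, cache_dir=None):
    """Cache a zero-argument function's JSON-serializable result on disk.

    Hypothetical helper: a fresh cache file is reused until it is older
    than ttl_seconds, after which the wrapped function is called again.
    """
    cache_dir = cache_dir or tempfile.gettempdir()

    def decorator(func):
        path = os.path.join(cache_dir, "%s.cache.json" % func.__name__)

        @functools.wraps(func)
        def wrapper():
            try:
                # Reuse the cached result while the file is still fresh.
                if time.time() - os.path.getmtime(path) < ttl_seconds:
                    with open(path) as handle:
                        return json.load(handle)
            except (OSError, ValueError):
                pass  # missing, unreadable, or corrupt cache: recompute

            result = func()
            # Write to a temp file and rename, so a concurrent check run
            # never reads a half-written cache file.
            fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
            with os.fdopen(fd, "w") as handle:
                json.dump(result, handle)
            os.replace(tmp_path, path)
            return result

        return wrapper

    return decorator
```

With something like this wrapped around the function that queries AWS, repeated check runs within the TTL read the cached file instead of calling the API again.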

@whd would you submit a PR from your branch so I can merge it and work from that?

whd commented 11 years ago

@gene1wood to get the per-instance behavior you want, you can probably just pass a particular instance ID to the get_all_instance_status call. You'll need to compute the instance ID from whatever unique identifier you have (probably the hostname).
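A rough sketch of that per-instance variant, assuming a boto 2 EC2 connection object is passed in as `conn`. The function name `check_instance_events` and the message wording are hypothetical; `get_all_instance_status` and its `instance_ids` parameter are the boto call mentioned above. Mapping a hostname to an instance ID is left out.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def check_instance_events(conn, instance_id):
    """Return (exit_code, message) for one instance's scheduled AWS events.

    `conn` is assumed to behave like a boto 2 EC2Connection.
    """
    try:
        statuses = conn.get_all_instance_status(instance_ids=[instance_id])
    except Exception as exc:
        return UNKNOWN, "UNKNOWN: status lookup failed: %s" % exc

    if not statuses:
        # By default only running instances are reported.
        return UNKNOWN, "UNKNOWN: no status returned for %s" % instance_id

    # `events` can be None when the instance has no scheduled events.
    events = statuses[0].events or []
    # AWS prefixes the description of finished events with "[Completed]".
    pending = [
        event for event in events
        if not (event.description or "").startswith("[Completed]")
    ]
    if pending:
        details = ", ".join(
            "%s (not before %s)" % (event.code, event.not_before)
            for event in pending
        )
        return CRITICAL, "CRITICAL: %s has scheduled events: %s" % (
            instance_id, details)
    return OK, "OK: no scheduled events for %s" % instance_id
```

In a real check, `conn` could come from `boto.ec2.connect_to_region("us-west-2")`, and the script would print the message and exit with the returned code. If the check runs on the instance itself, `boto.utils.get_instance_metadata()["instance-id"]` is one way to obtain the ID without a hostname lookup.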