mozilla / vinz-clortho

INACTIVE - http://mzl.la/ghe-archive - BrowserID Keymaster for LDAP enabled Identity Providers

Monitoring #15

Closed by lloyd 11 years ago

lloyd commented 11 years ago

At launch we need a single dashboard that shows application health, including application-specific stats.

The tradition has been to use statsd, but as long as the goal is achieved with a minimum of new infrastructure to support, I'm open to ideas.

lloyd commented 11 years ago

proposal: let's instrument with statsd and run a little statsd daemon with the aws cloudwatch backend.

This lets us instrument the daemon using statsd, fully leverage cloudwatch graphs (or geckoboard), and set alarms.

The other benefit is that we don't have to stand up more instances per DC.

Goal: easiest possible way to get statsd metrics routed to cloudwatch without any new instances.
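For reference, a minimal sketch of what instrumenting with statsd looks like on the wire, assuming a statsd daemon listening on its default UDP port 8125 on the same instance. The metric name and the helper names `format_counter` and `send_counter` are hypothetical and only for illustration; a real service would more likely use an existing statsd client library.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the statsd line format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Fire-and-forget a counter increment over UDP to a local statsd daemon.

    host and port are assumptions: 8125 is the statsd default, and
    127.0.0.1 matches the "statsd on each instance" idea.
    """
    payload = format_counter(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical metric name, not one defined by this project.
    send_counter("example.login.failure")
```

Because the send is UDP, the application does not block or fail if the daemon is down, which keeps the instrumentation cheap.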

thoughts on this approach?

ozten commented 11 years ago

It's consistent with browserid and BigTent.

mostlygeek commented 11 years ago

Do we already have a statsd server running for browserid/BigTent? If so, and we can use those shared resources, then +1 from me too.

lloyd commented 11 years ago

@mostlygeek - I'm proposing we run our own statsd server on each instance and connect it to cloudwatch so we don't have to stand up another instance per colo. geckoboard should be just fine w/ cloudwatch alerts too.

thoughts?

mostlygeek commented 11 years ago

Like this? https://npmjs.org/package/aws-cloudwatch-statsd-backend

FYI I'm not super familiar w/ statsd but it looks dead simple.

lloyd commented 11 years ago

Yes, exactly that.

I can submodule in statsd and get it all configured with the backend and tests if you're down.

Ops difference is we run two processes instead of one.

mostlygeek commented 11 years ago

No need to sub-module. I'm going to automate AMI creation anyways. Just add an extra step to run statsd + extra module, and wire all of it together.

Though, if you want to make a project that pulls in statsd, the cloudwatch backend module and gets most of the pieces in place (except AWS secrets / other config stuff) that'll go a long way too :)

lloyd commented 11 years ago

assigned to @mostlygeek - he's working on the statsd -> cloudwatch bridge. Initial application level statsd monitors are implemented and documented in docs/statsd_design.md.

lloyd commented 11 years ago

@mostlygeek back at you. The metric names are now appearing in cloudwatch, but values are not.

I entered an incorrect password 10x, then looked at this graph:

https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#c=CloudWatch&s=Metrics&graph=!PT0!ST1!ET2!NS3!MN4!SS5!PD6!AX7!VAmozillaidp~-PT1H~-PT0H~MozIDP~mozillaidp.ldap.auth.wrong_password~Average~300~Left

the mozillaidp.ldap.auth.wrong_password metric came into existence, but it has zero values.
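One way to check whether datapoints are missing entirely or arriving as zeros is to query the metric directly and ask for Sum and SampleCount alongside Average. A rough sketch using boto3's `get_metric_statistics`; the namespace `MozIDP`, the one hour window, and the 300 second period are read off the graph URL above and are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_stats_query(namespace, metric_name, now=None, hours=1, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        # SampleCount shows whether anything arrived; Sum shows what arrived.
        "Statistics": ["Sum", "SampleCount", "Average"],
    }


def fetch_datapoints(query, region="us-east-1"):
    """Run the query. Needs boto3 and AWS credentials, so it is kept separate."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("cloudwatch", region_name=region)
    return client.get_metric_statistics(**query)["Datapoints"]


if __name__ == "__main__":
    # "MozIDP" is assumed from the console URL, not confirmed configuration.
    q = build_stats_query("MozIDP", "mozillaidp.ldap.auth.wrong_password")
    for point in fetch_datapoints(q):
        print(point["Timestamp"], point["SampleCount"], point["Sum"])
```

An empty list would mean nothing is being published; datapoints with a SampleCount but a Sum of 0 would point at the values sent by the bridge.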

lloyd commented 11 years ago

at this point all stats are being routed into cloudwatch and look awesome. well done @mostlygeek.

Taking ownership of this bug to set up two geckoboard dashboards for visualizing stage and prod environments.

lloyd commented 11 years ago

current dashboard status: https://metrics.librato.com/share/dashboards/hukavsvf

pretty good high level view... Comfortable that we can have a great realtimeish view up in time for launch with this approach...

[Screenshot of the dashboard, 2013-06-01]

lloyd commented 11 years ago

@mostlygeek are you confident we'll get a page if a node is broken? if so, let's close.
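For context, one possible shape for such a page is a cloudwatch alarm that fires when a metric stops arriving. This is only a sketch under assumptions: the heartbeat metric name and the SNS topic ARN are hypothetical placeholders, and nothing here reflects how paging was actually configured for this service.

```python
def build_heartbeat_alarm(namespace, metric_name, sns_topic_arn,
                          period=60, evaluation_periods=3):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when fewer than one sample arrives per period for
    `evaluation_periods` consecutive periods, or when data is missing.
    """
    return {
        "AlarmName": f"{metric_name}-missing",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "SampleCount",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data at all counts as broken
        "AlarmActions": [sns_topic_arn],  # e.g. a topic wired to a pager
    }


def create_alarm(alarm, region="us-east-1"):
    """Create the alarm. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**alarm)


if __name__ == "__main__":
    # All three arguments are placeholders for illustration.
    create_alarm(build_heartbeat_alarm(
        "MozIDP",
        "example.heartbeat",
        "arn:aws:sns:us-east-1:000000000000:example-pager",
    ))
```

An alarm like this only detects a single broken node if each node publishes its own metric name or dimension; with one shared metric, the remaining healthy nodes would keep it green.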

lloyd commented 11 years ago

complete.