Closed: lloyd closed this issue 11 years ago
proposal: let's instrument with statsd and run a little statsd daemon with the aws cloudwatch backend.
This lets us instrument the daemon using statsd and fully leverage cloudwatch graphs (or geckoboard), and it will let us set alarms.
The other benefit is that we don't have to stand up more instances per DC.
Goal: easiest possible way to get statsd metrics routed to cloudwatch without any new instances.
thoughts on this approach?
+1
It's consistent with browserid and BigTent.
Do we already have a statsd server running for browserid/BigTent? If so, and we can use those shared resources, then +1 from me too.
@mostlygeek - I'm proposing we run our own statsd server on each instance and connect it to cloudwatch so we don't have to stand up another instance per colo. geckoboard should be just fine w/ cloudwatch alerts too.
thoughts?
Like this? https://npmjs.org/package/aws-cloudwatch-statsd-backend
FYI I'm not super familiar w/ statsd but it looks dead simple.
Yes, exactly that.
I can submodule in statsd and get it all configured with the backend and tests if you're down.
Ops difference is we run two processes instead of one.
-- lloyd (thumb-typing)
No need to sub-module. I'm going to automate AMI creation anyways. Just add an extra step to run statsd + extra module, and wire all of it together.
Though, if you want to make a project that pulls in statsd, the cloudwatch backend module and gets most of the pieces in place (except AWS secrets / other config stuff) that'll go a long way too :)
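For reference, a rough sketch of what the statsd config for that setup might look like. Assumptions: the `cloudwatch` block and its key names follow my recollection of the aws-cloudwatch-statsd-backend README and should be checked against it, the flush interval is an arbitrary pick, and reading the AWS secrets from environment variables is just one way to keep them out of the repo.

```javascript
// Sketch only: in a real statsd config file the object literal stands alone;
// it is bound to a name here so the snippet is valid JavaScript by itself.
const statsdConfig = {
  port: 8125,            // UDP port statsd listens on (statsd's default)
  flushInterval: 60000,  // ms between flushes to the backends (assumed value)
  backends: ['aws-cloudwatch-statsd-backend'],
  cloudwatch: {
    // Assumed key names; secrets come from the environment, not the repo.
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    region: process.env.AWS_REGION || 'us-east-1',
  },
};
```

With something like this baked into the AMI, the only per-environment pieces left are the credentials and region.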
assigned to @mostlygeek - he's working on the statsd -> cloudwatch bridge. Initial application-level statsd monitors are implemented and documented in docs/statsd_design.md.
@mostlygeek back at you. The metric names are now appearing in cloudwatch, but the values are not.
I entered an incorrect password 10x, then looked at this graph:
the mozillaidp.ldap.auth.wrong_password metric came into existence, but it has zero values.
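For anyone debugging this, here is a minimal sketch of what a single counter increment looks like on the wire. statsd counters are plain text lines of the form `<bucket>:<delta>|c` sent over UDP. The helper names `formatCounter` and `sendCounter` are made up for illustration and are not the app's actual instrumentation code; the host and port assume a local statsd on its default port.

```javascript
const dgram = require('dgram');

// Format one statsd counter line: "<bucket>:<delta>|c".
function formatCounter(bucket, delta = 1) {
  if (!Number.isInteger(delta)) {
    throw new TypeError('delta must be an integer');
  }
  return `${bucket}:${delta}|c`;
}

// Fire-and-forget UDP send to a statsd daemon (hypothetical helper).
// Assumes statsd runs on the same instance on its default port, 8125.
function sendCounter(bucket, delta = 1, host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  const payload = Buffer.from(formatCounter(bucket, delta));
  // Close the socket once the datagram has been handed off (or has failed).
  socket.send(payload, port, host, () => socket.close());
}
```

Ten wrong passwords would be ten packets carrying `mozillaidp.ldap.auth.wrong_password:1|c`, so watching the statsd port with a packet capture is one way to tell whether the values are lost before or after statsd.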
at this point all stats are being routed into cloudwatch and look awesome. well done @mostlygeek.
Taking ownership of this bug to set up two geckoboard dashboards for visualizing stage and prod environments.
current dashboard status: https://metrics.librato.com/share/dashboards/hukavsvf
pretty good high level view... Comfortable that we can have a great realtimeish view up in time for launch with this approach...
@mostlygeek are you confident we'll get a page if a node is broken? if so, let's close.
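One possible way to get paged when a node goes quiet is a CloudWatch alarm that treats missing data as breaching. Below is a hedged sketch of the parameters for CloudWatch's PutMetricAlarm API. The namespace, the heartbeat metric name, the thresholds, and the SNS topic ARN are all hypothetical, and nothing in this thread says which alarms were actually configured.

```javascript
// Build PutMetricAlarm parameters for a "node stopped reporting" alarm.
// All concrete values passed in below are placeholders for illustration.
function buildHeartbeatAlarm({ namespace, metricName, snsTopicArn }) {
  return {
    AlarmName: `${metricName}-missing`,
    Namespace: namespace,
    MetricName: metricName,
    Statistic: 'Sum',
    Period: 60,                 // seconds per evaluation window
    EvaluationPeriods: 3,       // three bad windows in a row trip the alarm
    Threshold: 1,
    ComparisonOperator: 'LessThanThreshold',
    TreatMissingData: 'breaching', // no datapoints counts as a failure
    AlarmActions: [snsTopicArn],   // e.g. an SNS topic wired to paging
  };
}

const params = buildHeartbeatAlarm({
  namespace: 'example-namespace',          // hypothetical
  metricName: 'example.node.heartbeat',    // hypothetical
  snsTopicArn: 'arn:aws:sns:us-east-1:000000000000:example-pager', // hypothetical
});
```

The resulting object could be handed to `putMetricAlarm` in the AWS SDK for JavaScript; whether a heartbeat counter is the right signal for "node is broken" is a separate call.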
complete.
At launch we need a single dashboard that gives us an understanding of application health. This should include application-specific stats.
The tradition has been to use statsd, but as long as the goal is achieved with a minimum of new infrastructure to support, I'm open to ideas.