usgo / online-ratings

AGA Online Ratings protocol and implementation
MIT License
Add Monitoring #32

Closed by artasparks 8 years ago

artasparks commented 8 years ago

Before we're truly production-worthy, we need a monitoring solution. https://prometheus.io/ is one solution suggested by @amj.

artasparks commented 8 years ago

@amj / @brilee -- Were you thinking we'd have a separate monitoring server, or just a separate monitoring container running on the same machine? How many resources do we have? Monitoring servers can be resource hogs if you let them (I think Prometheus takes 3 GB of RAM by default).

amj commented 8 years ago

I figured it'd be a container on the same machine for starters, but the hope is that having a container means you don't have to worry about moving it to a new host if needed. (I don't have my heart set on Prometheus.)

brilee commented 8 years ago

I've added an email handler for 500s - it'll just email all of config.ADMINS whenever there's a 500. @vash3g also suggested an external ping-type service, which we can set up after a prod box is up.
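
For readers following along, here is a minimal sketch of what such a handler might look like, using only the standard library's `logging.handlers.SMTPHandler`. It is not the code that was actually committed: the function name `build_admin_mail_handler`, the `localhost` mail host, the sender address, and the subject line are all assumptions made for illustration.

```python
import logging
from logging.handlers import SMTPHandler


def build_admin_mail_handler(admins,
                             mailhost="localhost",
                             fromaddr="server-error@example.com"):
    """Return a handler that emails every address in `admins`.

    Hypothetical helper: the name, default mail host, sender address, and
    subject are illustrative assumptions, not taken from the repository.
    """
    handler = SMTPHandler(
        mailhost=mailhost,
        fromaddr=fromaddr,
        toaddrs=list(admins),
        subject="online-ratings: 500 Internal Server Error",
    )
    # Only ERROR and above trigger an email; lower levels are ignored.
    handler.setLevel(logging.ERROR)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s in %(module)s: %(message)s"
    ))
    return handler


# Assumed wiring for a Flask-style app whose `app.logger` is a standard
# logging.Logger and whose config exposes an ADMINS list:
#
#     app.logger.addHandler(build_admin_mail_handler(app.config["ADMINS"]))
```

Constructing the handler does not open an SMTP connection; mail is only sent when a record at `ERROR` level or above is emitted. Flask logs unhandled exceptions to `app.logger` at that level, so attaching the handler there should cover 500 responses, assuming a reachable mail server.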