hotosm / oam-status

A simple status dashboard for oam-catalog
BSD 3-Clause "New" or "Revised" License

Make status dashboard independent of NewRelic #4

Open cgiovando opened 9 years ago

cgiovando commented 9 years ago

As we are migrating everything to AWS, monitoring and status dashboard should be implemented with services offered there.

Ideally the same functions should be abstracted behind an API, so that the implementation could run on any standalone instance.

jflasher commented 9 years ago

@cgiovando I think New Relic is probably still the right way to go here as it offers app performance monitoring as well as uptime monitoring. It's also already set up as a free third-party service allowing the status piece to be independent of where/how oam-catalog is hosted.

cgiovando commented 9 years ago

It doesn't look like NewRelic is free beyond 24hrs data retention, or am I looking in the wrong place?

http://newrelic.com/application-monitoring/pricing

jflasher commented 9 years ago

Yep, that's right, data retention is only 24hrs, but that doesn't affect any of the uptime monitoring or performance monitoring which is what we're utilizing for the status display.

cgiovando commented 9 years ago

OK, then I may need a bit more detail on exactly what the service does other than telling if it's up or down and listing the number of images indexed. How is it different from http://www.isitdownrightnow.com/oam-catalog.herokuapp.com.html (for uptime status)?

I'm thinking that for monitoring performance we may want to include history beyond the past 24 hours. Something like "how did OAM do during the last HOT activation". Does that make sense?

Maybe something to coordinate with @mojodna and @lossyrob and include a pulse from OAM Server as well.

jflasher commented 9 years ago

There are a couple of different pieces here. New Relic doesn't have anything to do with the listing of the images, that's purely an oam-catalog thing where it queries its own database. In addition to telling us if oam-catalog is up or down, New Relic will alert when the site goes down or performance significantly degrades. When we get to the point where we would be looking to track historical performance metrics, we could capture this the same way we're tracking the number of images in the catalog and do periodic checks with New Relic and save as snapshots, which provides something like below. In this way, we wouldn't necessarily need access to the data past the free 24hr tier.

      "application_summary": {
        "response_time": 121,
        "throughput": 3.2,
        "error_rate": 0,
        "apdex_score": 0.82,
        "instance_count": 1
      }
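
A minimal sketch of the snapshot idea described above, assuming each periodic check yields an `application_summary` dict like the one shown. The function names (`record_snapshot`, `load_snapshots`), the `captured_at` field, and the JSON Lines file are illustrative assumptions, not part of oam-status or the New Relic API; how the summary is fetched from New Relic is deliberately left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_snapshot(summary, path, now=None):
    """Append one timestamped copy of an application_summary dict to a JSON Lines file.

    `summary` is whatever the periodic check returned; fetching it is out of scope here.
    """
    now = now or datetime.now(timezone.utc)
    entry = {"captured_at": now.isoformat(), "application_summary": summary}
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def load_snapshots(path, since=None):
    """Return stored snapshots, optionally only those captured at or after `since`.

    `since` must be a timezone-aware datetime, matching the stored timestamps.
    """
    file_path = Path(path)
    if not file_path.exists():
        return []
    snapshots = []
    with file_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines in the file
            entry = json.loads(line)
            captured = datetime.fromisoformat(entry["captured_at"])
            if since is None or captured >= since:
                snapshots.append(entry)
    return snapshots
```

Because each snapshot is stored locally with its own timestamp, a question like "how did OAM do during the last HOT activation" becomes a matter of filtering by `since`, without needing New Relic to retain data beyond 24 hours.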

tombh commented 7 years ago

I guess this is an old conversation, but I just want to point out that as the code stands today, New Relic has nothing to do at all with defining the status of OAM. So I think we should close this issue unless someone thinks otherwise.

Edit: I was kind of wrong. The HTML home page reports a status derived from querying the /analytics endpoint on the Catalog API, which doesn't use New Relic. But there is a /healthcheck JSON route here on this repo's server that does query New Relic. Nevertheless, I'm arguing in #13 that this repo is redundant.