Open tombh opened 7 years ago
The status server here provides an undocumented /healthcheck JSON endpoint that is only used by oam-browser. Health is defined by pings to the oam-catalog.
I may be wrong but I think health is also defined by New Relic, right?
Ultimately the core function of a status page is to provide independent verification of a service's status. There already exist free services that do this, I would recommend: https://uptimerobot.com/ which also provides an API so we can report statuses from other sites.
Can we still have a status page from these services that sits at a custom domain?
Oh yes, you're right, pings only provide binary up/down, whereas the newRelicGetHelath() call here does indeed offer something slightly more fine-grained. So yes, that means that an orange status can still be loaded within the oam-browser. So my argument is not so strong. Though I would argue that this repo is still very much overkill for what it achieves. It still doesn't provide hosting on alternative infrastructure nor region redundancy. And the curiosity of having a completely separate healthcheck through HTML from the JSON endpoint still needs to be addressed.
And compared to the out-of-the-box functionality of https://uptimerobot.com/ including free custom domains, it's hard to justify the technical debt here.
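For anyone following the binary versus fine-grained distinction above, here is a rough sketch of how the two signals could be merged into one status. It is not the code in this repo: `Status`, `combine_status` and the colour strings are hypothetical names, and the assumption that the New Relic call yields a colour such as "orange" comes only from the comment above.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GREEN = "green"
    ORANGE = "orange"
    RED = "red"


def combine_status(ping_ok: bool, app_health: Optional[str] = None) -> Status:
    """Merge a binary ping result with an optional finer-grained colour.

    app_health is a hypothetical stand-in for whatever the New Relic
    call returns; the colour strings are assumptions, not a documented API.
    """
    if not ping_ok:
        # A failed ping overrides everything else: the service is unreachable.
        return Status.RED
    try:
        # A reachable service can still be degraded ("orange") or failing ("red").
        return Status(app_health)
    except ValueError:
        # Missing or unrecognised colour: fall back to the ping result alone.
        return Status.GREEN
```

With this shape, `combine_status(True, "orange")` gives an orange status while the service still answers pings, which is the case a ping-only monitor cannot express.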
Here's a working status page using uptimerobot.com: https://stats.uptimerobot.com/WPByvFZ4Y
It checks both the website and the Catalog API every 5 minutes.
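If those monitor results need to be reported on other sites, as suggested in this thread, something like the sketch below might work. It assumes the UptimeRobot v2 `getMonitors` endpoint and its numeric status codes as I recall them from the public API docs, so please verify against the current documentation before relying on it. `summarize` and `fetch_monitors` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and status codes (UptimeRobot API v2, from memory).
API_URL = "https://api.uptimerobot.com/v2/getMonitors"
STATUS_LABELS = {
    0: "paused",
    1: "not checked yet",
    2: "up",
    8: "seems down",
    9: "down",
}


def summarize(payload: dict) -> dict:
    """Map each monitor's friendly name to a readable status label."""
    return {
        monitor["friendly_name"]: STATUS_LABELS.get(monitor["status"], "unknown")
        for monitor in payload.get("monitors", [])
    }


def fetch_monitors(api_key: str, timeout: float = 10.0) -> dict:
    """POST to the (assumed) getMonitors endpoint and return the parsed JSON."""
    body = urllib.parse.urlencode({"api_key": api_key, "format": "json"}).encode()
    request = urllib.request.Request(API_URL, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage would be along the lines of `summarize(fetch_monitors("<read-only api key>"))`, run server-side so the key never reaches the browser.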
Please correct me if I'm wrong but this repo only serves 2 conflicting purposes, both of which are problematically implemented.
1. The status server here provides an undocumented /healthcheck JSON endpoint that is only used by oam-browser. Health is defined by pings to the oam-catalog. If the Catalog API is down then the browser frontend will not work anyway!
2. The status website here directly queries, via client AJAX, the /analytics endpoint on the Catalog API. This uses the web client's internet connection, which makes the status definition largely worthless: it is far more likely that the client's connection is to blame for a 'poor' healthcheck.
Ultimately the core function of a status page is to provide independent verification of a service's status. There already exist free services that do this. I would recommend https://uptimerobot.com/ which also provides an API so we can report statuses from other sites. What's more, such services already implement multiple redundancy, running on different platforms from many regions across the world.
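To make the "independent verification" point concrete, here is a minimal sketch of a check that runs from a server rather than from the visitor's browser, so a flaky client connection cannot skew the result. It is an illustration only: `http_ok`, `check_services` and the example.org URLs are hypothetical, not this repo's code or the real OAM endpoints.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts and malformed URLs.
        return False


def check_services(
    targets: Dict[str, str],
    probe: Callable[[str], bool] = http_ok,
) -> Dict[str, bool]:
    """Probe every named target and report binary up/down per service."""
    return {name: probe(url) for name, url in targets.items()}


if __name__ == "__main__":
    # Placeholder URLs; substitute the real website and Catalog API addresses.
    print(check_services({
        "website": "https://example.org/",
        "catalog-api": "https://api.example.org/",
    }))
```

The `probe` parameter is there so the check can be swapped out or exercised without network access. Running this from a single host still lacks the multi-region redundancy mentioned above, which is the part a hosted service provides.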