Open tmcgilchrist opened 1 year ago
This is a good list to trawl through: https://github.com/ivbeg/awesome-status-pages. We could host it on Mythic Beasts, separately from the Scaleway and Cambridge Computer Lab infrastructure, if not using one of the hosted options.
I'm keen on the style of something like https://status.gitlab.com, which has space for the various public-facing pieces plus the sub-systems that make everything work.
We are starting with a bottom-up approach of building monitoring pages for each of:
Then we can choose something independently hosted to feed those checks into. This is just an update to say we are working towards this, with work still to do. :-)
This all sounds good. Might you please coordinate with @mtelvers on his observer.ocamllabs.io prototype mentioned in https://github.com/ocaml/infrastructure/issues/42#issuecomment-1554623791? That looks like a good start, but I suspect its database will grow quite quickly as it's storing the results of ping rebuilds in each ocurrent node.
Also, as @hannesm mentions in #48, we need a check for the freshness of opam.ocaml.org. I suspect that would be better done as an email/Matrix message from a build failure in the deployer pipeline rather than a healthcheck though, since otherwise it'll be difficult to distinguish between "no pushes to opam-repo recently" and "not a fresh archive on opam.ocaml.org".
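To make the ambiguity concrete, here is a rough sketch of what a freshness check would have to compare. It assumes we can get two timestamps, the last push to opam-repo and the last successful deploy of the archive. The function name, the three states, and the one-hour grace period are all made up for illustration; none of this exists in the deployer today.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Freshness(Enum):
    FRESH = "fresh"      # archive includes the latest push
    PENDING = "pending"  # a push happened, deploy may still be running
    STALE = "stale"      # a push happened and the deploy is overdue


def classify_freshness(last_push, last_deploy, now, grace=timedelta(hours=1)):
    """Classify the archive given two timestamps (hypothetical inputs).

    An old archive is only a problem if there has been a push since the
    last deploy. With no newer push, an old archive is still fresh.
    """
    if last_deploy >= last_push:
        return Freshness.FRESH
    if now - last_push <= grace:
        return Freshness.PENDING
    return Freshness.STALE


if __name__ == "__main__":
    now = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
    # Quiet week: nothing pushed since the last deploy, so not stale.
    print(classify_freshness(now - timedelta(days=7), now - timedelta(days=6), now))
    # Push three hours ago, last deploy a day ago: overdue.
    print(classify_freshness(now - timedelta(hours=3), now - timedelta(days=1), now))
```

A plain healthcheck that only looks at the age of the archive cannot tell the first case from the second, which is why a notification from the pipeline itself looks simpler.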
Migrating issue from the wiki to allow discussion.
What should be on a status.ocaml.org page?
At a minimum we should have operational status of:
Deployer https://deploy.ci.ocaml.org
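As a starting point, the simplest possible check for an entry like the deployer is an HTTP probe. The sketch below is only an assumption about how such a check could look: `check_status`, the "operational"/"down" labels, and the 10 second timeout are illustrative, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def check_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return "operational" if the URL answers with a 2xx/3xx status, else "down"."""
    try:
        with opener(url, timeout=timeout) as response:
            return "operational" if 200 <= response.status < 400 else "down"
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors (4xx/5xx), DNS failures, refused connections, timeouts.
        return "down"


if __name__ == "__main__":
    print("Deployer:", check_status("https://deploy.ci.ocaml.org"))
```

This only says whether the web front end responds. It says nothing about whether the pipelines behind it are healthy, so it would complement rather than replace the per-service monitoring pages.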
What are the options for hosting that are independent of the current infrastructure?