ocaml / infrastructure

Wiki to hold information about the machine resources available to OCaml.org

Add status.ocaml.org for monitoring #31

Open · tmcgilchrist opened this issue 1 year ago

tmcgilchrist commented 1 year ago

Migrating issue from the wiki to allow discussion.

What should be on a status.ocaml.org page?

At a minimum, we should have the operational status of:

avsm commented 1 year ago

This is a good list to trawl through: https://github.com/ivbeg/awesome-status-pages. If we don't use one of the hosted options, we could host it on Mythic Beasts, separately from the Scaleway and Cambridge Computer Lab infrastructure.

tmcgilchrist commented 1 year ago

I'm keen on the style of something like https://status.gitlab.com, which has space for the various public-facing pieces plus the subsystems that make everything work.

We are starting with a bottom-up approach, building monitoring pages for each of:

Then we can choose an independently hosted service to feed those checks into. This is just an update to say we are working towards this, with work still to do. :-)

avsm commented 1 year ago

This all sounds good. Could you please coordinate with @mtelvers on his observer.ocamllabs.io prototype, mentioned in https://github.com/ocaml/infrastructure/issues/42#issuecomment-1554623791? It looks like a good start, but I suspect its database will grow quite quickly, as it stores the results of ping rebuilds in each ocurrent node.

Also, as @hannesm mentions in #48, we need a check for the freshness of opam.ocaml.org. I suspect that would be better done as an email/Matrix message triggered by a build failure in the deployer pipeline rather than as a healthcheck, since otherwise it will be difficult to distinguish between "no pushes to opam-repo recently" and "not a fresh archive on opam.ocaml.org".