spacemeshos / pm

Project management. Meta-tasks related to research, dev, and specs for the Spacemesh protocol and infrastructure.
http://spacemesh.io/
Creative Commons Zero v1.0 Universal

Critical alerts/network health monitoring #216

Open pigmej opened 1 year ago

pigmej commented 1 year ago

As briefly discussed in Gdansk, we need to extend our network monitoring to have:

To achieve that, we need to know and understand what is still considered good enough and what is not.

See https://github.com/spacemeshos/Infrastructure/issues/56

Target date to finish initial work (list of metrics to collect, status of each, understanding existing infrastructure): 2023/06/26

Target date to finish collecting missing genesis-critical metrics, setting up dashboards, etc.: 2023/07/10

lrettig commented 12 months ago

The list is basically done; @noamnelke and I spent a lot of time on it over the last few days. We'll investigate what's already available and what's feasible to add before genesis.

lrettig commented 11 months ago

The list of metrics to monitor is done. I'm now investigating which metrics are already being collected, how they're being collected and monitored, and how to add what's still missing.

lrettig commented 10 months ago

I've mostly finished this investigation and handed it off to @pigmej.