target / goalert

Open source on-call scheduling, automated escalations, and notifications so you never miss a critical alert
https://goalert.me
Apache License 2.0

Built-In Redundancy Monitoring for GoAlert Instances #3323

Open mastercactapus opened 9 months ago

mastercactapus commented 9 months ago

What problem would you like to solve? Please describe: There is a need for a feature that makes it easier to monitor a production GoAlert instance using separate GoAlert installations. Parts of this process are already implemented, but they are undocumented, complex, and possibly outdated.

Describe the solution you'd like: A new admin page for Remote Monitor to streamline the configuration and operation of these monitoring services. Features required:

Describe alternatives you've considered: The current procedure relies on the 'monitor' command and the 'remotemonitor' package to create and manage redundancy measures, which adds complexity.

Additional context: This feature request caters primarily to the admins responsible for a GoAlert installation rather than the application's end users.

mastercactapus commented 9 months ago

A good experience will rely heavily on #3007

mastercactapus commented 9 months ago

Current state reference info

Current check sequence ![d2 (10)](https://github.com/target/goalert/assets/595010/c955eff5-1f2d-4d70-a063-e4dde015aa31)
Current Check Deployment Example ![d2 (11)](https://github.com/target/goalert/assets/595010/af604b5b-b85a-4c09-b8ef-de163428d1f6)

Worth noting: the current setup requires the remote monitor to be deployed with a config file containing various required settings, to have its own Twilio number, and to be publicly routable.

mastercactapus commented 9 months ago

For next steps:

MVP: minimal API/DB additions to allow create alert -> heartbeat, using a webhook on an EP (escalation policy)

+1: auto sync on-call users

+1: auto config from production/main instance

+2: "sentinel mode": slimmer UI, warning banner, make it obvious an instance is for monitoring another (with link)

+3: first-time setup w/ sentinel

Separate supporting feature ideas:


Implementation thoughts/notes: