sourcegraph / checkup

Distributed, lock-free, self-hosted health checks and status pages
https://sourcegraph.github.io/checkup
MIT License

Running checks from multiple locations #145

Open · jeremych1000 opened this issue 3 years ago

jeremych1000 commented 3 years ago

Hello! Stumbled upon this tool and it looks perfect for an uptime monitoring system that I'm building out. I have two quick questions.

  1. Running from multiple regions

    • I am planning to use checkup to monitor uptime of a service
    • I am planning to deploy checkup using AWS Lambdas in multiple regions to improve redundancy of the system
    • What approach does checkup recommend for this? Should I deploy multiple copies of the frontend, checkup, and DB? Or run one main frontend + DB in one region, with checkup Lambdas in several other regions all reporting back to that main region's DB?
    • With a single central region, the response times will surely differ a lot depending on the region each check runs from. How does checkup differentiate between these?
  2. HA Checkup

    • This uptime monitoring solution is required to be mission-critical, so high availability has to be built in.
    • What happens if the storage backend is unavailable, e.g. the Postgres DB is down due to a cloud outage?
    • I don't see an option to define multiple storage backends. Is that possible? For example, store everything in Postgres with a backup in S3, so that if Postgres goes down we still have uptime metrics in S3.

Our high-level planned architecture: [diagram not included]

Thanks!

jeremych1000 commented 3 years ago

Update: checkup seems to read results from MySQL/Postgres one at a time, so am I correct in saying that reading from the DB performs the same as reading from S3?

https://github.com/sourcegraph/checkup/blob/master/storage/postgres/postgres.go#L94