valeriansaliou / vigil

🚦 Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).
https://crates.io/crates/vigil-server
Mozilla Public License 2.0

Feature Request Inquiry: Add outage threshold #70

Open csp197 opened 3 years ago

csp197 commented 3 years ago

Hello!

Currently, if there are 10 replicas on the Vigil status page and 3 of them are dead, Vigil declares a "Partial Service Outage". I would like to ask whether an outage threshold could be implemented: the minimum ratio of dead replicas required before Vigil declares a "Service Outage".

This value could be set in the config.cfg file:

outage_threshold = 0.5  # or 50?

This value would represent the minimum ratio of dead replicas to total replicas needed to declare a "Service Outage".
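
To make the proposal concrete, here is a rough sketch in Rust of how such a threshold might be applied. It is not taken from Vigil's code: the `Status` enum, the `classify` function, and the choice to treat an empty group as healthy are all assumptions made for illustration.

```rust
/// Hypothetical status levels, named after the labels quoted in this issue.
/// This is not Vigil's actual status type.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Status {
    Healthy,
    PartialOutage,
    Outage,
}

/// Classifies a group of replicas from the number of dead ones.
/// `outage_threshold` is the proposed minimum dead/total ratio (e.g. 0.5)
/// at which the group is declared a full outage.
pub fn classify(dead: usize, total: usize, outage_threshold: f64) -> Status {
    // Assumption: an empty group, or one with no dead replicas, is healthy.
    if total == 0 || dead == 0 {
        return Status::Healthy;
    }

    let ratio = dead as f64 / total as f64;

    if ratio >= outage_threshold {
        Status::Outage
    } else {
        Status::PartialOutage
    }
}
```

With a threshold of 0.5, the example above (3 dead replicas out of 10) would still be a partial outage, and the group would only switch to a full outage once 5 or more replicas are dead.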

valeriansaliou commented 3 years ago

Thanks, nice idea! I would also suggest letting some nodes carry a stronger "downtime" weight than others, e.g. database servers or any single point of failure (SPOF) node.
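
One way the weighting idea could look, as a minimal sketch only: each node gets a weight, and the ratio compared against the threshold becomes the dead weight divided by the total weight. The `Node` struct, its `weight` field, and `weighted_dead_ratio` are hypothetical names, not existing Vigil types or options.

```rust
/// Hypothetical node description; `weight` expresses how much a node's
/// downtime counts (e.g. higher for a database server or SPOF node).
pub struct Node {
    pub weight: f64,
    pub dead: bool,
}

/// Returns the share of total weight held by dead nodes, in the range 0.0..=1.0
/// when all weights are non-negative.
/// Assumption: a group with no positive total weight yields 0.0.
pub fn weighted_dead_ratio(nodes: &[Node]) -> f64 {
    let total: f64 = nodes.iter().map(|n| n.weight).sum();

    if total <= 0.0 {
        return 0.0;
    }

    let dead: f64 = nodes
        .iter()
        .filter(|n| n.dead)
        .map(|n| n.weight)
        .sum();

    dead / total
}
```

For instance, a dead database node with weight 5 alongside five healthy workers with weight 1 each gives a ratio of 5 / 10 = 0.5, so it would reach a 0.5 threshold on its own, whereas a single dead worker in the same group would only contribute 0.1.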