darrylmendillo closed this 4 years ago
Create Sev 1 alarms
Create recovered (back online) status alarms
Set up two thresholds for the Signals and VentCue systems by monitoring the primary Mongo collections: flowsheet_metrics, location_metrics, cue_metrics, mar_metrics.
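Alertmanager, which this thread proposes further down, can cover both the Sev 1 and the recovered (back online) alarms listed above. It treats an alert as resolved once the alert's `endsAt` time has passed, so a recovery notice is just the same labels re-sent with `endsAt` set. The sketch below builds payloads for Alertmanager's v2 `POST /api/v2/alerts` endpoint. The URL, the `alertname` value, and the `severity="sev1"` label are hypothetical. Depending on the receiver type, `send_resolved` may also need to be enabled before recovery notifications are delivered.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical Alertmanager address; replace with the real deployment URL.
ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"


def build_alert(app, severity="sev1", resolved=False, now=None):
    """Build one Alertmanager v2 alert for an application (e.g. "Signals", "VentCue").

    Firing alerts carry startsAt; a resolved ("back online") alert re-uses the
    same labels and sets endsAt, which Alertmanager treats as resolution.
    Label names and values here are illustrative assumptions.
    """
    now = now or datetime.now(timezone.utc)
    alert = {
        "labels": {"alertname": "AppStoppedLogging", "app": app, "severity": severity},
        "annotations": {"summary": f"{app} stopped writing events"},
    }
    if resolved:
        alert["endsAt"] = now.isoformat()
    else:
        alert["startsAt"] = now.isoformat()
    return alert


def post_alerts(alerts, url=ALERTMANAGER_URL):
    """POST a list of alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In use, `post_alerts([build_alert("Signals")])` would raise the Sev 1 alarm, and `post_alerts([build_alert("Signals", resolved=True)])` would later mark it recovered. Because the labels are identical, Alertmanager matches the two payloads to the same alert.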
Potential approach: set up Grafana to monitor the number of events written to the following collections:
mon_db.list_collection_names()
['flowsheet_metrics', 'location_metrics', 'cue_metrics', 'mar_metrics']
Grafana does not natively support MongoDB as a data source.
Solutions:
Got it. Based on this morning's conversation, I'll remove the notes on Mongo and update the approach to use "logs".
Created two sets of criteria for the applications above. Each rule is evaluated every minute.
1. Event driven:
# data_rate = events / 15 min, expressed in events/sec
# 1 event / 15 min = 1/900 ≈ 0.0011 events/sec
alert_rate = 0.001
data_rate = len(samples_total_15m or []) / (60 * 15)  # no data -> rate 0
if data_rate < alert_rate:
    pending, pending_time = True, pending_time + 1  # consecutive minutes below threshold
else:
    pending, pending_time = False, 0
if pending and pending_time >= 15:
    fire_alert()
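You can exercise the event-driven rule offline with a small simulation. This is a sketch, not the deployed alerting code. `RateAlert` and its parameter names are hypothetical, and event timestamps are assumed to be plain seconds. As in the pseudocode, an empty window counts as a rate of 0, so missing data also drives the rule to pending. `pending` is implied by `pending_time > 0`.

```python
from dataclasses import dataclass


@dataclass
class RateAlert:
    """Hypothetical per-minute evaluator for the pending/fire rule.

    window_sec:  sliding-window length in seconds (900 for event-driven apps)
    alert_rate:  events/sec below which the rule goes pending
    for_minutes: consecutive pending evaluations needed before firing
    """
    window_sec: int
    alert_rate: float
    for_minutes: int
    pending_time: int = 0

    def evaluate(self, event_times, now):
        """Run one evaluation (call once per minute); return True if the alert fires."""
        # Count events in the window (now - window_sec, now]; no data means rate 0.
        in_window = sum(1 for t in event_times if now - self.window_sec < t <= now)
        data_rate = in_window / self.window_sec
        if data_rate < self.alert_rate:
            self.pending_time += 1
        else:
            self.pending_time = 0
        return self.pending_time >= self.for_minutes


# One event at t=0 followed by silence, evaluated once per minute.
rule = RateAlert(window_sec=15 * 60, alert_rate=0.001, for_minutes=15)
events = [0.0]
first_fire = next(m for m in range(1, 60) if rule.evaluate(events, now=m * 60))
print(f"alert fires at minute {first_fire}")  # minute 29
```

The single event ages out of the 15-minute window at minute 15. The rule then needs 15 consecutive pending evaluations, and the first of those counts as 1, so it fires at minute 29.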
2. Batch:
alert_rate = 0.00001  # 1 event / 24 hr = 1/86400 ≈ 0.0000116 events/sec
data_rate = len(samples_total_24hr or []) / (60 * 60 * 24)  # no data -> rate 0
if data_rate < alert_rate:
    pending, pending_time = True, pending_time + 1
else:
    pending, pending_time = False, 0
if pending and pending_time >= 120:
    fire_alert()
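The arithmetic below is a sanity check on the two thresholds. It confirms that each `alert_rate` sits just under the rate of one event per window, so a rule only goes pending once its window holds zero events. It also estimates how long after the last event each alert would fire: roughly the window plus the pending period, give or take one evaluation interval. The `RULES` table and the `time_to_fire_hours` helper are illustrative names, not part of the original setup.

```python
# Illustrative table: (window in seconds, alert_rate in events/sec, pending minutes)
RULES = {
    "event_driven": (15 * 60, 0.001, 15),
    "batch": (60 * 60 * 24, 0.00001, 120),
}


def time_to_fire_hours(window_sec, for_minutes):
    """Approximate delay from the last event to the alert: the last event must
    first age out of the window, then the rule must stay pending for_minutes."""
    return (window_sec / 60 + for_minutes) / 60


for name, (window_sec, alert_rate, for_minutes) in RULES.items():
    one_event_rate = 1 / window_sec  # rate when exactly one event is in the window
    # alert_rate must be positive and below one_event_rate, so only an empty window trips the rule
    assert 0 < alert_rate < one_event_rate, name
    print(f"{name}: one event per window = {one_event_rate:.7f} events/sec, "
          f"alert fires ~{time_to_fire_hours(window_sec, for_minutes):g} h after the last event")
```

For these numbers, that works out to about 0.5 h for event-driven applications and about 26 h for batch jobs.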
Create alerts for when event-driven and batch-driven applications stop writing stdout logs.
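One way to detect that an application has stopped producing stdout logs is to check when its log output was last written. The sketch below assumes stdout is redirected to a file on disk, for example by a process supervisor. The `MAX_SILENCE_SEC` values reuse the window-plus-pending periods of the two rules above and are assumptions, as are the helper names.

```python
import os
import time

# Assumed maximum silence before an app counts as "stopped logging":
# window + pending period from the two rules (15 + 15 min, 24 h + 120 min).
MAX_SILENCE_SEC = {
    "event_driven": (15 + 15) * 60,
    "batch": (24 * 60 + 120) * 60,
}


def is_silent(last_write, now, kind):
    """True if more than the allowed silence has passed since last_write (epoch seconds)."""
    return now - last_write > MAX_SILENCE_SEC[kind]


def stopped_logging(log_path, kind, now=None):
    """Check a (hypothetical) stdout log file by its last modification time."""
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(log_path)
    except FileNotFoundError:
        return True  # no log file at all also counts as stopped
    return is_silent(last_write, now, kind)
```

Relying on the file's modification time keeps the check independent of the log format. It only shows that something was written, not that the output was healthy.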
Sev 1 alerts driven by application status
Event Driven:
Batch Driven:
Use Alertmanager to create alerts for: