Closed: anjackson closed this 2 years ago
We need a new alert alongside this one, which is based on what's on HDFS.
The new alert should be based on that one, but use this metric, which spots when the tidy-logs job has noted that the crawl log is missing or not growing:
delta(ukwa_crawler_log_size_bytes{log='crawl.log'}[1h]) == 0 or absent(ukwa_crawler_log_size_bytes{log='crawl.log'})
If this condition is active for an hour (for: 1h), then an alert should inform us that the crawl_job_name crawl is not writing to its crawl.log.
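For reference, the request above could be written as a Prometheus alerting rule roughly like the sketch below. This is not necessarily what was deployed: the group name, alert name and severity label are placeholders, and it assumes crawl_job_name is a label on the ukwa_crawler_log_size_bytes metric.

```yaml
groups:
  - name: crawl-log-alerts            # placeholder group name
    rules:
      - alert: crawl_log_not_growing  # placeholder alert name
        # Fires when crawl.log has not changed size over the last hour,
        # or when no crawl.log size metric is being reported at all.
        expr: |
          delta(ukwa_crawler_log_size_bytes{log='crawl.log'}[1h]) == 0
            or
          absent(ukwa_crawler_log_size_bytes{log='crawl.log'})
        # The condition must hold for a further hour before the alert fires.
        for: 1h
        labels:
          severity: severe            # placeholder severity
        annotations:
          # Assumes crawl_job_name is a label on the metric.
          summary: "The {{ $labels.crawl_job_name }} crawl is not writing to its crawl.log"
```

One caveat: when the absent() branch fires, the resulting series only carries labels taken from the equality matchers in the selector (here just log), so crawl_job_name will be empty in the message for that case.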
Implemented in our beta service. Awaiting confirmation that it's providing what's required (which is difficult to tell whilst the alarm isn't actually going off).
Rolled out to production monitor.