ukwa / ukwa-monitor

Dashboard and monitoring system for the UK Web Archive

Monitor that Logstash is still running. #39

Open anjackson opened 2 years ago

anjackson commented 2 years ago

The crawl logs in Elasticsearch sometimes have gaps, because Logstash gets stuck on a Kafka error that appears to be transient. We need a hook that checks there are recent logs in Elasticsearch, or perhaps one that monitors Logstash itself.

e.g. https://github.com/alxrem/prometheus-logstash-exporter ?

or https://medium.com/@malone.spencer/logstash-events-to-prometheus-912d7ac43a74

anjackson commented 2 years ago

I've added some Prometheus exporters to scrape.

This URL can also be used (via the stat-pusher) to get the crawl log document count: http://logs.wa.bl.uk:9200/crawl_log*/_stats?pretty=true
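For illustration, here is a minimal Python sketch of how the document count could be read from that `_stats` response. It assumes the standard Elasticsearch index stats layout, where the primary-shard document count sits under `_all.primaries.docs.count`. The function names are hypothetical and this is not how the stat-pusher itself is implemented.

```python
import json
import urllib.request

# URL from the comment above; the helper names below are hypothetical.
STATS_URL = "http://logs.wa.bl.uk:9200/crawl_log*/_stats"


def total_doc_count(stats: dict) -> int:
    """Return the primary-shard document count from a parsed _stats response."""
    return int(stats["_all"]["primaries"]["docs"]["count"])


def fetch_doc_count(url: str = STATS_URL, timeout: float = 10.0) -> int:
    """Fetch the _stats endpoint and return the total document count."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return total_doc_count(json.load(resp))


def count_is_growing(previous: int, current: int) -> bool:
    """True if new documents arrived between two successive readings."""
    return current > previous
```

Comparing two successive readings with `count_is_growing` gives a rough signal: if the count does not increase while a crawl is running, Logstash may have stalled.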

anjackson commented 1 year ago

Better still, hits in the last minute:

http://logs.wa.bl.uk:9200/crawl_log*/_search?pretty=true&q=@timestamp:[now-1m+TO+*]&sort=@timestamp:desc&size=0
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 195,
    "successful" : 195,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3150,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

i.e. track hits.total.value and warn if it stays at 0 for an extended period.
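A rough Python sketch of that check follows, assuming the query is polled about once a minute and that "extended period" means some number of consecutive zero readings. The class name, the threshold of 5, and the polling approach are all illustrative assumptions, not what the stat-pusher does.

```python
import json
import urllib.request

# Query URL from the comment above; everything else is a hypothetical sketch.
SEARCH_URL = (
    "http://logs.wa.bl.uk:9200/crawl_log*/_search"
    "?q=@timestamp:[now-1m+TO+*]&sort=@timestamp:desc&size=0"
)


def recent_hits(response: dict) -> int:
    """Extract hits.total.value from a parsed _search response."""
    return int(response["hits"]["total"]["value"])


def fetch_recent_hits(url: str = SEARCH_URL, timeout: float = 10.0) -> int:
    """Run the last-minute query and return the number of matching log lines."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return recent_hits(json.load(resp))


class StallDetector:
    """Signals a warning after several consecutive zero-hit readings."""

    def __init__(self, max_zero_checks: int = 5):
        # Assumed threshold: how many zero readings in a row count as "extended".
        self.max_zero_checks = max_zero_checks
        self.zero_streak = 0

    def observe(self, hits: int) -> bool:
        """Record one reading; return True if a warning should be raised."""
        if hits > 0:
            self.zero_streak = 0
        else:
            self.zero_streak += 1
        return self.zero_streak >= self.max_zero_checks
```

Calling `detector.observe(fetch_recent_hits())` once a minute would raise a warning only after five empty minutes in a row, which avoids alerting on a single quiet minute.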

anjackson commented 1 year ago

The updated stats pusher tracks this, but it still needs to be installed and an alert added: https://github.com/ukwa/ukwa-monitor/blob/3261e0e473fb57b4c9ae615418ef9f8e04bf0d41/stat-pusher/prod.stats#L112-L119