Areas to monitor:
- Check that HTTP-based services are up and responsive (in groups).
- Check HDFS storage status and growth (means scraping HTML tags).
- Check crawler status (means scanning Docker containers and reformatting their JSON).
- Check crawler local disk space, etc.
- Check AMQP or Kafka queues.
- Logs and other events (like crawl events), routed from servers (e.g. using filebeat) into a monitoring-events Kafka that logstash can consume and push to Elasticsearch. This acts as a 'debugging console' where the last few days of logs are kept and can be used to debug what's happening.
  To Do:
  - [ ] Make the Heritrix3 logstash data schema consistent with the Kafka crawl log feed.
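For the "HTTP-based services in groups" check, a minimal sketch using only the standard library might look like this. Group names and URLs are hypothetical; the rule that a group is "degraded" when only some of its services answer is an assumption, not something the notes specify.

```python
import http.client
import urllib.error
import urllib.request
from typing import Dict, List


def check_url(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status within the timeout."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, timeout, bad status line or malformed URL.
        return False


def check_groups(groups: Dict[str, List[str]], timeout: float = 5.0) -> Dict[str, Dict[str, bool]]:
    """Check every URL in every named group; returns {group: {url: ok}}."""
    return {
        name: {url: check_url(url, timeout) for url in urls}
        for name, urls in groups.items()
    }


def summarise(results: Dict[str, Dict[str, bool]]) -> Dict[str, str]:
    """Collapse per-URL results into a single state per group."""
    summary = {}
    for name, checks in results.items():
        if all(checks.values()):
            summary[name] = "up"        # every service answered (or the group is empty)
        elif any(checks.values()):
            summary[name] = "degraded"  # some, but not all, services answered
        else:
            summary[name] = "down"
    return summary
```

Usage would be something like `summarise(check_groups({"access": ["http://localhost:8080/"]}))`, where the group name and URL are placeholders.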
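"Scraping HTML tags" for HDFS status could be sketched with the standard `html.parser` module, assuming the status page lays out figures as two-column table rows (label cell, value cell) and sizes as text like `1.5 TB`. The row labels (e.g. "DFS Used") and the page layout are assumptions; check them against the actual NameNode page before relying on this.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional

UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}


class TableScraper(HTMLParser):
    """Collect table rows as lists of cell text; copes with unclosed <td> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush()          # a new row implicitly ends the previous one
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag in ("tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data

    def _flush(self):
        if self._row:
            self.rows.append([cell.strip() for cell in self._row])
        self._row = None
        self._in_cell = False

    def pairs(self) -> Dict[str, str]:
        """Map the first cell of each row to its last cell."""
        self._flush()
        return {r[0]: r[-1] for r in self.rows if len(r) >= 2}


def to_bytes(text: str) -> float:
    """Convert e.g. '1.5 TB' or '500GB' into bytes (binary units assumed)."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([KMGTP]?B)\s*", text, re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2).upper()]


def growth_per_day(previous_bytes: float, current_bytes: float, days: float) -> float:
    """Average storage increase per day between two readings."""
    return (current_bytes - previous_bytes) / days
```

Storing each day's reading and comparing it with the previous one gives the "increase" figure.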
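For crawler status, `docker ps --format '{{json .}}'` prints one JSON object per container, with fields such as `Names` and `Status`. The sketch below re-formats that into a compact per-crawler summary; the `heritrix` name filter is a hypothetical naming convention, not something the notes confirm.

```python
import json
import subprocess
from typing import Dict, List


def parse_docker_ps(output: str) -> List[dict]:
    """`--format '{{json .}}'` emits one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def list_containers() -> List[dict]:
    """Run `docker ps` (including stopped containers) and parse its JSON lines."""
    out = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_docker_ps(out)


def crawler_status(containers: List[dict], name_fragment: str = "heritrix") -> Dict[str, dict]:
    """Re-format into {container name: {"running": bool, "status": str}}."""
    return {
        c["Names"]: {"running": c["Status"].startswith("Up"), "status": c["Status"]}
        for c in containers
        if name_fragment in c["Names"]
    }
```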
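A local disk-space check can be as small as `shutil.disk_usage`. The mount point and the 90% warning level below are assumptions; substitute the volume the crawler actually writes to.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Percentage of capacity in use (0 if the total is unknown)."""
    return 100.0 * used / total if total else 0.0


def check_disk(path: str = "/", warn_at: float = 90.0) -> dict:
    """Report usage for the filesystem holding `path` and flag it past `warn_at` percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    return {
        "path": path,
        "percent_used": round(pct, 1),
        "free_gb": round(usage.free / 1024**3, 1),
        "alert": pct >= warn_at,
    }
```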
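For Kafka queues, the usual health signal is consumer lag: the partition's log-end offset minus the consumer group's committed offset (the same figure `kafka-consumer-groups.sh --describe --group <group>` reports in its LAG column). The sketch below assumes the offsets have already been fetched by some client and only does the arithmetic; the threshold is a hypothetical value.

```python
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset.

    Partitions with no committed offset are treated as starting from 0, which
    overstates lag if old log segments have been deleted (a simplification).
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Partitions whose backlog exceeds the threshold, in partition order."""
    return sorted(p for p, n in lag.items() if n > threshold)
```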
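Keeping only "the last few days" in the debugging console implies pruning old indices. Assuming daily indices named `logstash-YYYY.MM.DD` (the classic Logstash default; newer setups may use index lifecycle management instead) and a hypothetical 7-day window, the selection logic might be:

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indices(names: Iterable[str], today: date, keep_days: int = 7,
                    prefix: str = "logstash-") -> List[str]:
    """Return daily indices older than `keep_days` before `today`, sorted by name."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # ignore unrelated indices such as .kibana
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

The returned names could then be removed with the Elasticsearch delete-index API (or a tool such as Curator).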
Switching away from the custom-code approach used here, towards configuring off-the-shelf monitoring tools.
Prometheus and Grafana serve as the overall monitoring of statistics and alerts.
See https://github.com/ukwa/ukwa-documentation/blob/master/Monitoring-Services.md for details.
To Do:
- [ ] Use logstash-http-poller or the Prometheus blackbox_exporter to poll HTTP endpoints. There's some useful example Docker stuff here.
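With blackbox_exporter, each check is a request to its `/probe` endpoint naming a `module` and a `target`, and the response includes a `probe_success` metric (1 or 0). The sketch below assumes the exporter runs on its default port 9115 and that the `http_2xx` module from the example configuration is enabled; in normal use Prometheus itself scrapes `/probe`, so this is mainly handy for ad-hoc checks.

```python
import urllib.request
from urllib.parse import urlencode


def probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Build the blackbox_exporter /probe URL for one target."""
    query = urlencode({"module": module, "target": target})
    return f"{exporter.rstrip('/')}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Read the probe_success sample from the exporter's text output."""
    for line in metrics_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "probe_success":
            return float(parts[-1]) == 1.0
    return False


def probe(exporter: str, target: str, timeout: float = 10.0) -> bool:
    """Ask the exporter to probe `target` and report whether it succeeded."""
    with urllib.request.urlopen(probe_url(exporter, target), timeout=timeout) as resp:
        return probe_succeeded(resp.read().decode("utf-8"))
```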