Document monitoring architecture and plan

We are switching away from the custom-code approach used here to configuring off-the-shelf monitoring tools.

Prometheus and Grafana will provide overall monitoring of statistics and alerts.

See https://github.com/ukwa/ukwa-documentation/blob/master/Monitoring-Services.md for details.
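Once Prometheus is scraping the services, its built-in `up` metric already says which targets are reachable. The sketch below reads that metric through Prometheus's HTTP API (`/api/v1/query`) and lists the targets that are down. The server address `http://localhost:9090` is an assumption (9090 is only Prometheus's default port), and the helper names are hypothetical; in practice Grafana dashboards or Prometheus alert rules would normally do this job.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address: 9090 is Prometheus's default port, not a known UKWA setting.
PROMETHEUS_URL = "http://localhost:9090"


def query_url(expr: str, base: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def down_targets(response: dict) -> list:
    """From an instant-query response for `up`, return (job, instance) pairs whose value is 0."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return [
        (r["metric"].get("job"), r["metric"].get("instance"))
        for r in response["data"]["result"]
        # Each vector sample is [unix_timestamp, "value-as-string"].
        if float(r["value"][1]) == 0.0
    ]


def fetch_down_targets(base: str = PROMETHEUS_URL) -> list:
    """Query a live Prometheus server for the `up` metric and report failing targets."""
    with urllib.request.urlopen(query_url("up", base), timeout=10) as resp:
        return down_targets(json.load(resp))
```

Splitting the parsing (`down_targets`) from the network call keeps the logic testable against a saved response.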

Areas to monitor:

Check that HTTP-based services are up and responsive (in groups).
Check HDFS storage status and growth (means scraping HTML tags).
Check crawler status (means scanning Docker containers and reformatting JSON).
Check crawler local disk space, etc.
Check AMQP or Kafka queues.
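Two of the checks above (grouped HTTP liveness and local disk space) can be sketched with the Python standard library alone. This is a minimal illustration, not the planned setup: the group names and URLs in `SERVICE_GROUPS` are hypothetical placeholders, and the rule that a group is up only when every member is up is an assumption. In the Prometheus design, blackbox_exporter and node_exporter would typically cover these.

```python
import http.client
import shutil
import time
import urllib.request

# Hypothetical groups and endpoints; the real service list is not given in this issue.
SERVICE_GROUPS = {
    "access": ["http://localhost:8080/"],
    "ingest": ["http://localhost:9000/"],
}


def check_http(url: str, timeout: float = 5.0) -> dict:
    """Return whether `url` answered with a 2xx/3xx status, and how long the attempt took."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            up = 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx) and timeouts are OSError subclasses;
        # ValueError covers malformed URLs, HTTPException covers broken responses.
        up = False
    return {"url": url, "up": up, "seconds": time.monotonic() - start}


def check_groups(groups: dict) -> dict:
    """Check every URL; a group counts as up only if all its URLs are up (empty groups count as up)."""
    results = {}
    for group, urls in groups.items():
        checks = [check_http(u) for u in urls]
        results[group] = {"up": all(c["up"] for c in checks), "checks": checks}
    return results


def disk_free_fraction(path: str = "/") -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Recording the elapsed time as well as the up/down result gives a basic "responsive" signal, not only "reachable".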

Logs and other events (such as crawl events) are routed from servers, e.g. using Filebeat, into a monitoring-events Kafka, which Logstash can consume and push to Elasticsearch. This acts as a 'debugging console': the last few days of logs are kept there and can be used to debug what is happening.
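Crawl events are easier to search in Elasticsearch once each Heritrix3 `crawl.log` line has been turned into a structured JSON document. The sketch below assumes the standard Heritrix3 crawl.log column order (timestamp, status, size, URI, discovery path, via, MIME type, thread, fetch time plus duration, digest, seed, annotations, optional extra-info JSON). The output key names are placeholders; ideally they would match the Kafka crawl log feed's schema, which is not reproduced here.

```python
import json

# Standard Heritrix3 crawl.log column order; the key names are illustrative placeholders.
CRAWL_LOG_FIELDS = [
    "timestamp", "status_code", "size", "url", "hop_path", "via",
    "mimetype", "thread", "start_time_plus_duration", "content_digest",
    "seed", "annotations",
]


def parse_crawl_log_line(line: str) -> dict:
    """Turn one crawl.log line into a flat dict, ready for json.dumps()."""
    # maxsplit keeps an optional trailing extra-info JSON object in one piece.
    parts = line.split(None, len(CRAWL_LOG_FIELDS))
    if len(parts) < len(CRAWL_LOG_FIELDS):
        raise ValueError(f"expected {len(CRAWL_LOG_FIELDS)} fields, got {len(parts)}")
    # Heritrix writes "-" for empty columns; map those to None.
    event = {name: (None if value == "-" else value)
             for name, value in zip(CRAWL_LOG_FIELDS, parts)}
    for key in ("status_code", "size"):
        if event[key] is not None:
            event[key] = int(event[key])
    # Annotations are a comma-separated list.
    event["annotations"] = event["annotations"].split(",") if event["annotations"] else []
    if len(parts) > len(CRAWL_LOG_FIELDS):
        event["extra_info"] = json.loads(parts[-1])
    return event
```

Calling `json.dumps(parse_crawl_log_line(line))` gives one document per crawl event, which could be sent to the monitoring-events Kafka alongside the other logs.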

To Do:

[ ] Should make the Heritrix3 Logstash data schema consistent with the Kafka crawl log feed.
[ ] Should use the Logstash http_poller input or the Prometheus blackbox_exporter to poll HTTP endpoints.
[ ] Should work out how to expose crawl engine metrics to Prometheus. We could write an exporter, like this one
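A Prometheus exporter is just an HTTP endpoint that serves current values in the Prometheus text exposition format. The minimal sketch below uses only the standard library. `fetch_crawl_stats()` is a hypothetical stand-in for however the crawl engine's figures would actually be read, and port 9101 is an arbitrary choice. Every stat is exposed as a gauge for simplicity, although monotonically increasing totals would properly be counters.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_crawl_stats() -> dict:
    """Hypothetical stand-in: a real exporter would query the crawl engine here."""
    return {"urls_downloaded": 0, "urls_queued": 0}


def render_metrics(stats: dict, prefix: str = "crawler") -> str:
    """Render numeric stats in Prometheus text exposition format (all typed as gauges).

    Assumes the stat names are already valid Prometheus metric-name characters.
    """
    lines = []
    for name, value in sorted(stats.items()):
        metric = f"{prefix}_{name}"
        lines.append(f"# HELP {metric} {name} as reported by the crawl engine.")
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_crawl_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Arbitrary port for this sketch; Prometheus would be configured to scrape it.
    HTTPServer(("", 9101), MetricsHandler).serve_forever()
```

The official Python client, `prometheus_client`, handles the formatting, metric types and HTTP server itself, so a production exporter would more likely build on that than on hand-written output.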

There are some useful Docker examples here

ukwa/ukwa-monitor issue #5