ukwa / ukwa-services

Deployment configuration for all UKWA services stacks.

Crawl Log Viewing #38

Closed: anjackson closed this issue 2 years ago

anjackson commented 3 years ago

The current plan is to siphon crawl events into a large database covering recently crawled FC material. We will use Solr at first because we know how to run it at scale, and reconsider CockroachDB later if we need, for example, proper SQL or ACID transactions.

Start with a simple Solr-indexed version of the standard crawl log, so we can:

See https://github.com/ukwa/crawl-db/issues/1
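As a rough illustration of what a Solr-indexed crawl log entry might start from, here is a minimal sketch that splits one line of a standard Heritrix 3 `crawl.log` into named fields. The field names, the `parse_crawl_log_line` helper and the sample line are assumptions made for this example; they are not taken from the existing UKWA parsing code or from any agreed Solr schema.

```python
from typing import Dict, Optional

# Labels for the usual whitespace-separated crawl.log columns.
# These names are hypothetical, chosen for this sketch only.
FIELDS = [
    "log_timestamp", "status_code", "content_length", "url", "hop_path",
    "via", "content_type", "thread", "start_time_plus_duration",
    "content_digest", "source", "annotations",
]

# Made-up sample line in the standard column order.
SAMPLE = (
    "2021-03-01T12:00:00.000Z   200       1234 http://example.org/ L "
    "http://example.org/seed text/html #042 20210301120000000+15 "
    "sha1:ABCDEFGHIJKLMNOPQRSTUVWXYZ234567 - ip:93.184.216.34"
)


def parse_crawl_log_line(line: str) -> Optional[Dict[str, object]]:
    """Return a dict of fields, or None if the line has too few columns."""
    # Split on runs of whitespace; anything after the 11th column is kept
    # together as the final 'annotations' value.
    parts = line.strip().split(None, len(FIELDS) - 1)
    if len(parts) < len(FIELDS):
        return None
    doc: Dict[str, object] = dict(zip(FIELDS, parts))
    # Status codes may be negative (crawler-internal codes), int() copes.
    doc["status_code"] = int(parts[1])
    # A missing size is logged as '-'.
    doc["content_length"] = None if parts[2] == "-" else int(parts[2])
    return doc


if __name__ == "__main__":
    print(parse_crawl_log_line(SAMPLE))
```

Each resulting dict could then be posted to Solr as a document. Lines that do not match the expected column count are returned as `None` rather than guessed at.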

Need to find a way to tidy up the H3 log parsing code and related code that is spread around:

anjackson commented 2 years ago

Switched to Grafana, with Logstash pulling the fc.crawled feed into Elasticsearch. It is a fine solution, except that it seems to fail from time to time with Kafka complaining about corrupted records. I'm concerned Gluster might be having problems.

Anyway, need some Prometheus hook to monitor that Logstash is still running.
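One possible shape for such a hook, sketched under assumptions: poll the Logstash monitoring API and turn the result into Prometheus text exposition format. The URL assumes the default monitoring port on localhost, and the metric names (`logstash_up`, `logstash_events_*_total`) and helper functions are made up for this example rather than taken from ukwa-monitor.

```python
import json
from typing import Optional
from urllib.request import urlopen

# Assumption: Logstash's monitoring API is on its default port, locally.
LOGSTASH_STATS_URL = "http://localhost:9600/_node/stats"


def fetch_stats(url: str = LOGSTASH_STATS_URL, timeout: float = 5.0) -> dict:
    """Fetch the node stats JSON from the Logstash monitoring API."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_prometheus_text(stats: Optional[dict]) -> str:
    """Render stats (or None if Logstash was unreachable) as metrics text."""
    lines = [
        "# TYPE logstash_up gauge",
        f"logstash_up {1 if stats is not None else 0}",
    ]
    events = (stats or {}).get("events", {})
    for key in ("in", "filtered", "out"):
        if key in events:
            # Hypothetical metric names, one counter per event stage.
            lines.append(f"# TYPE logstash_events_{key}_total counter")
            lines.append(f"logstash_events_{key}_total {events[key]}")
    return "\n".join(lines) + "\n"


def scrape() -> str:
    """Return metrics text, reporting logstash_up 0 on any fetch failure."""
    try:
        stats = fetch_stats()
    except (OSError, ValueError):
        # Connection errors and malformed JSON both count as 'down'.
        stats = None
    return to_prometheus_text(stats)


if __name__ == "__main__":
    print(scrape(), end="")
```

The output could be written to a file for a textfile-style collector or served over HTTP. Alerting on `logstash_up == 0`, or on an events counter that stops increasing, would catch both a dead process and a stalled pipeline.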

anjackson commented 2 years ago

Calling this done, with a ukwa-monitor issue for monitoring Logstash: https://github.com/ukwa/ukwa-monitor/issues/39