ukwa / ukwa-monitor

Dashboard and monitoring system for the UK Web Archive

Add indexing metrics and URL spot checks #18

Closed anjackson closed 3 years ago

anjackson commented 3 years ago

This extends the set of metrics to cover CDX indexing and some crawl activity. It queries the tracking database for the total number of WARCs that have been marked as indexed, and for the timestamp of the most recent one. It also looks at an open-access pywb instance and gets the capture timestamp of a URL that should be crawled every day (bl.uk/robots.txt). Both of these timestamps can later be set up with corresponding alerts.
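
For illustration only, here is a minimal sketch of how the URL spot check's freshness logic could work. It is not the code in this pull request. It assumes the capture records arrive as newline-delimited JSON with a 14-digit `timestamp` field (the shape I'd expect from a CDX-style query with JSON output). The function names and the 48-hour threshold are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_wayback_timestamp(ts: str) -> datetime:
    """Parse a 14-digit Wayback-style timestamp (YYYYMMDDhhmmss) as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def latest_capture(cdx_json_lines: str) -> datetime:
    """Return the newest capture time from newline-delimited JSON records.

    Assumes each non-empty line is a JSON object with a 'timestamp' field.
    """
    timestamps = [
        json.loads(line)["timestamp"]
        for line in cdx_json_lines.splitlines()
        if line.strip()
    ]
    if not timestamps:
        raise ValueError("no captures found")
    # Fixed-width digit strings sort chronologically, so max() is the newest.
    return parse_wayback_timestamp(max(timestamps))


def is_stale(last_seen: datetime, now: datetime, max_age_hours: float = 48.0) -> bool:
    """True if the last capture is older than the (hypothetical) threshold."""
    age_hours = (now - last_seen).total_seconds() / 3600
    return age_hours > max_age_hours
```

In practice the alerting would probably live in the monitoring system itself, with the metric exported as a plain epoch timestamp, so `is_stale` here only shows the comparison an alert rule would make.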

anjackson commented 3 years ago

Okay, so tested now on dev. I had to remove urllib from the dependencies because it's not called that anymore, and anyway it gets pulled in appropriately as a requirement for the others.

I'll clean up this pull-request to focus on the new metrics and open a couple of issues on other things we might look at later on.

anjackson commented 3 years ago

Added #19 and #20 as issues that arose while working on this, but they need not block this, I think.