The proposal is to write a tool that parses multiple crawl log files and outputs a summary per Host/Target: Launch Date, Total URLs, status-code counts, etc.
It's not 100% clear how best to do this, e.g. process each log file once, write a per-log summary to a local file or DB, and then summarise across those files/DB?
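A minimal sketch of the per-log-file summarisation step, assuming Heritrix-style crawl.log lines where the second whitespace-separated field is the fetch status code and the fourth is the URL (field positions and the `.summary.json` output naming are illustrative assumptions, not a confirmed design):

```python
import json
import sys
from collections import defaultdict
from urllib.parse import urlparse


def summarise_log(log_path):
    """Build a per-host summary (total URLs, status-code counts) for one log file."""
    hosts = defaultdict(lambda: {"total_urls": 0, "status_codes": defaultdict(int)})
    with open(log_path, "r", errors="replace") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip malformed or truncated lines
            status, url = fields[1], fields[3]
            host = urlparse(url).hostname or "unknown"
            hosts[host]["total_urls"] += 1
            hosts[host]["status_codes"][status] += 1
    return hosts


if __name__ == "__main__":
    for log_path in sys.argv[1:]:
        summary = summarise_log(log_path)
        # One summary file per crawl log; these can then be merged/summarised later.
        with open(log_path + ".summary.json", "w") as out:
            json.dump(summary, out, indent=2)
```

The per-log JSON files would then be the local-file/DB intermediate mentioned above, to be aggregated in a second pass.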
After some analysis, a couple of problems arose; see the main body of the ticket.
Closing this as it doesn't really fit as a ticket.
The continuous crawler has been running successfully for weeks, but we need to verify that it is doing a sufficiently good job to justify the switch-over.
The proposal is to generate crawl volume breakdowns per host across the daily and weekly crawl streams, and compare them to make sure they are roughly equivalent.
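A rough sketch of that comparison step, assuming per-host URL counts for the two streams have already been aggregated into JSON files by the summariser above (the file names and the 20% tolerance are illustrative assumptions):

```python
import json


def load_counts(path):
    """Load per-host total URL counts from a summary JSON file."""
    with open(path) as f:
        return {host: s["total_urls"] for host, s in json.load(f).items()}


def compare(daily, weekly, tolerance=0.2):
    """Report hosts whose daily and weekly crawl volumes differ by more than the tolerance."""
    divergent = {}
    for host in sorted(set(daily) | set(weekly)):
        d, w = daily.get(host, 0), weekly.get(host, 0)
        if max(d, w) == 0:
            continue
        rel_diff = abs(d - w) / max(d, w)
        if rel_diff > tolerance:
            divergent[host] = (d, w, round(rel_diff, 2))
    return divergent


if __name__ == "__main__":
    daily = load_counts("daily-stream.summary.json")
    weekly = load_counts("weekly-stream.summary.json")
    for host, (d, w, diff) in compare(daily, weekly).items():
        print(f"{host}: daily={d} weekly={w} rel_diff={diff}")
```

Hosts flagged by this check would be the ones to inspect before deciding the streams are "roughly equivalent".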