webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0

Add more verbose logs in browsertrix #261

Open PedroG1515 opened 1 year ago

PedroG1515 commented 1 year ago

Desired Feature: Arquivo.pt needs more verbose logs to be able to find errors.

How to implement: Some of the fields are already in the CDXJ file; however, some errors are not saved in the CDXJ. It would be useful to add a logging system with a verbose mode. We suggest using a format similar to Heritrix's, with the following fields:

ikreymer commented 12 months ago

This should be possible in the new 1.0.0 system, since WARC writing now happens in the crawler itself. It will probably use SHA256 instead of SHA1, but otherwise everything should be there, I think.
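As a rough illustration only, here is a minimal Python sketch of what one Heritrix-style verbose log line could look like with a SHA256 digest in place of SHA1. The field order is loosely modeled on Heritrix's `crawl.log` (log timestamp, status, size, URL, discovery path, referrer, MIME type, thread, fetch timestamp plus duration, digest, source, annotations). The helper `format_crawl_log_line` and all of its parameters are hypothetical; this is not browsertrix-crawler's API or its actual log format.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def format_crawl_log_line(
    status: int,
    url: str,
    body: bytes,
    mime: str,
    hop_path: str = "-",
    referrer: str = "-",
    thread_id: int = 0,
    fetch_start: Optional[datetime] = None,
    duration_ms: int = 0,
    source: str = "-",
    annotations: str = "-",
    logged_at: Optional[datetime] = None,
) -> str:
    """Build one space-separated, crawl.log-style line (hypothetical helper)."""
    now = logged_at or datetime.now(timezone.utc)
    start = fetch_start or now
    # Log timestamp in ISO 8601 with millisecond precision, e.g. 2024-01-02T03:04:06.000Z
    log_ts = now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
    # 17-digit fetch timestamp (yyyyMMddHHmmssSSS) followed by "+duration" in ms
    fetch_ts = start.strftime("%Y%m%d%H%M%S") + f"{start.microsecond // 1000:03d}"
    # SHA256 (hex-encoded here, an assumption) instead of Heritrix's SHA1 digest
    digest = "sha256:" + hashlib.sha256(body).hexdigest()
    fields = [
        log_ts,
        f"{status:5d}",       # status; failures could use negative codes like Heritrix
        f"{len(body):10d}",   # payload size in bytes, right-aligned
        url,
        hop_path,
        referrer,
        mime,
        f"#{thread_id:03d}",  # worker/page thread identifier
        f"{fetch_ts}+{duration_ms}",
        digest,
        source,
        annotations,
    ]
    return " ".join(fields)


if __name__ == "__main__":
    print(format_crawl_log_line(200, "https://example.com/", b"hello", "text/html"))
```

Heritrix records fetch failures with negative status codes, so a scheme like this could also log errors that never produce a WARC record or CDXJ entry, which is the gap described in the issue.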