openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Log is full of messages about "expected" missing entries which match the include/exclude/scopeType #322

Open benoit74 opened 1 week ago

benoit74 commented 1 week ago

Currently, when the scraper rewrites documents, it logs a DEBUG message saying "WARNING {item_path} ({item_url}) not in archive." whenever it finds an external URL.

While this is useful for debugging, it is sometimes very verbose, especially when we have decided not to crawl some pages of the website; see e.g. https://github.com/openzim/zim-requests/issues/832#issuecomment-2175204828
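
To make the idea concrete, here is a minimal sketch of how such messages could be filtered. This is not warc2zim's actual code: the function names `is_expected_missing` and `report_missing`, the logger name, and the assumption that the include/exclude rules are available as lists of regular expressions are all hypothetical.

```python
import logging
import re

# Hypothetical logger name; warc2zim's real logger may differ.
logger = logging.getLogger("rewrite_sketch")


def is_expected_missing(url, include=(), exclude=()):
    """Return True when the crawl rules alone explain why `url` is absent.

    `include` and `exclude` are assumed to be iterables of regex strings.
    """
    # Explicitly excluded URLs are expected to be missing.
    if any(re.search(pattern, url) for pattern in exclude):
        return True
    # If include rules exist and none matches, the URL was never in scope.
    if include and not any(re.search(pattern, url) for pattern in include):
        return True
    return False


def report_missing(item_path, item_url, include=(), exclude=()):
    """Log the missing-entry message only for unexpected gaps.

    Returns True if a message was logged, False if it was suppressed.
    """
    if is_expected_missing(item_url, include, exclude):
        return False
    logger.debug("WARNING %s (%s) not in archive.", item_path, item_url)
    return True
```

With this shape, a URL matching an exclude rule, or falling outside every include rule, would produce no log line, while an in-scope URL that is missing from the archive would still be reported.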