openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Enhance ability to debug WARC item conversion issues #248

Closed: benoit74 closed this issue 1 month ago

benoit74 commented 1 month ago

Currently, when the scraper fails to process a WARC record, it stops immediately.

Since this might happen after a few minutes of processing (we have some huge WARC files to process), it is quite painful for the developer, especially since only logs are available afterwards.

We could consider adding an option to:

This option should not be exposed in the Zimfarm.
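
To make the stop-at-first-failure problem concrete, here is a minimal sketch of what a debug-only "continue on error" mode could look like. Everything in it is an assumption for illustration: `convert_records`, `process_record`, `ConversionReport` and the `continue_on_error` flag are hypothetical names, not warc2zim's actual API, and records are modelled as plain `(record_id, payload)` pairs rather than real WARC records.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logger = logging.getLogger("conversion_sketch")


@dataclass
class ConversionReport:
    """Summary of one run: how many records succeeded and which ones failed."""

    processed: int = 0
    failures: list[tuple[str, str]] = field(default_factory=list)


def convert_records(
    records: Iterable[tuple[str, bytes]],
    process_record: Callable[[bytes], None],
    continue_on_error: bool = False,  # hypothetical debug flag
) -> ConversionReport:
    """Process every record, optionally collecting failures instead of stopping."""
    report = ConversionReport()
    for record_id, payload in records:
        try:
            process_record(payload)
        except Exception as exc:
            if not continue_on_error:
                # Default behaviour: stop at the first failing record.
                raise
            # Debug behaviour: log the traceback, remember the record, move on.
            logger.exception("Failed to process record %s", record_id)
            report.failures.append((record_id, repr(exc)))
        else:
            report.processed += 1
    return report


def fake_process(payload: bytes) -> None:
    """Stand-in for the real conversion step; fails on one specific payload."""
    if payload == b"bad":
        raise ValueError("cannot convert record")


if __name__ == "__main__":
    sample = [("rec-1", b"ok"), ("rec-2", b"bad"), ("rec-3", b"ok")]
    result = convert_records(sample, fake_process, continue_on_error=True)
    print(result.processed, [rid for rid, _ in result.failures])
```

With the flag off the first failure still aborts the run, so the default behaviour is unchanged; with it on, a single pass over a huge file reports every failing record instead of only the first one.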

benoit74 commented 1 month ago

Fixed by https://github.com/openzim/warc2zim/pull/252