openzim / mindtouch

libretexts.org to ZIM scraper
GNU General Public License v3.0

Add flag to only log a warning when HTML rewriting issue arises #72

Closed benoit74 closed 6 days ago

benoit74 commented 6 days ago

We have some checks in HTML rewriting that fail the scraper if unexpected HTML is encountered.

It would however be very useful to be able to downgrade these failures to warnings on demand, so that one can scrape a whole property and get the full list of issues discovered as warnings, rather than discovering them one by one. This would, for instance, make it possible to run a full scrape of geo.libretexts.org even though https://github.com/openzim/mindtouch/issues/71 is not yet implemented.
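
To illustrate the idea, here is a minimal sketch of what such a flag could look like. It is not the scraper's actual code: the flag name `--html-issues-warn-only`, the `HtmlRewritingError` exception, and the `RewritingIssueReporter` class are all hypothetical names chosen for this example.

```python
import argparse
import logging

logger = logging.getLogger("rewriting-sketch")


class HtmlRewritingError(Exception):
    """Hypothetical error raised when unexpected HTML is encountered."""


class RewritingIssueReporter:
    """Either fail on the first HTML rewriting issue or collect all of them."""

    def __init__(self, warn_only: bool) -> None:
        self.warn_only = warn_only
        self.issues: list[str] = []

    def report(self, page: str, message: str) -> None:
        issue = f"{page}: {message}"
        if not self.warn_only:
            # Default (strict) behavior: stop the scrape immediately.
            raise HtmlRewritingError(issue)
        # Warn-only behavior: log the issue, remember it, and keep going.
        logger.warning("HTML rewriting issue: %s", issue)
        self.issues.append(issue)


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name, assumed for this sketch only.
    parser.add_argument(
        "--html-issues-warn-only",
        action="store_true",
        help="Log a warning instead of failing on HTML rewriting issues",
    )
    return parser.parse_args(argv)
```

With the flag set, every rewriting check would call `report()` and the run would end with the complete list in `reporter.issues`. Without it, the first issue still raises, which keeps the current strict behavior as the default.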