tattle-made / factchecking-sites-scraper

A repo to store helper functions for scraping + experiments/visualisations
GNU General Public License v3.0

Error logging for factchecking scraper #54

Open tarunima opened 2 years ago

tarunima commented 2 years ago

The v3 of the fact checking scraper has the following modules:

At present, the scraper stops running if an error occurs in the downloader, parser or uploader. Some sites have error handling for specific edge cases. For example, see the get_all_images function here: https://github.com/tattle-made/factchecking-sites-scraper/blob/master/scraper_v3/newschecker.py. But this error handling is not systematic.

We need appropriate error handling and logging of errors for the scrapers. This is the functional output:

Implementing this functionality will require tweaks in the site-specific functions, as well as in the main function. For managing control flow in the main function after an error in the article downloader/parser stage, see this suggestion by @RishavT: https://github.com/tattle-made/factchecking-sites-scraper/pull/53.
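As a rough illustration of the control flow described above, here is a minimal sketch of a main loop that logs a failure and moves on to the next article instead of stopping. It is not the approach from the linked PR. The names run_pipeline, download, parse and upload are hypothetical placeholders for whatever the site-specific modules expose.

```python
import logging
from typing import Any, Callable, Iterable, List, Tuple

logger = logging.getLogger("scraper")


def run_pipeline(
    urls: Iterable[str],
    download: Callable[[str], Any],   # hypothetical downloader stage
    parse: Callable[[Any], Any],      # hypothetical parser stage
    upload: Callable[[Any], Any],     # hypothetical uploader stage
) -> List[Tuple[str, str, str]]:
    """Run every URL through the three stages.

    A failure in any stage is logged and recorded, and the loop then
    continues with the next URL. Returns (url, stage, error) tuples.
    """
    failures: List[Tuple[str, str, str]] = []
    for url in urls:
        stage = "downloader"
        try:
            raw = download(url)
            stage = "parser"
            article = parse(raw)
            stage = "uploader"
            upload(article)
        except Exception as exc:
            # Record which stage failed, with the traceback, and keep going.
            logger.exception("%s failed for %s", stage, url)
            failures.append((url, stage, repr(exc)))
    return failures
```

Catching the broad Exception here is a deliberate choice for a long-running scrape: one bad article should not end the run. The returned list makes it possible to retry or report the failed URLs afterwards.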

All errors from the scraper for a specific site can be logged in a single __errors.txt file. The error file can live in the language-specific folder. For example, tmp//<_lang_>/*_errors.txt.
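One way to get a per-site error file is Python's standard logging module with a FileHandler. The sketch below assumes the caller builds the path to the error file and passes it in. The helper names get_error_logger and log_error, and the tab-separated line format, are hypothetical and not part of the existing scraper.

```python
import logging
from pathlib import Path


def get_error_logger(name: str, error_file: Path) -> logging.Logger:
    """Return a logger that appends ERROR-level records to error_file.

    The file path is supplied by the caller (for instance the site's
    language-specific folder); parent folders are created if missing.
    """
    error_file.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    # Attach the file handler only once, so repeated calls with the same
    # logger name do not produce duplicate lines in the file.
    if not logger.handlers:
        handler = logging.FileHandler(error_file, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s\t%(levelname)s\t%(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_error(logger: logging.Logger, stage: str, url: str, exc: Exception) -> None:
    """Write one line per failure: stage, article URL, exception type and message."""
    logger.error("%s\t%s\t%s: %s", stage, url, type(exc).__name__, exc)


# Example usage (the path is a placeholder, not the repo's actual layout):
# errors = get_error_logger("example_site", Path("some/folder/example_site_errors.txt"))
# try:
#     ...
# except Exception as exc:
#     log_error(errors, "parser", article_url, exc)
```

Using one logger name per site keeps each site's errors in its own file, and the tab-separated format keeps the file easy to grep or load into a spreadsheet.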