openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

Add executor to zimscraperlib #211

Open benoit74 opened 2 weeks ago

benoit74 commented 2 weeks ago

This PR enriches scraperlib with a ScraperExecutor. This class can process tasks in parallel, using a given number of worker threads.
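To illustrate the general pattern (tasks pushed onto a queue and consumed by a fixed number of worker threads), here is a minimal sketch using only the standard library. It is not the code from this PR: the class name `SketchExecutor`, its methods `start` / `submit` / `shutdown`, and the parameters `nb_workers` and `queue_size` are all hypothetical names chosen for the example.

```python
import queue
import threading


class SketchExecutor:
    """Hypothetical stand-in for illustration; NOT the actual ScraperExecutor API."""

    def __init__(self, nb_workers: int = 2, queue_size: int = 10) -> None:
        self.nb_workers = nb_workers
        # Bounded queue: submit() blocks when too many tasks are pending.
        self._tasks = queue.Queue(maxsize=queue_size)
        self._threads = []
        self.exceptions = []  # exceptions raised by tasks, collected for inspection
        self._lock = threading.Lock()

    def start(self) -> None:
        """Spawn the worker threads."""
        for n in range(self.nb_workers):
            thread = threading.Thread(
                target=self._worker, name=f"worker-{n}", daemon=True
            )
            thread.start()
            self._threads.append(thread)

    def submit(self, task, **kwargs) -> None:
        """Queue a callable to be run by one of the workers."""
        self._tasks.put((task, kwargs))

    def _worker(self) -> None:
        while True:
            item = self._tasks.get()
            try:
                if item is None:  # sentinel: stop this worker
                    return
                task, kwargs = item
                task(**kwargs)
            except Exception as exc:
                with self._lock:
                    self.exceptions.append(exc)
            finally:
                self._tasks.task_done()

    def shutdown(self) -> None:
        """Wait for pending tasks, then stop all workers."""
        self._tasks.join()
        for _ in self._threads:
            self._tasks.put(None)
        for thread in self._threads:
            thread.join()


# Usage: square five numbers on three worker threads.
results = []
results_lock = threading.Lock()


def square(value: int) -> None:
    with results_lock:
        results.append(value * value)


executor = SketchExecutor(nb_workers=3)
executor.start()
for i in range(5):
    executor.submit(square, value=i)
executor.shutdown()
print(sorted(results))  # [0, 1, 4, 9, 16]
```

The bounded queue is a design choice of this sketch: it applies back-pressure on the producer so that a fast task generator cannot fill memory faster than the workers drain it.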

This executor is mainly inspired by the sotoki executor, although other executors exist in wikihow and iFixit. The wikihow one seems more primitive / older, and the iFixit one is just a pale copy.

For ease of review, the first commit is simply a copy/paste of the sotoki code, and the following commits are the adaptations / enhancements for scraperlib.

What has been changed compared to sotoki code:

This executor will be used right away in the mindtouch scraper.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (ea6505f) to head (0ce636c).

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main      #211   +/-   ##
==========================================
  Coverage   100.00%   100.00%
==========================================
  Files           38        39     +1
  Lines         2221      2327   +106
  Branches       426       446    +20
==========================================
+ Hits          2221      2327   +106
```


benoit74 commented 2 weeks ago

Converting to draft; we are experimenting with joblib in the mindtouch scraper for now.