Open benoit74 opened 2 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (ea6505f) to head (0ce636c).
Converting to draft, we are experimenting with joblib in mindtouch scraper for now
This PR enriches the scraperlib with a `ScraperExecutor`. This class is capable of processing tasks in parallel, with a given number of worker threads. This executor is mainly inspired by the sotoki executor, even if we can find other executors in wikihow and in iFixit. The wikihow one seems more primitive / ancient, and the iFixit one is just a pale copy.
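For reviewers less familiar with this kind of executor, the general pattern is a task queue consumed by a fixed number of worker threads. The sketch below only illustrates that pattern with the standard library; `MiniExecutor` and its methods are hypothetical names chosen for the example and are not the actual `ScraperExecutor` API.

```python
import queue
import threading


class MiniExecutor:
    """Hypothetical, simplified queue-based executor (not the scraperlib API)."""

    def __init__(self, nb_workers=2):
        self.nb_workers = nb_workers
        self.tasks = queue.Queue()
        self.no_more = False  # set once we stop accepting tasks
        self.workers = []

    def start(self):
        self.no_more = False
        self.workers = [
            threading.Thread(target=self._worker, name=f"executor-{n}", daemon=True)
            for n in range(self.nb_workers)
        ]
        for worker in self.workers:
            worker.start()

    def submit(self, func, *args):
        # refuse new tasks once the executor has been joined
        if self.no_more:
            raise RuntimeError("executor is not accepting tasks anymore")
        self.tasks.put((func, args))

    def _worker(self):
        while True:
            try:
                func, args = self.tasks.get(timeout=0.1)
            except queue.Empty:
                if self.no_more:
                    return  # queue is empty and no more tasks are expected
                continue
            try:
                func(*args)
            finally:
                self.tasks.task_done()

    def join(self):
        # stop accepting tasks, then wait for workers to drain the queue
        self.no_more = True
        for worker in self.workers:
            worker.join()


results = []
executor = MiniExecutor(nb_workers=3)
executor.start()
for value in range(5):
    executor.submit(results.append, value * 2)
executor.join()
print(sorted(results))
```

Note that in this sketch a joined executor rejects further `submit` calls until `start` is called again, which mirrors the `join` then `start` then `submit` sequence discussed in this PR.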
For easy review, the first commit is simply a copy/paste of the sotoki code, and the following commits are the adaptations / enhancements for scraperlib.
What has been changed compared to sotoki code:
- Added `thread_deadline_sec` to the executor, should we need to customize it per executor (probably the case, priceless and useful for tests at least)
- Added an `if self.no_more:` check in the `submit` method: this allows the executor to stop accepting tasks even when it is just `joined` and not `shutdown`
- Renamed `prefix` to `executor_name` and moved from `T-` to `executor` (way clearer in the logs, from my experience)
- Removed the `release_halt` method, which was misleading / not working (I failed to `join`, then `release_halt`, then `submit` again ... it seems mandatory to `join`, then `start` (again), then `submit`)
- Changed the wait of `thread_deadline_sec` seconds per thread. This is highly unpredictable when there are many workers (we could wait `thread_deadline_sec` for the first worker, then `thread_deadline_sec` for the second worker, etc.), and it is a bit weird that the last worker in the list has way more time to complete than the first one

This executor will be used right now in mindtouch scraper.
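To make the concern about per-thread deadlines concrete, here is a small sketch comparing the worst-case wait of a per-thread deadline with a single deadline shared by all workers. The function names are hypothetical, and the shared-deadline variant is just one possible alternative shown for illustration, not necessarily what this PR implements.

```python
import threading
import time


def join_per_thread(workers, thread_deadline_sec):
    """Each worker gets its own deadline: worst case is N * thread_deadline_sec."""
    for worker in workers:
        worker.join(timeout=thread_deadline_sec)


def join_shared_deadline(workers, deadline_sec):
    """All workers share one deadline: worst case is deadline_sec overall."""
    deadline = time.monotonic() + deadline_sec
    for worker in workers:
        remaining = max(0.0, deadline - time.monotonic())
        worker.join(timeout=remaining)


def timed_join(join_func, nb_workers, task_sec, deadline_sec):
    """Start workers that are all busy for task_sec, then time the join."""
    workers = [
        threading.Thread(target=time.sleep, args=(task_sec,), daemon=True)
        for _ in range(nb_workers)
    ]
    for worker in workers:
        worker.start()
    started = time.monotonic()
    join_func(workers, deadline_sec)
    return time.monotonic() - started


# three workers stuck for 1s, with a 0.2s deadline
per_thread = timed_join(join_per_thread, nb_workers=3, task_sec=1.0, deadline_sec=0.2)
shared = timed_join(join_shared_deadline, nb_workers=3, task_sec=1.0, deadline_sec=0.2)
print(f"per-thread: {per_thread:.1f}s, shared: {shared:.1f}s")
```

With three stuck workers, the per-thread variant waits roughly three times the deadline, while the shared variant waits roughly one deadline in total; the last worker in the list no longer gets more time than the first.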