openzim / wikihow

WikiHow scraper
https://download.kiwix.org/zim/wikihow/
GNU General Public License v3.0

Speed up video processing when including YouTube videos #97

Open rgaudin opened 3 years ago

rgaudin commented 3 years ago

--without-videos only prevents the YouTube videos from being processed and included; there are many regular videos anyway. Those regular videos are short, small, and easy to process.

YouTube videos can be huge, and they are retrieved from YouTube itself, where exceeding some usage threshold (requests? bandwidth?) results in being banned for a period of time.

We thus need to process YouTube videos one by one, but the others can be handled concurrently.

The current code sets the number of video workers based on --without-videos. Ideally, we should treat the two sources differently regardless of that option, since including YouTube videos currently means processing all the videos on a single worker.
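
To illustrate the idea, here is a minimal sketch of such a split, not the scraper's actual code. It assumes video URLs can be classified by hostname, and every name in it (`YOUTUBE_HOSTS`, `is_youtube`, `process_videos`, `process_one`, `nb_workers`) is hypothetical. YouTube videos go to a single-worker pool so they are fetched one at a time, while regular videos share a larger pool:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Hypothetical host list; the real scraper may detect YouTube videos differently.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}


def is_youtube(url):
    """Return True if the URL points at a (assumed) YouTube host."""
    host = urlparse(url).hostname or ""
    return host.lower() in YOUTUBE_HOSTS


def process_videos(urls, process_one, without_videos=False, nb_workers=4):
    """Run process_one(url) for each URL and return {url: result}.

    YouTube URLs are handled sequentially on a single worker (and skipped
    entirely when without_videos is set); all other URLs are handled
    concurrently on nb_workers workers.
    """
    futures = {}
    with ThreadPoolExecutor(max_workers=1) as youtube_pool:
        with ThreadPoolExecutor(max_workers=nb_workers) as regular_pool:
            for url in urls:
                if is_youtube(url):
                    if without_videos:
                        continue  # option only excludes YouTube videos
                    futures[url] = youtube_pool.submit(process_one, url)
                else:
                    futures[url] = regular_pool.submit(process_one, url)
            # Collect results; an exception in process_one is re-raised here.
            return {url: future.result() for url, future in futures.items()}
```

With this layout, the option only decides whether the YouTube pool receives any work; the regular videos keep their concurrency either way.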

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.