openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0

Report scraper progress in JSON file #228

Closed. benoit74 closed this issue 1 month ago

benoit74 commented 3 months ago

Report scraper progress in the regular JSON file expected by the Zimfarm, as the iFixit, sotoki and Zimit scrapers already do.

dan-niles commented 1 month ago

As far as I understand from https://github.com/openzim/ifixit/pull/52, this issue requires creating a JSON file like the following to store the scraper's progress:

{
  "done": 1,
  "total": 4
}

To do this:

  1. We need a new CLI argument for the path where the progress JSON file is stored: --stats-filename
  2. The total number of videos that need to be downloaded is already known at this point: https://github.com/openzim/youtube/blob/853b9245aa5af552110be205b94c6c0e804bf865/scraper/src/youtube2zim/scraper.py#L340 So we can use it as the total in the JSON file and increment the done count by 1 after each video is downloaded.
  3. Then we need a periodic job that rewrites the JSON file with the current counts every 10 seconds.
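
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the code that was merged: the `ProgressReporter` class and its method names are hypothetical, the path would come from the proposed --stats-filename argument, and the background thread is just one possible way to get a periodic write.

```python
import json
import os
import tempfile
import threading
from pathlib import Path


class ProgressReporter:
    """Hypothetical helper: tracks done/total and writes them to a JSON file."""

    def __init__(self, stats_path, total, interval=10.0):
        self.stats_path = Path(stats_path)  # value of --stats-filename
        self.total = total                  # number of videos to download
        self.done = 0
        self.interval = interval            # seconds between two writes
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def increment(self):
        """Call once after each video has been downloaded."""
        with self._lock:
            self.done += 1

    def write(self):
        """Write the current counts, replacing any previous file."""
        with self._lock:
            payload = {"done": self.done, "total": self.total}
        # Write to a temporary file in the same directory, then rename it,
        # so a reader never sees a half-written JSON document.
        fd, tmp = tempfile.mkstemp(dir=self.stats_path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
        os.replace(tmp, self.stats_path)

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            self.write()

    def start(self):
        self.write()  # report 0/total immediately
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.write()  # final write so the last increment is not lost
```

In the scraper, one would call `start()` once the list of videos is known, `increment()` after each download, and `stop()` when the run ends.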

@rgaudin Is this correct?

rgaudin commented 1 month ago

That's correct