openzim / ifixit

iFixit to ZIM scraper
GNU General Public License v3.0

Scraper should report about progression #43

Closed by kelson42 2 years ago

kelson42 commented 2 years ago

@rgaudin Where is the doc for the API?

rgaudin commented 2 years ago

There's None 😬

You should write a JSON file (at the path specified by your option) with the following content:

{
  "done": 3,
  "total": 1870
}
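A minimal sketch of how a scraper might write that file, in Python. The function name `write_progress` and the temp-file-then-rename step are assumptions for illustration (the rename just keeps a reader from seeing a half-written file); nothing in this thread says the reader requires it.

```python
import json
import os
import tempfile
from pathlib import Path


def write_progress(path, done, total):
    """Write {"done": ..., "total": ...} as JSON to `path` (hypothetical helper)."""
    path = Path(path)
    payload = {"done": done, "total": total}
    # Assumption: write to a temp file in the same directory, then rename,
    # so whoever reads the file never sees partial JSON.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    os.replace(tmp_name, path)
```

Calling `write_progress("progress.json", 3, 1870)` would produce the content shown above.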

We also have an optional property, used by zimit, for scrapers that have a limit after which they stop crawling. I don't think that applies to iFixit, but it looks like this:

{
  "done": 3,
  "total": 1870,
  "limit": {
    "max": 100,
    "hit": false
  }
}
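For illustration, here is a small Python sketch that builds either shape of the payload. The helper name `build_progress` and its parameters are hypothetical; the caller passes `max` and `hit` explicitly because the thread does not spell out how `hit` is computed.

```python
import json


def build_progress(done, total, limit_max=None, limit_hit=False):
    """Return the progress payload as a dict (hypothetical helper).

    The "limit" key is only added when a limit is given, since the
    property is described as optional.
    """
    payload = {"done": done, "total": total}
    if limit_max is not None:
        payload["limit"] = {"max": limit_max, "hit": limit_hit}
    return payload


# Serialize with the same layout as the examples in this thread.
text = json.dumps(build_progress(3, 1870, limit_max=100, limit_hit=False), indent=2)
```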

Code that reads it is at https://github.com/openzim/zimfarm/blob/master/workers/app/task/worker.py#L159

benoit74 commented 2 years ago

@rgaudin can you confirm it is OK if the total increases during scraping? Typically the scraper discovers new items to scrape while scraping other items, so while I could make a first guess at the total number of items to scrape, the total will keep increasing as the scrape progresses.

rgaudin commented 2 years ago

Absolutely; that's the point
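
A rough Python sketch of a counter whose total grows as items are discovered, as discussed above. The class and method names are hypothetical and not taken from the ifixit scraper.

```python
class Progress:
    """Track done/total where total may grow during the scrape (hypothetical)."""

    def __init__(self, initial_total=0):
        self.done = 0
        self.total = initial_total  # first guess, may increase later

    def discovered(self, count=1):
        # New items found while scraping other items: bump the total.
        self.total += count

    def completed(self, count=1):
        # Items fully scraped.
        self.done += count

    def as_dict(self):
        # Same shape as the JSON content described earlier in the thread.
        return {"done": self.done, "total": self.total}
```

For example, starting from `Progress(10)`, completing 3 items and then discovering 5 more gives `{"done": 3, "total": 15}`.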