openzim / freecodecamp

FreeCodeCamp.org scraper (to ZIM)
GNU General Public License v3.0

Adapt scraper to openzim conventions #11

Closed benoit74 closed 1 year ago

benoit74 commented 1 year ago

The scraper needs to be adapted to match openzim conventions.

The list below may not be exhaustive yet; it is what came to mind while working on https://github.com/openzim/zimfarm/issues/804.

PS: @mdp, no worries, this is something we will do on our own (or help you with). This is usual for new scrapers and mostly linked to a WIP on our side to better explain our expectations (or not, and we consider this issue normal / to do on our side since it is very specific to our way of working).

mdp commented 1 year ago

Hey @benoit74, yeah, this all makes sense. I can tackle most of this early next week. Thanks for the _python-bootstrap repo link, I'll try to line this repo up with it.

mdp commented 1 year ago

@benoit74 #12 is ready for review. Although it doesn't address all the enhancements, it DRYs up the CLI arguments and tries to match other Python projects. It should also address #13.

benoit74 commented 1 year ago

I removed the "add support for stats JSON file" part because the scraper runs so fast that it makes little sense to report progress.