openzim / devdocs

devdocs.io to ZIM scraper
GNU General Public License v3.0

Create the initial implementation #1

Open josephlewis42 opened 2 months ago

josephlewis42 commented 2 months ago

Create the initial implementation of the new Devdocs scraper as requested in https://github.com/openzim/zim-requests/issues/1086.

Once implemented and merged, use the knowledge gained to complete https://wiki.openzim.org/w/index.php?title=How-to_create_a_Python_scraper&gettingStartedReturn=true

josephlewis42 commented 1 month ago

I've been working on this in the background, poking at various ways forward. A couple of the major things I've found are:

My plan of attack is this:

  1. [x] Get the type definitions and devdocs HTTP client in with tests.
  2. [ ] Add the skeleton CLI that doesn't generate a ZIM.
  3. [ ] Generate a ZIM using Python templates. The ZIM's search index will replace the devdocs sidebar index.

Once that's done we'll have a working MVP and I can look at building out a more modern front-end.
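A minimal sketch of what steps 1 and 2 could look like, under stated assumptions: the index is taken to be JSON with an `entries` list whose items carry `name`, `path` and `type`; the program name `devdocs2zim`, the `--slug` and `--output` flags, and every function name here are hypothetical. The HTTP client is left out, so the index text is passed in directly and nothing is written to disk.

```python
from __future__ import annotations

import argparse
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One navigable page of a documentation set (field names are assumed)."""

    name: str
    path: str
    type: str


def parse_index(raw: str) -> list[IndexEntry]:
    """Parse index JSON shaped like {"entries": [{"name", "path", "type"}, ...]}."""
    data = json.loads(raw)
    return [IndexEntry(e["name"], e["path"], e["type"]) for e in data["entries"]]


def build_parser() -> argparse.ArgumentParser:
    # Program name and flags are illustrative, not the scraper's real interface.
    parser = argparse.ArgumentParser(prog="devdocs2zim")
    parser.add_argument("--slug", required=True, help="documentation set to scrape")
    parser.add_argument("--output", default="output", help="directory for the ZIM")
    return parser


def main(argv: list[str], raw_index: str) -> str:
    """Skeleton run: parse the arguments and the index, then report; no ZIM is made."""
    args = build_parser().parse_args(argv)
    entries = parse_index(raw_index)
    return f"{args.slug}: {len(entries)} entries, would write to {args.output}"


# Hand-written sample standing in for a response from the HTTP client.
SAMPLE_INDEX = (
    '{"entries": [{"name": "abs()", "path": "library/functions#abs", '
    '"type": "Built-in Functions"}]}'
)

if __name__ == "__main__":
    print(main(["--slug", "python~3.12"], SAMPLE_INDEX))
```

Keeping the parsing separate from the CLI means the type definitions can be tested against canned JSON without any network access.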

benoit74 commented 1 month ago

The plan seems appropriate. Nothing much to add but some minor remarks:

josephlewis42 commented 1 month ago

Do we need to reimplement it on our side, or can we automatically reuse the code from their git repo? If we reimplement it on our side, it is mandatory to mention where this code comes from (which file(s)) and the commit / date at which the "sync" was done, so that maintenance is possible (we will probably need to update this code at some point).

It looks to me like the JS is too wrapped up with features we wouldn't want (e.g. service workers, local database usage to "download" items for offline use, SPA, click tracking). Those would overlap confusingly with the functionality of ZIM readers, so pulling bits out will likely be difficult.

In this specific case, the code has been the same for over eight years, so I anticipate it won't change much in the future. However, I'll be sure to leave notes.

I haven't talked with the maintainers over there, but there might be opportunities to help them and us at the same time: either by creating a front-end that can optionally exclude the bits we don't want, so we could package the app wholesale, or by pushing some of this front-end logic into the data itself.

josephlewis42 commented 1 month ago

I've made significant progress on this locally with Jinja, which gets us close to the stated MVP. I'll split the work into PRs:

[image]
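To illustrate the template-driven approach, here is a rough sketch that renders a static listing page grouped by entry type. It deliberately swaps in the standard library's `string.Template` for Jinja so that it runs without extra dependencies; the markup, the `(name, path, type)` entry shape, and the `render_listing` helper are all assumptions for illustration, not the scraper's actual templates.

```python
from collections import defaultdict
from html import escape
from string import Template

# Illustrative markup only; string.Template stands in for Jinja here.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)
SECTION = Template("<h2>$heading</h2><ul>$items</ul>")
ITEM = Template('<li><a href="$href">$label</a></li>')


def render_listing(title, entries):
    """Render (name, path, type) tuples as one static page grouped by type."""
    by_type = defaultdict(list)
    for name, path, kind in entries:
        by_type[kind].append((name, path))

    sections = []
    for kind in sorted(by_type):
        # Escape every value before substitution so page names cannot inject markup.
        items = "".join(
            ITEM.substitute(href=escape(path, quote=True), label=escape(name))
            for name, path in sorted(by_type[kind])
        )
        sections.append(SECTION.substitute(heading=escape(kind), items=items))
    return PAGE.substitute(title=escape(title), body="".join(sections))


if __name__ == "__main__":
    print(render_listing("Python", [("abs()", "library/functions#abs", "Built-in Functions")]))
```

With Jinja the escaping would normally be handled by enabling autoescape rather than by calling `escape` by hand; the grouping logic would stay the same.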

A couple of limitations I'm noticing that I want to document as things to follow up on:

benoit74 commented 1 month ago

Woah, good job, looks impressive so far!