Open Popolechien opened 5 months ago
Icon is available in drive.
Thank you, planned for end of summer as discussed
For the record, I'm beginning work on this project.
I've created a project where I've initialized first discovery tasks on https://github.com/openzim/librechef/
I'm now investigating whether it is better to update (and maintain in the future) this sushichef recipe, or whether we should rather change our plans and use the zimit scraper. A very first test with zimit seems to indicate it is at least not impossible.
Regarding the last point, using zimit is probably not even an option. The "killer" reason is that it is currently impossible to compress YouTube videos, or at least not without significant investment in warc2zim. That has already been discussed and ruled out as a no-go: we do not want to alter warc2zim more than is necessary for HTML/JS to work.
There are also some significant issues which were discovered in the first try (see dropped issues in project), not to mention the custom CSS and behaviors needed to get everything in place.
The balance would probably have been quite even without the first "killer" reason, but here it is!
ATM, I hence consider that using the kolibri scraper is the way to move this forward, and I've already found how to fix the most obvious issues to make something run end-to-end and create a very first draft ZIM. I only need to assemble it into meaningful PRs ready for review ^^
And of course, there are still a bunch of issues to solve at kolibri v2 level.
After investigating a bit more into librechef and kolibri, I now consider this strategy is also not the optimal one.
librechef imposes a set of constraints (e.g. navigation by topic, no description longer than 200 chars on topics, ...) which are going to be painful.
librechef is hard to debug: for instance, fixing a UI bug only present in the HTML requires rerunning the whole process of crawling the website, pushing to the Studio, and creating the ZIM.
Using librechef also poses problems in terms of deployment: we have no idea how to run this in production, or at least we know it is going to be a "trick" (see https://github.com/kiwix/operations/issues/262)
Another concern is that librechef is based on ricecooker, which seems to be barely maintained (e.g. it still depends on Python 3.10 while we have had 3.11 for 2 years and 3.12 for 1 year)
All this could have made sense if we knew that we were going to have more and more kolibri channels to convert to ZIM, but it does not look like that is going to happen in the coming months.
So I now think we have to seriously consider creating a new scraper, libretext, because this is going to be the cheapest solution both in terms of initial creation AND in terms of maintenance. Not to mention that it will reduce our dependency on a partner (should Kolibri Studio stop working, librechef would become a problem as well).
I will investigate this way forward in the coming days.
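To make the "no description longer than 200 chars on topics" constraint mentioned above more concrete, here is a minimal sketch of the kind of workaround it forces on the scraper side. The function name and the truncation strategy are hypothetical and only illustrate the idea; this is not ricecooker or Kolibri Studio API, and the 200 limit is simply taken from the comment above.

```python
# Hypothetical helper: not part of ricecooker, librechef or Kolibri Studio.
MAX_TOPIC_DESCRIPTION = 200  # limit quoted above; assumed to count characters


def fit_description(text: str, limit: int = MAX_TOPIC_DESCRIPTION) -> str:
    """Return text shortened to at most `limit` characters.

    Cuts at a word boundary when one exists and appends an ellipsis
    to signal that the original description was longer.
    """
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    cut = text[: limit - 1]  # keep one character free for the ellipsis
    head, sep, _ = cut.rpartition(" ")
    if sep:  # a space was found, so drop the trailing partial word
        cut = head
    return cut.rstrip() + "…"
```

Any such truncation loses content that the source website displays in full, which is part of why this constraint is painful.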
The following libraries should be zimed up: Biology, Business, Engineering, Geoscience, Medicine, Humanities, K-12 Education, Mathematics, Physics, Social sciences, Statistics, Workforce, Espanol, Ukrayinska