openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Create smaller selections #184

Open Popolechien opened 1 year ago

Popolechien commented 1 year ago

The Gutenberg zim currently stands at 70GB in English, which has prompted comments from users that it is rather unwieldy.

It would be interesting to offer fiction / non-fiction subsets, or anything else based on the project's bookshelves.

rgaudin commented 1 year ago

What would be the input? A list of bookshelf IDs to include?

Popolechien commented 1 year ago

Would this be the easiest way to implement such an idea?

Alternatively, at this stage I don't think that the input should be left to users (or any curator), so having the scraper automatically generate a zim for each bookshelf ID might be a less labour-intensive first step.
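
To make the per-bookshelf idea concrete, here is a rough sketch of how books could be grouped into one selection per bookshelf. It is not the scraper's actual code: the record shape (`id`, `bookshelves`), the function name `group_by_bookshelf` and the sample data are all made up for illustration, and the real data model may well differ.

```python
from collections import defaultdict


def group_by_bookshelf(books):
    """Map each bookshelf name to the sorted list of book IDs it contains."""
    selections = defaultdict(list)
    for book in books:
        # A book may sit on several bookshelves, or on none at all.
        for shelf in book["bookshelves"]:
            selections[shelf].append(book["id"])
    # Sort shelves and IDs so the resulting plan is deterministic.
    return {shelf: sorted(ids) for shelf, ids in sorted(selections.items())}


# Made-up sample records (hypothetical shape, not the scraper's data model).
SAMPLE_BOOKS = [
    {"id": 3, "bookshelves": ["Shelf A"]},
    {"id": 1, "bookshelves": ["Shelf A", "Shelf B"]},
    {"id": 2, "bookshelves": []},
]

if __name__ == "__main__":
    # Each entry would correspond to one zim to generate.
    for shelf, ids in group_by_bookshelf(SAMPLE_BOOKS).items():
        print(f"{shelf}: {len(ids)} book(s) -> {ids}")
```

Note that in this sketch a book without any bookshelf ends up in no selection, and a book on several bookshelves is included in each of them.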

rgaudin commented 1 year ago

I see 👍 probably a good first step

eshellman commented 1 year ago

The PG bookshelves are currently not maintained; they used to be maintained on a wiki that got shut down because the underlying wiki software had security issues. Re-enabling the bookshelf management is a project that was worked on a year ago but didn't reach the finish line. So it might be a good idea to wait on this.

kelson42 commented 10 months ago

@eshellman Thank you for this important feedback. I guess the problem is mostly not technical. Maybe this could be done somewhere in a dedicated code repository? For example, GitHub has a small wiki engine, and a wiki is available for each repository. Anyway, it sounds problematic to implement this feature if the PG shelves are not maintained!