openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Correct Language codes in Gutenberg recipes #217

RavanJAltaie opened this issue 4 months ago (status: open)

RavanJAltaie commented 4 months ago

For Gutenberg, we use the "one-language-one-zim" mode in Zimfarm. In this mode, the language is set automatically by the scraper. The scraper is evidently creating ZIMs with improper language codes, so this needs an upstream issue in the Gutenberg scraper; it is nothing you can solve yourself.

There are two issues:


eshellman commented 4 months ago

I can look into Caló (it's only one book) upstream, but as far as I know, none of the others are language codes that come from PG.

benoit74 commented 4 months ago

Thank you @eshellman, if you could fix rmr upstream that would be great; otherwise we would have to add a "hack" to our scraper to transform rmr into rmq,emx, since that is probably the real situation (or maybe only rmq).
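For illustration, such a hack could be a small remapping table applied to each book's language codes before they reach the ZIM metadata. This is a minimal sketch, assuming the codes are held as a list of ISO 639-3 strings; the names `LANGUAGE_CODE_FIXES` and `normalize_language_codes` are hypothetical and not part of the scraper. The only entry mirrors the rmr → rmq,emx mapping proposed in this comment.

```python
# Hypothetical remapping table: codes that should not be emitted as-is,
# mapped to the replacement code(s) to use instead.
# rmr -> rmq, emx follows the proposal in this thread; ["rmq"] alone is
# the other option mentioned.
LANGUAGE_CODE_FIXES: dict[str, list[str]] = {
    "rmr": ["rmq", "emx"],
}


def normalize_language_codes(codes: list[str]) -> list[str]:
    """Replace known-bad codes, keeping the original order and dropping duplicates."""
    result: list[str] = []
    for code in codes:
        # Unknown codes map to themselves.
        for fixed in LANGUAGE_CODE_FIXES.get(code, [code]):
            if fixed not in result:
                result.append(fixed)
    return result


print(normalize_language_codes(["eng", "rmr"]))  # ['eng', 'rmq', 'emx']
```

Keeping the table as data rather than hard-coded conditionals means the hack can be deleted cleanly once the code is fixed upstream.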

mul is a hack for the ZIM we create with all languages. To respect the openZIM specification, the scraper should not do that and should list all languages instead. This part is on us ^^

rgaudin commented 4 months ago

@benoit74 The Language metadata must be a list of ISO 639-3 codes sorted by importance (here, by number of entries), but the Name metadata and the filename will keep mul.
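The rule in this comment can be sketched as follows, assuming the scraper can count entries (for example, books) per ISO 639-3 code and that the codes are joined with commas, as in the rmq,emx value discussed earlier. The function names `language_metadata` and `name_language_part` are hypothetical, and breaking ties alphabetically is an added assumption for deterministic output.

```python
from collections.abc import Mapping


def language_metadata(entry_counts: Mapping[str, int]) -> str:
    """Build the language metadata value: ISO 639-3 codes, most entries first.

    Ties are broken alphabetically so the output is deterministic
    (an assumption; the thread only says "sorted by importance").
    """
    ordered = sorted(entry_counts.items(), key=lambda item: (-item[1], item[0]))
    return ",".join(code for code, _count in ordered)


def name_language_part(entry_counts: Mapping[str, int]) -> str:
    """Hypothetical helper: Name and filename keep "mul" when several languages
    are bundled; otherwise they use the single code. Expects at least one language."""
    if len(entry_counts) > 1:
        return "mul"
    return next(iter(entry_counts))


counts = {"eng": 120, "fra": 30, "deu": 30, "rmq": 1}
print(language_metadata(counts))   # eng,deu,fra,rmq
print(name_language_part(counts))  # mul
```

Separating the two values reflects the point made here: the metadata lists every language, while the human-facing Name and filename stay stable with mul.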

benoit74 commented 4 months ago


Yep, I had this in mind. Thank you for confirming before I even asked 😄

benoit74 commented 4 months ago

(and sorry for the wrong description in the first comment; I wrote it too fast)