Open RavanJAltaie opened 4 months ago
I can look into Caló (it's only one book) upstream, but as far as I know none of the others are language codes from PG.
Thank you @eshellman, if you could fix `rmr` upstream it would be great; otherwise we would have to add a "hack" to our scraper to transform `rmr` into `rmq,emx`, since it's probably the real situation, or maybe only `rmq`.
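A minimal sketch of what such a scraper-side "hack" could look like, in case the code is not fixed upstream. The `RETIRED_CODES` table and the `normalize_language_codes` helper are hypothetical names for illustration, not part of the existing scraper, and the choice of `rmq,emx` over only `rmq` is an assumption still open in this thread.

```python
# Hypothetical table of retired ISO 639-3 codes and their replacements.
# Assumption: rmr maps to both successors; it may need to be only ["rmq"].
RETIRED_CODES = {
    "rmr": ["rmq", "emx"],  # Caló, retired 2010-01-18 and split
}


def normalize_language_codes(codes):
    """Replace retired codes by their successors, keeping order, without duplicates."""
    normalized = []
    for code in codes:
        # Codes that are not in the table are passed through unchanged.
        for replacement in RETIRED_CODES.get(code, [code]):
            if replacement not in normalized:
                normalized.append(replacement)
    return normalized
```

Keeping the mapping in a single table means that switching to `rmq` alone would be a one-line change.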
`mul` is a hack for the ZIM we create with all languages. The scraper should not do that to respect openZIM specification, and list all languages. This part is for us ^^
@benoit74 Languages metadata must be a list of ISO-639-3 codes sorted by importance (so by number of entries here), but the Name metadata and the filename will keep `mul`.
> Languages metadata must be a list of ISO-639-3 codes sorted by importance (so by number of entries here), but the Name metadata and the filename will keep `mul`.
Yep, I had this in mind. Thank you for confirming before I even asked 😄
(and sorry for the wrong description in first comment, I wrote it too fast)
For Gutenberg, we use the "one-language-one-zim" mode in Zimfarm. In this mode, the language is set automatically by the scraper. Obviously the scraper is creating ZIMs with improper language => open upstream issue in Gutenberg scraper, nothing you can solve yourself.
There are two issues:
- `openZIM:gutenberg_mul_all` is an improper ZIM name: `mul` is not a valid ISO-639-3 language code
- `openZIM:gutenberg_rmr_all` is an improper ZIM name: `rmr` is not a valid ISO-639-3 language code anymore; as of 2010-01-18, [rmr] for Caló is deprecated due to a split into Caló [rmq] and Erromintxela [emx]

Edit:
- `openZIM:gutenberg_mul_all`: `mul` is not a valid ISO-639-3 language code; the language must be a csv list of ISO-639-3 codes sorted by importance (so by number of entries here)
- `openZIM:gutenberg_rmr_all`: `rmr` is not a valid ISO-639-3 language code anymore; as of 2010-01-18, [rmr] for Caló is deprecated due to a split into Caló [rmq] and Erromintxela [emx]; it should be `rmq` (probably) or `rmq,emx`
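For the all-languages ZIM, the "csv list sorted by number of entries" could be built roughly as follows. This is only a sketch: `language_metadata` is a hypothetical helper, the input is assumed to be one ISO-639-3 code per book, and breaking ties alphabetically is my own assumption to keep the output deterministic.

```python
from collections import Counter


def language_metadata(book_languages):
    """Return a comma-separated list of codes, most frequent language first."""
    counts = Counter(book_languages)
    # Sort by descending number of books, then alphabetically for stable output.
    ordered = sorted(counts, key=lambda code: (-counts[code], code))
    return ",".join(ordered)


if __name__ == "__main__":
    # Illustrative input only, not real Gutenberg data.
    print(language_metadata(["eng", "fra", "eng", "deu", "fra", "eng"]))
```

The Name metadata and the filename would still keep `mul`; only the language list changes.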