openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

Incorrect ISO-639-3 language codes #822

Closed rgaudin closed 8 months ago

rgaudin commented 8 months ago

As per the spec, the Language metadata of ZIM files must be a valid ISO-639-3 language code.

The following ZIMs have incorrect (invalid or deprecated) codes in Language. These must be fixed, as readers and other OPDS consumers are free to handle them differently (cardshop removes those incorrect languages and defaults to English if no other is provided).

| ZIM | Language | Comment |
| --- | --- | --- |
| openZIM:gutenberg_mul_all | rmr | As of 2010-01-18, [rmr] for Caló is deprecated due to split. Split into Caló [rmq] and Erromintxela [emx]. |
| Kiwix:wikipedia_map-bms_all:nopic | map-bms | invalid |
| Kiwix:wikipedia_be-tarask_all:maxi | be-tarask | invalid |
| Kiwix:wikipedia_eml_all:maxi | eml | As of 2009-01-16, [eml] for Emiliano-Romagnolo is deprecated due to split. Split into Emilian [egl] and Romagnol [rgn]. |
| Kiwix:wikipedia_nds-nl_all:nopic | nds-nl | invalid |
| Kiwix:wikipedia_roa-tara_all:nopic | roa-tara | invalid |
| Kiwix:wikisource_zh-min-nan_all:_nopic | zh-min-nan | invalid |
| Kiwix:wikipedia_roa-tara_all:maxi | roa-tara | invalid |
| Kiwix:wikipedia_eml_all:nopic | eml | As of 2009-01-16, [eml] for Emiliano-Romagnolo is deprecated due to split. Split into Emilian [egl] and Romagnol [rgn]. |
| openZIM:gutenberg_rmr_all | rmr | As of 2010-01-18, [rmr] for Caló is deprecated due to split. Split into Caló [rmq] and Erromintxela [emx]. |
| Kiwix:wikipedia_nds-nl_all:maxi | nds-nl | invalid |
| Kiwix:wikipedia_be-tarask_all:nopic | be-tarask | invalid |
| Kiwix:wikipedia_map-bms_all:maxi | map-bms | invalid |
| Kiwix:wikisource_zh-min-nan_all:_maxi | zh-min-nan | invalid |
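
For illustration only, here is a minimal Python sketch of the two behaviours described in this comment: telling "invalid" codes from "deprecated" ones, and a cardshop-like fallback to English. It is not cardshop's or any scraper's actual code. The function names and the `DEPRECATED` table are hypothetical; the table contains only the two deprecated codes reported in this issue, and the only validity test is the three-lowercase-letter shape of ISO 639-3 identifiers, so a real check would also need the full ISO 639-3 code table.

```python
import re

# Deprecated codes and their replacements, as reported in this issue only.
# This is NOT the full ISO 639-3 retirement list.
DEPRECATED = {
    "rmr": ("rmq", "emx"),
    "eml": ("egl", "rgn"),
}

# ISO 639-3 identifiers are exactly three lowercase ASCII letters.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def classify_language_code(code):
    """Return 'malformed', 'deprecated' or 'well-formed' for one Language value."""
    if not ISO_639_3_SHAPE.fullmatch(code):
        return "malformed"
    if code in DEPRECATED:
        return "deprecated"
    # The shape is fine; a real check would also look the code up in the
    # ISO 639-3 code table, which this sketch does not ship.
    return "well-formed"


def sanitize_languages(codes, default="eng"):
    """Drop malformed or deprecated codes; fall back to `default` if none remain."""
    kept = [c for c in codes if classify_language_code(c) == "well-formed"]
    return kept or [default]


if __name__ == "__main__":
    for code in ("map-bms", "eml", "egl"):
        print(code, classify_language_code(code))
    print(sanitize_languages(["be-tarask"]))
```

With this sketch, hyphenated wiki-style codes such as `map-bms` are rejected on shape alone, while `rmr` and `eml` pass the shape test and are caught only by the deprecation table.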
Popolechien commented 8 months ago

For rmr and eml, I suspect @RavanJAltaie will have to go and ask which language is actually being used on their wiki (eml.wikipedia.org says Bolognese, which is highly unhelpful as that one has no ISO code).

RavanJAltaie commented 8 months ago
| ZIM | Language | Comment | Status |
| --- | --- | --- | --- |
| openZIM:gutenberg_mul_all | rmr | As of 2010-01-18, [rmr] for Caló is deprecated due to split. Split into Caló [rmq] and Erromintxela [emx]. | |
| Kiwix:wikipedia_map-bms_all:nopic | map-bms | invalid | Changed to jav (Javanese) |
| Kiwix:wikipedia_be-tarask_all:maxi | be-tarask | invalid | Changed to wikipedia_be-x-old_all |
| Kiwix:wikipedia_eml_all:maxi | eml | As of 2009-01-16, [eml] for Emiliano-Romagnolo is deprecated due to split. Split into Emilian [egl] and Romagnol [rgn]. | Changed to wikipedia_egl_all, but we need to add the Emilian language to the language drop-down list in Zimfarm |
| Kiwix:wikipedia_nds-nl_all:nopic | nds-nl | invalid | |
| Kiwix:wikipedia_roa-tara_all:nopic | roa-tara | invalid | |
| Kiwix:wikisource_zh-min-nan_all:_nopic | zh-min-nan | invalid | Changed to wikisource_nan_all, but we need to add the Nan language to the languages list in Zimfarm |
| Kiwix:wikipedia_roa-tara_all:maxi | roa-tara | invalid | |
| Kiwix:wikipedia_eml_all:nopic | eml | As of 2009-01-16, [eml] for Emiliano-Romagnolo is deprecated due to split. Split into Emilian [egl] and Romagnol [rgn]. | Changed to wikipedia_egl_all, but we need to add the Emilian language to the language drop-down list in Zimfarm |
| openZIM:gutenberg_rmr_all | rmr | As of 2010-01-18, [rmr] for Caló is deprecated due to split. Split into Caló [rmq] and Erromintxela [emx]. | |
| Kiwix:wikipedia_nds-nl_all:maxi | nds-nl | invalid | |
| Kiwix:wikipedia_be-tarask_all:nopic | be-tarask | invalid | Changed to wikipedia_be-x-old_all |
| Kiwix:wikipedia_map-bms_all:maxi | map-bms | invalid | Changed to jav (Javanese) |
| Kiwix:wikisource_zh-min-nan_all:_maxi | zh-min-nan | invalid | Changed to wikisource_nan_all, but we need to add the Nan language to the languages list in Zimfarm |
rgaudin commented 8 months ago

What should be changed is the language code in the mwoffliner params, not the Zimfarm recipe language (that's used only for filtering in the farm).

RavanJAltaie commented 8 months ago

@benoit74 I didn't know how to change the language in the multi-language Gutenberg recipe. Any idea how I should do it?

benoit74 commented 8 months ago

For Gutenberg, we use the "one-language-one-zim" mode in Zimfarm. In this mode, the language is set automatically by the scraper. Obviously the scraper is creating ZIMs with an improper language => open an upstream issue in the Gutenberg scraper; this is nothing you can solve yourself.

Specify there are two issues:

RavanJAltaie commented 8 months ago

Done. Issue created https://github.com/openzim/gutenberg/issues/217

benoit74 commented 8 months ago

I don't think anything is solved, let's discuss it live in a few minutes ^^