Open Popolechien opened 2 months ago
I'm not against deleting it, but I would rather fix the bug because there is one somewhere around the metadata handling.
Do we know what the proper language metadata in the ZIM should be? Should it be "bel"?
The bug does not appear with the development version of MWoffliner. I investigated a bit to find where exactly it was fixed, but was not able to determine it. Actually it does not matter that much. What is sure is that with version 1.13.0, MWoffliner was writing `be-tarask`, and now with 1.14.0 it writes `bel`, which is correct. I have started a scrape with the `dev` container of MWoffliner and hopefully it will run to the end.
We still have a problem, because the ZIM file of this recipe will overwrite the one from https://farm.openzim.org/recipes/wikipedia_be_all. A filename prefix should probably be set.
@rgaudin I have set a filename prefix and this point is fixed (hopefully correctly). Now I have a question about the "Name" metadata: how do we avoid a conflict with the ZIM files of the other "bel" recipe?
Allow me to recap the issue as it's not clearly written.
| | tarask | be |
|---|---|---|
| Wikipedia | be-tarask.wikipedia.org | be.wikipedia.org |
| Language code | be-x-old (Belarusian (Taraškievica orthography)) | be (Belarusian) |
| ZIM filename | wikipedia_be-tarask_all_nopic_2023-05.zim | wikipedia_be_all_nopic_2024-04.zim |
| ZIM Language | be-tarask (incorrect) | bel (OK) |
| ZIM Name | wikipedia_be-tarask_all (possibly correct) | wikipedia_be_all (OK) |
| Zimfarm recipe | wikipedia_be-x-old_all | wikipedia_be_all |
This is an edge (but not unique) case of two WP projects with different languages that share the same ISO-639-3 code (`bel`). That's because the main language is Belarusian and Taraškievica is just the classical orthography.
So far, we've chosen to strictly follow ISO-639-3 for technical and management purposes (simplicity), but that only works for 99% of cases.
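To illustrate why both projects end up with the same Language metadata, here is a minimal sketch. It assumes the ISO-639-3 code is derived from the primary subtag of the Wikipedia language code only. The lookup table and the `to_iso639_3` helper are hypothetical and hand-written for this example; this is not MWoffliner's actual logic.

```python
# Hypothetical, hand-written lookup for illustration only.
# A real tool would rely on a complete ISO-639 table.
ISO_639_3_BY_PRIMARY_SUBTAG = {"be": "bel", "en": "eng", "fr": "fra"}


def to_iso639_3(wiki_lang_code: str) -> str:
    """Map a Wikipedia language code to ISO-639-3 using only its primary subtag."""
    # "be-tarask" and "be-x-old" both start with the primary subtag "be".
    primary = wiki_lang_code.lower().split("-", 1)[0]
    return ISO_639_3_BY_PRIMARY_SUBTAG[primary]


for code in ("be", "be-tarask", "be-x-old"):
    # All three codes resolve to the same ISO-639-3 code.
    print(code, "->", to_iso639_3(code))
```

Under that assumption the orthography variant is dropped entirely, so the Language metadata alone cannot tell the two ZIMs apart.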
I believe the ZIM name/language given by @rgaudin is not what we currently have. We should verify, now that the recipe has run successfully.
I described the previous situation, based on the previous ZIM files; here's the updated data:
| | tarask-2023 | be | tarask-2024 |
|---|---|---|---|
| Wikipedia | be-tarask.wikipedia.org | be.wikipedia.org | be-tarask.wikipedia.org |
| Language code | be-x-old (Belarusian (Taraškievica orthography)) | be (Belarusian) | be-x-old |
| ZIM filename | wikipedia_be-tarask_all_nopic_2023-05.zim | wikipedia_be_all_nopic_2024-03.zim | wikipedia_be-tarask_all_nopic_2024-04.zim |
| ZIM Language | be-tarask (incorrect) | bel (OK) | bel (OK) |
| ZIM Name | wikipedia_be-tarask_all (possibly correct) | wikipedia_be_all (OK) | wikipedia_be-tarask_all (possibly OK) |
| Zimfarm recipe | wikipedia_be-x-old_all | wikipedia_be_all | wikipedia_be-x-old_all |
Given we don't use ISO-639-3 in the `Name` nor in the filename, there's no conflict. We only have different ZIMs with the same title (but the description is different).
Oh, and you've overwritten the 2024-04 `be_all` with the tarask version!
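The overwrite can be reproduced with a small sketch. It assumes a filename of the form `<Name>_<flavour>_<period>.zim`, which is only inferred from the filenames quoted in this thread, not taken from MWoffliner or Zimfarm code. `zim_filename`, `find_collisions` and the recipe dictionaries are hypothetical names used for illustration.

```python
from collections import defaultdict


def zim_filename(name: str, flavour: str, period: str) -> str:
    """Build a filename following the pattern seen in this thread (an assumption)."""
    return f"{name}_{flavour}_{period}.zim"


def find_collisions(recipes: list[dict]) -> dict[str, list[str]]:
    """Return every filename that more than one recipe would produce."""
    by_filename = defaultdict(list)
    for recipe in recipes:
        filename = zim_filename(recipe["name"], recipe["flavour"], recipe["period"])
        by_filename[filename].append(recipe["recipe"])
    return {f: r for f, r in by_filename.items() if len(r) > 1}


# Assumed situation without a filename prefix: both recipes yield the same Name.
without_prefix = [
    {"recipe": "wikipedia_be_all", "name": "wikipedia_be_all",
     "flavour": "nopic", "period": "2024-04"},
    {"recipe": "wikipedia_be-x-old_all", "name": "wikipedia_be_all",
     "flavour": "nopic", "period": "2024-04"},
]

# With the prefix set, the tarask recipe gets its own Name and filename.
with_prefix = [
    {"recipe": "wikipedia_be_all", "name": "wikipedia_be_all",
     "flavour": "nopic", "period": "2024-04"},
    {"recipe": "wikipedia_be-x-old_all", "name": "wikipedia_be-tarask_all",
     "flavour": "nopic", "period": "2024-04"},
]

print(find_collisions(without_prefix))
print(find_collisions(with_prefix))
```

A check of this kind, run across all recipes before a scrape is scheduled, would flag two recipes targeting the same file even when their Language metadata is legitimately identical.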
ZIM(s) location
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_maxi_2023-04.zim
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_maxi_2023-05.zim
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_nopic_2023-04.zim
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_nopic_2023-05.zim
Recipe(s) URL
https://farm.openzim.org/recipes/wikipedia_be-x-old_all/edit
Readers tested
Both ZIM versions impacted?
Yes, both versions are impacted
Details
The recipe has been failing for a while and its ZIM appears in the English selection. Considering that this variant of Belarusian is low traffic, it won't be missed too much, but appearing in the English (high traffic) library (including Imager) is poor UX.