openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

Delete be-tarask.wikipedia #960

Open Popolechien opened 2 months ago

Popolechien commented 2 months ago

ZIM(s) location

https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_maxi_2023-04.zim
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_maxi_2023-05.zim
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_nopic_2023-04.zim
https://download.kiwix.org/zim/wikipedia/wikipedia_be-tarask_all_nopic_2023-05.zim

Recipe(s) URL

https://farm.openzim.org/recipes/wikipedia_be-x-old_all/edit

Readers tested

Both ZIM versions impacted?

Yes, both versions are impacted

Details

The recipe has been failing for a while, and its ZIMs appear in the English selection. Since this variant of Belarusian is low traffic, it won't be missed much, but having it appear in the English (high-traffic) library (including Imager) is poor UX.

kelson42 commented 2 months ago

I'm not against deleting it, but I would rather fix the bug because there is one somewhere around the metadata handling.

Do we know what the proper language metadata in the ZIM should be? Should it be "bel"?

kelson42 commented 2 months ago

The bug does not appear with the development version of MWoffliner. I investigated a bit to find where exactly it was fixed, but was not able to determine it. Actually, it does not matter that much. What is certain is that with version 1.13.0 MWoffliner was writing be-tarask, and now with 1.14.0 it writes bel, which is correct. I have started a scrape with the dev container of MWoffliner, and hopefully it will run to completion.

kelson42 commented 2 months ago

We still have a problem, because the ZIM file of this recipe will overwrite the one from https://farm.openzim.org/recipes/wikipedia_be_all. A filename prefix should probably be set.

kelson42 commented 2 months ago

@rgaudin I have set a filename prefix and this point is fixed (hopefully correctly). Now I have a question about the "Name" metadata: how do we avoid a conflict with the ZIM files of the other "bel" recipe?

rgaudin commented 2 months ago

Allow me to recap the issue as it's not clearly written.

| | tarask | be |
|---|---|---|
| Wikipedia | be-tarask.wikipedia.org | be.wikipedia.org |
| Language code | be-x-old (Belarusian (Taraškievica orthography)) | be (Belarusian) |
| ZIM filename | wikipedia_be-tarask_all_nopic_2023-05.zim | wikipedia_be_all_nopic_2024-04.zim |
| ZIM Language | be-tarask (incorrect) | bel (OK) |
| ZIM Name | wikipedia_be-tarask_all (possibly correct) | wikipedia_be_all (OK) |
| Zimfarm recipe | wikipedia_be-x-old_all | wikipedia_be_all |

This is an edge (but not unique) case of two WP projects with different languages that share the same ISO-639-3 code (bel). That's because the main language is Belarusian and Taraškievica is just the classical orthography. So far, we've chosen to strictly follow ISO-639-3 for technical and management purposes (simplicity), but that only works for 99% of cases.
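To illustrate the collision described above, here is a minimal sketch. The lookup table is hand-written from the codes mentioned in this thread and is an assumption for illustration only; it is not MWoffliner's or the Zimfarm's actual mapping logic, and `group_by_iso639_3` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical, hand-written lookup covering only the codes from this thread.
# This is NOT the mapping code used by MWoffliner or the Zimfarm.
WIKI_CODE_TO_ISO639_3 = {
    "be": "bel",
    "be-tarask": "bel",
    "be-x-old": "bel",
}


def group_by_iso639_3(wiki_codes):
    """Group Wikipedia language codes by the ISO-639-3 code they map to."""
    groups = defaultdict(list)
    for code in wiki_codes:
        groups[WIKI_CODE_TO_ISO639_3[code]].append(code)
    return dict(groups)


shared = group_by_iso639_3(["be", "be-tarask"])
# Any ISO-639-3 code with more than one Wikipedia code behind it is a collision.
collisions = {iso: codes for iso, codes in shared.items() if len(codes) > 1}
print(collisions)  # {'bel': ['be', 'be-tarask']}
```

Any metadata field keyed only on the ISO-639-3 code cannot tell the two projects apart, which is why the Name and filename matter here.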

kelson42 commented 2 months ago

I believe the ZIM name/language given by @rgaudin is not what we currently have. We should verify; the recipe run has been successful.

rgaudin commented 2 months ago

I described the previous situation, based on the previous ZIM files; here's the updated data:

| | tarask-2023 | be | tarask-2024 |
|---|---|---|---|
| Wikipedia | be-tarask.wikipedia.org | be.wikipedia.org | be-tarask.wikipedia.org |
| Language code | be-x-old (Belarusian (Taraškievica orthography)) | be (Belarusian) | be-x-old |
| ZIM filename | wikipedia_be-tarask_all_nopic_2023-05.zim | wikipedia_be_all_nopic_2024-03.zim | wikipedia_be-tarask_all_nopic_2024-04.zim |
| ZIM Language | be-tarask (incorrect) | bel (OK) | bel (OK) |
| ZIM Name | wikipedia_be-tarask_all (possibly correct) | wikipedia_be_all (OK) | wikipedia_be-tarask_all (possibly OK) |
| Zimfarm recipe | wikipedia_be-x-old_all | wikipedia_be_all | wikipedia_be-x-old_all |

Given that we use ISO-639-3 in neither the Name nor the filename, there's no conflict. We only have different ZIMs with the same title (but the descriptions are different).
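As a quick sanity check of that claim, the sketch below compares the metadata values listed for the be and tarask-2024 ZIMs. The `ZimInfo` class and both helper functions are hypothetical names introduced for illustration, and the trailing `_YYYY-MM.zim` period pattern is inferred from the filenames in this thread rather than taken from any specification.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ZimInfo:
    """Hypothetical container for the three fields discussed in this thread."""
    filename: str
    language: str
    name: str


def filename_prefix(filename):
    """Strip the trailing `_YYYY-MM.zim` period (pattern assumed from the thread)."""
    return re.sub(r"_\d{4}-\d{2}\.zim$", "", filename)


def shared_fields(a, b):
    """Return the names of the fields on which two ZIMs have the same value."""
    shared = []
    if filename_prefix(a.filename) == filename_prefix(b.filename):
        shared.append("filename prefix")
    if a.language == b.language:
        shared.append("language")
    if a.name == b.name:
        shared.append("name")
    return shared


# Values copied from the updated data in this comment.
be = ZimInfo("wikipedia_be_all_nopic_2024-03.zim", "bel", "wikipedia_be_all")
tarask = ZimInfo(
    "wikipedia_be-tarask_all_nopic_2024-04.zim", "bel", "wikipedia_be-tarask_all"
)

print(shared_fields(be, tarask))  # ['language']
```

Only the Language field matches, so two ZIMs built for the same period would still get distinct filenames and Names.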

Oh, and you've overwritten the 2024-04 be_all with the tarask version!