Closed RavanJAltaie closed 5 months ago
I don't get why we are deleting these files if the recipe is continuously failing and the only problem in the ZIM is an incorrect flavor.
If the recipe is continuously failing, I would rather find a way to fix the flavor in the ZIM (only devs can do that, but it is probably feasible) so that we do not lose all this content.
@Popolechien @kelson42 WDYT?
Well there's the problem of content becoming stale (I remember a recent deletion request for a game-related zim file that I posted not too long ago), but more broadly this raises the question of whether we're an archiving service or an offline internet one (I think we're the latter).
There's an ancillary impact to deleting some of the files (MDwiki is used in one of the pre-made configs we sell from the imager, for instance), so there better be a discussion of what the policy should be, e.g. if
To be clear, by "evergreen" I mean content that was relevant then, and still is now. Medical or encyclopedia-type content probably qualifies; at the other end of the spectrum I see content that is related to an evolving topic (e.g. games, entertainment, user manuals for tech à la scratchwiki or gentoo).
I reckon that we should keep files meeting criterion 1 (then the hard question is when content is evergreen -ish and how much grey we are willing to handle, or how much time we're willing to spend trying to figure out if Pokemon is still evolving as a game).
You are right, this is probably a very good example of a missing Policy around criteria for deletion of ZIMs.
My personal taste is that we should keep as much content as possible until it is either significantly outdated or represents an issue in terms of copyright or acceptable content. As a user, I would be embarrassed / sad to realize that offline content is disappearing at the same pace as the online websites. As a contributor, I would be embarrassed to realize we are throwing away content we've spent time / resources to create. But clearly, this is a personal taste and I might not be aligned with Kiwix / openZIM goals.
Somehow, I don't think archiving vs. offline internet is the proper question. I agree that we should not keep all versions of all content we've ever created for archiving purposes. But that does not mean we shouldn't keep as much content as possible, since we've invested in it.
Maybe we can discuss it in our Friday meeting?
@benoit74 We've discussed this with @Popolechien. It might help to know that all these files already have another copy in the library, so deleting them will not affect archived content. Please proceed with deleting them.
It might help to know that all these files have already another copy in the library.
This statement is false: at least allthetropes_en_all_maxi_2020-10.zim has no other copy in the library (maybe it is the single exception).
And I don't get what makes you believe that another file has a better Flavor metadata?
I had a look and it seems that you are requesting to delete the most recent file for every recipe.
I doubt that an older file will have a better Flavor metadata.
I checked appropedia_en_all_maxi_2021-01.zim (the one that will stay in the library after your cleanup request) and can confirm this older copy has the same flaw in its Flavour value:
>>> zim = Archive("appropedia_en_all_maxi_2021-01.zim")
>>> zim.metadata
{
"Counter": "text/plain=10;text/css=40;application/javascript=38;image/png=1;text/html=10150;image/webp=22548;application/pdf=74;image/gif=609;text/html; charset=UTF-8=37;undefined=3;text/html; charset=ISO-8859-1=1;text/html; charset=utf-8=3;text/html;charset=UTF-8=1",
"Creator": "Appropedia",
"Date": "2021-01-28",
"Description": "Sharing knowledge to build rich & sustainable lives",
"Flavour": "_maxi",
"Language": "eng",
"Name": "appropedia_en_all",
"Publisher": "Kiwix",
"Scraper": "mwoffliner 1.11.3",
"Tags": "appropedia;_pictures:yes;_videos:no;_details:yes;_ftindex:yes",
"Title": "Appropedia"
}
Should we discuss this live?
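For anyone wanting to repeat this check on other files, here is a minimal sketch. It assumes the flaw is that the stored Flavour ("_maxi" above) does not match the flavour segment of the filename ("maxi"); the thread does not spell out the expected value, so treat that as an assumption. The helper names, the filename pattern, and the list of known flavours are hypothetical and only inferred from the filenames quoted in this issue.

```python
import re

# Assumed filename layout, inferred from names in this thread:
#   <name>_<YYYY-MM>.zim, e.g. appropedia_en_all_maxi_2021-01.zim
ZIM_FILENAME = re.compile(r"^(?P<name>.+)_(?P<period>\d{4}-\d{2})\.zim$")

# Only the flavours that appear in this thread; extend as needed (assumption).
KNOWN_FLAVOURS = ("maxi", "nopic")


def expected_flavour(filename):
    """Return the flavour implied by the filename, or "" if there is none."""
    match = ZIM_FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected ZIM filename: {filename}")
    last_segment = match.group("name").rsplit("_", 1)[-1]
    return last_segment if last_segment in KNOWN_FLAVOURS else ""


def flavour_is_consistent(filename, metadata):
    """Compare the Flavour metadata value with the one implied by the filename."""
    return metadata.get("Flavour", "") == expected_flavour(filename)
```

With the metadata shown above, `flavour_is_consistent("appropedia_en_all_maxi_2021-01.zim", {"Flavour": "_maxi"})` reports a mismatch under this assumption.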
@benoit74 as per our last discussion, I've reviewed the links of the files proposed for deletion. All files have another copy except these:
https://download.kiwix.org/zim/other/allthetropes_en_all_maxi_2020-10.zim https://download.kiwix.org/zim/other/installgentoo_en_all_nopic_2019-09.zim https://download.kiwix.org/zim/other/installgentoo_en_all_nopic_2019-09.zim
You can keep them for archiving purposes and delete the rest of the list, please.
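The "does this file have another copy" review can be sketched as a grouping by name with the date stripped. This is only an illustration: the function name is hypothetical, the filename pattern is inferred from the names in this thread, and the sample listing in the usage note is not the real library contents.

```python
import re
from collections import defaultdict

# Assumed filename layout: <name>_<YYYY-MM>.zim
ZIM_FILENAME = re.compile(r"^(.+)_(\d{4}-\d{2})\.zim$")


def names_with_single_version(filenames):
    """Return the ZIM names (date stripped) that appear with only one period."""
    periods_by_name = defaultdict(set)
    for filename in filenames:
        match = ZIM_FILENAME.match(filename)
        if match is None:
            continue  # skip anything that does not follow the assumed layout
        periods_by_name[match.group(1)].add(match.group(2))
    return sorted(
        name for name, periods in periods_by_name.items() if len(periods) == 1
    )
```

For example, given `appropedia_en_all_maxi_2021-01.zim`, `appropedia_en_all_maxi_2021-03.zim` and `allthetropes_en_all_maxi_2020-10.zim`, only `allthetropes_en_all_maxi` is returned, meaning deleting that file would remove its last copy.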
I'm glad you confirm that we have a previous version for most ZIMs, but I still have the same question: are you sure that the previous versions of these ZIMs have a correct flavor? I suspect they don't have a more appropriate flavor. Did you check that as well?
Otherwise I think there are only two ways forward:
@benoit74 Just to be on the same page: at an earlier stage of this fix project, I already manually fixed all the wrong flavours in the recipes in zimfarm. The second stage is to remove the affected files; the files mentioned in this issue are the affected ones and need to be deleted. The older versions of the files are ok, I checked, as the error causing this problem was temporary and has been fixed.
@RavanJAltaie I've deleted the first two files (appropedia nopic and maxi) to check if everything is fine. Version 2021-03 is now gone, and replaced by version 2021-01. Library is now updated to use this 2021-01 version.
Please have a look at https://library.kiwix.org/viewer#appropedia_en_all_maxi_2021-01 and https://library.kiwix.org/viewer#appropedia_en_all_nopic_2021-01
As far as I can tell these older versions are no better than the 2021-03 version I've just deleted. Flavour is still wrong on both ZIMs: https://library.kiwix.org/raw/appropedia_en_all_maxi_2021-01/meta/Flavour and https://library.kiwix.org/raw/appropedia_en_all_nopic_2021-01/meta/Flavour
I'm sorry, but there is something I really don't get here. Why are you saying "The older versions of the files are ok, I checked, as the error causing this problem was temporary and fixed."? Could you explain to me what is better in the 2021-01 version of the appropedia ZIMs compared to the 2021-03 version?
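The raw metadata links quoted above follow a visible pattern, so the same check can be scripted for other books. A minimal sketch, assuming the `/raw/<book>/meta/Flavour` path seen in those links holds for every book in the library; the function names are hypothetical, and the fetch helper needs network access.

```python
from urllib.request import urlopen


def flavour_url(book_id, base="https://library.kiwix.org"):
    """Build the raw Flavour metadata URL, following the links quoted above."""
    return f"{base}/raw/{book_id}/meta/Flavour"


def fetch_flavour(book_id):
    """Download the Flavour value for one book (requires network access)."""
    with urlopen(flavour_url(book_id), timeout=10) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_flavour("appropedia_en_all_maxi_2021-01")` would return whatever value the library currently serves for that book, which can then be compared against the flavour in the filename.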
@benoit74 I've double checked the related recipes, all of them are already disabled except this one: wikisource_zh-min-nan is active and succeeding. So I believe you can start the process of creating the files correctly (manually) as discussed, then we can delete the wrong files from the library.
OK, thank you. As discussed, we do not need to delete the wrong files; they will be replaced. Closing this issue, as there is in fact nothing to do here; the remaining work is tracked in https://github.com/openzim/zim-requests/issues/1089
Please delete these files as part of cleaning up the library. Their corresponding recipes have been corrected and then disabled because they were continuously failing.