Open benoit74 opened 6 months ago
LGTM; I can't find the other discussion but found this (don't look at the rest of the ticket), which is a bit similar. I find your approach better in several ways: a commit to mark stuff we want to keep ~forever (so we'll get a commit message) and a short duration before deletion (otherwise there's the risk of postponing it and then missing the deadline).
Currently, we do not have any precise procedure or tooling around cleanup of ZIMs.
There are many topics that should be considered:
- `.hidden/to_delete` with one folder per month
- `.hidden/custom_apps` but we want to keep only the latest version of each ZIM (see https://github.com/openzim/zimfarm/issues/905)
- `.hidden/dev` and most of them have no reason to be kept on the long term

I would propose to:
The idea of marking files comes from the fact that:
It has some drawbacks:
Proposal of rules (in TOML because it is a config file format for humans and I expect to write the tool in Python which promotes TOML significantly, but in fact I don't really care)
With the following meanings:
- `[delete_rules.xxx]`: this is the configuration of the deletion rule `xxx` (I imagine the tool will be able to do other stuff in the future)
- `folder`: path to process for cleanup
- `delete_rule`: how to decide what has to be cleaned
  - `file_older_than_days`: delete files older than a given number of days
  - `all_but_last_book`: delete files which are not the last book version (based on ZIM naming convention)
  - `last_folder_older_than_days`: delete folders if they are older than a given number of days AND are the last folder in the tree (i.e. they do not contain another folder)
- `delete_threshold`: the threshold for the deletion rule
- `force_delete`: a list of files to force to delete
- `force_keep`: a list of files to force to keep

I think that this tool will be used for other cleanup duties:
- `trash_rules` to trash production ZIMs

WDYT?