Closed by kelson42 5 years ago
@Popolechien I think I have a ticket for that somewhere... but we definitely need to set up a quality assurance system. The idea is to add this validation step once the files are uploaded to the warehouse.
@automactic The Docker container itself should be really simple: monitor a directory and check each new file with zim-check; if it returns no error, move the file so that it actually becomes available for download. Otherwise: "to be defined".
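The steps described above could be sketched roughly as follows. This is only an illustration under assumptions: the directory layout, the function names `run_checker` and `process_incoming`, and the idea that the checker signals "no error" with exit status 0 are all hypothetical. The command name `zim-check` is taken from this thread and is called without any flags.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def run_checker(zim_path: Path, command: Sequence[str] = ("zim-check",)) -> bool:
    """Run the checker on one file.

    Assumption: the checker takes the file path as its only argument and
    exits with status 0 when it finds no error.
    """
    result = subprocess.run([*command, str(zim_path)], capture_output=True)
    return result.returncode == 0


def process_incoming(
    incoming: Path,
    published: Path,
    check: Callable[[Path], bool] = run_checker,
) -> List[Path]:
    """Check every *.zim file in `incoming`; move the ones that pass to `published`."""
    published.mkdir(parents=True, exist_ok=True)
    moved = []
    for zim_path in sorted(incoming.glob("*.zim")):
        if check(zim_path):
            target = published / zim_path.name
            shutil.move(str(zim_path), str(target))
            moved.append(target)
        # Files that fail stay where they are: what to do with them is
        # still "to be defined" in this discussion.
    return moved
```

Calling `process_incoming` periodically (for example from a cron job) would stand in for "monitor a directory"; the `check` parameter is injectable so the flow can be tried without the real checker installed.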
@kelson42, the zimfarm warehouse is an SFTP server. It cannot do things like monitor a directory and test new files.
What might be a good idea is to introduce the concept of staging. The SFTP server moves files from the workers to staging, then a dedicated monitor kicks off testing jobs for new files in staging. Once the tests pass, the files are moved to production.
@kelson42 How are we planning to test ZIM files? For situations like the one above, there doesn't seem to be an obvious way to automate the test.
@automactic We'll need to have a human step in there. For these wikis I'm also thinking of contacting the mods to ask if they'd be ok with us having a simplified landing page (like we already do on a few Wikipedias).
@Popolechien Because of the human resource bottleneck, it seems quite unrealistic to me to have a human review many thousands of new ZIM files a month. On top of that, this is something that can be automated, so it is probably something we could/should do.
@automactic I was not talking about the "warehouse" container. IMO the warehouse container is fine for receiving the ZIM files from the distributed workers. Just take care that we have an easy way to know, on the filesystem, whether a file is fully uploaded or not. We need that because once a file is uploaded, the "sanity check" container (still to be built) will run zim-check against that file and then move it to its final destination. To conclude, the warehouse and the sanity-check container will share a Docker volume.
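One possible way to tell from the filesystem whether an upload has finished is sketched below. It is only an assumption-laden illustration: it supposes that uploaders write to a temporary name such as `file.zim.part` and rename it at the end (a convention this thread does not confirm), and as a fallback it only accepts `*.zim` files that have not been modified for a quiet period. The function name `completed_uploads` and the 60-second default are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def completed_uploads(
    directory: Path,
    min_quiet_seconds: float = 60.0,
    now: Optional[float] = None,
) -> List[Path]:
    """Return *.zim files that look fully uploaded.

    A file qualifies when its name ends in ".zim" (so an in-progress
    "name.zim.part" is skipped) and its last modification is at least
    `min_quiet_seconds` old. Both rules are assumptions, not a guarantee
    that the transfer really finished.
    """
    now = time.time() if now is None else now
    ready = []
    for path in sorted(directory.glob("*.zim")):
        if now - path.stat().st_mtime >= min_quiet_seconds:
            ready.append(path)
    return ready
```

Because the warehouse and the sanity-check container would share a volume, a check like this could run on the sanity-check side without any change to the SFTP server itself.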
@kelson42 Of course not every ZIM, just the new ones for their very first deployment. I don't know how many new contents we publish yearly, but I'd be surprised if it's more than a handful at this stage.
@Popolechien OK, then I do that already.
This will be handled outside the zimfarm project. See https://github.com/kiwix/maintenance/issues/30
From @Popolechien on November 4, 2018 9:20
I'm looking at http://library.kiwix.org/granbluefantasy_en_all_all_nopic_2018-10/ (the last release of the Granblue Fantasy wiki) and it is obvious that a bunch of things are broken, rendering the file unusable and a waste of data/download time for users. It seems to be a rather recent addition to the library, so can we think of some simple confirmation/vetting process (a.k.a. quality control) before adding new ZIMs?
Copied from original issue: openzim/mwoffliner#422