openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

re-create all nautilus ZIMs #999

Open rgaudin opened 1 month ago

rgaudin commented 1 month ago

All nautilus ZIMs would benefit from being re-run:

Now that nautilus supports URL entries, we may want to switch to URL-based collections and drop the ZIP archive. The advantage is that all files are individually available and replaceable, and collections are easy to extend. On the other hand, it makes it difficult to download a full recipe's data and run nautilus locally. @benoit74 WDYT?
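
To illustrate the local-run concern, here is a minimal sketch of how a URL-based collection could be mirrored to disk before running nautilus locally. It assumes the file URLs are already available as a plain list; `local_path_for` and `mirror` are hypothetical helper names, not part of nautilus.

```python
import pathlib
import urllib.parse
import urllib.request


def local_path_for(url: str, dest: pathlib.Path) -> pathlib.Path:
    """Map a file URL to a path under dest, decoding percent-escapes."""
    url_path = urllib.parse.urlparse(url).path
    # keep the URL's folder layout, minus the leading slash
    parts = [urllib.parse.unquote(p) for p in url_path.split("/") if p]
    # refuse components that could escape dest once decoded
    if not parts or any(p in (".", "..") or "/" in p for p in parts):
        raise ValueError(f"unusable URL path: {url}")
    return dest.joinpath(*parts)


def mirror(urls, dest: pathlib.Path, fetch=urllib.request.urlretrieve):
    """Download every URL below dest; fetch is injectable for dry runs."""
    paths = []
    for url in urls:
        target = local_path_for(url, dest)
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, target)
        paths.append(target)
    return paths
```

Passing a no-op `fetch` gives a dry run that only creates the folder layout, which is a cheap way to check a URL list before downloading anything.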

Here's the list of all nautilus recipes (zimfarm has no filter for it)

schedule library archive
laboh laboh_fr http://download.kiwix.org/other/bayard/laboh.zip
maitre_lucas_additions_20 maitre_lucas_additions_20_fr https://drive.farm.openzim.org/maitrelucas/Additions%20et%20soustractions%20jusqu_a%CC%80%2020.zip
maitre_lucas_completer_les_algorithmes maitre_lucas_completer_algorithmes_fr https://drive.farm.openzim.org/maitrelucas/Comple%CC%81ter%20les%20algorithmes.zip
maitre_lucas_completer_les_mots maitre_lucas_completer_les_mots_fr https://drive.farm.openzim.org/maitrelucas/Comple%CC%81ter%20les%20mots.zip
maitre_lucas_compter_audela_de_20 maitre_lucas_compter_audela_de_20_fr https://drive.farm.openzim.org/maitrelucas/Compter%20au%20dela%CC%80%20de%2020.zip
maitre_lucas_compter_jusque_10 maitre_lucas_compter_jusque_10_fr https://drive.farm.openzim.org/maitrelucas/Compter%20jusqu_a%CC%80%2010.zip
maitre_lucas_compter_jusque_5 maitre_lucas_compter_jusque_5_fr https://drive.farm.openzim.org/maitrelucas/Compter%20jusqu_a%CC%80%205.zip
maitre_lucas_confusion_des_lettres maitre_lucas_confusion_des_lettres_fr https://drive.farm.openzim.org/maitrelucas/Confusion%20des%20lettres%20p-q%2C%20b-d.zip
maitre_lucas_developpement maitre_lucas_developpement_fr https://drive.farm.openzim.org/maitrelucas/De%CC%81veloppement%20des%20e%CC%82tres%20vivants.zip
maitre_lucas_enseignement_civique maitre_lucas_enseignement_civique_fr https://drive.farm.openzim.org/maitrelucas/Enseignement%20civique%20et%20moral.zip
maitre_lucas_espace_geometrie maitre_lucas_espace_geometrie_fr https://drive.farm.openzim.org/maitrelucas/Espace%20et%20ge%CC%81ome%CC%81trie.zip
maitre_lucas_labyrinthe_de_calculs maitre_lucas_labyrinthe_de_calculs_fr https://drive.farm.openzim.org/maitrelucas/Labyrinthes%20de%20calculs.zip
maitre_lucas_planete_terre maitre_lucas_planete_terre_fr https://drive.farm.openzim.org/maitrelucas/La%20plane%CC%80te%20Terre%20et%20l_environnement.zip
youscribe_college youscribe_fr_college https://drive.farm.openzim.org/youscribe_college%20/youscribe_college.zip
youscribe_lycee youscribe_fr_lycee https://drive.farm.openzim.org/youscribe_lycee%20/youscribe_lycee.zip
japprendsalire japprendsalire_fr https://drive.farm.openzim.org/japprendsalire/japprendsalire.zip
lesbelleshistoires lesbelleshistoires_fr https://drive.farm.openzim.org/lesbelleshistoires/lesbelleshistoires.zip
mesptitesquestions mesptitesquestions_fr https://drive.farm.openzim.org/mesptitesquestions/mesptitesquestions.zip
experiencesscientifiques experiencesscientifiques_fr https://drive.farm.openzim.org/experiencesscientifiques/experiencesscientifiques.zip
scoopyendirectducorpshumain scoopyendirectducorpshumain_fr https://drive.farm.openzim.org/scoopyendirectducorpshumain/scoopyendirectducorpshumain.zip
diksha-std10ssc-marathi diksha-std10ssc_mr https://drive.farm.openzim.org/zaya/std-10-ssc-marathi.zip
bayardcuisine bayardcuisine_fr https://drive.farm.openzim.org/bayardcuisine/bayardcuisine.zip
pink_pookie Ressources_pedagogiques_relatives_au_droit_auteur https://drive.farm.openzim.org/pink_pookie/pinkpookie.zip
jaimelire jaimelire_fr https://drive.farm.openzim.org/jaimelire/jaimelire.zip
prunelle_draw_your_african_story prunelle_draw_your_african_story_en https://drive.farm.openzim.org/prunelle/draw_your_african_story.zip
prunelle_auteurs_en_herbe prunelle_auteurs_en_herbe_fr https://drive.farm.openzim.org/prunelle/auteurs_en_herbe.zip
maitre_lucas_alimentation maitre_lucas_alimentation_fr https://drive.farm.openzim.org/maitrelucas/Alimentation.zip
maitre_lucas_additions maitre_lucas_additions_fr https://drive.farm.openzim.org/maitrelucas/Additions.zip
maitre_lucas_calcul_mental maitre_lucas_calcul_mental_fr https://drive.farm.openzim.org/maitrelucas/Calcul%20mental.zip
maitre_lucas_conjugaison maitre_lucas_conjugaison_fr https://drive.farm.openzim.org/maitrelucas/Conjugaison.zip
maitre_lucas_dictees maitre_lucas_dictees_fr https://drive.farm.openzim.org/maitrelucas/Dicte%CC%81es.zip
maitre_lucas_divisions maitre_lucas_divisions_fr https://drive.farm.openzim.org/maitrelucas/Divisions.zip
maitre_lucas_double_et_moitie maitre_lucas_double_et_moitie_fr https://drive.farm.openzim.org/maitrelucas/Double%20et%20moitie%CC%81.zip
maitre_lucas_ecriture maitre_lucas_ecriture_fr https://drive.farm.openzim.org/maitrelucas/Ecriture.zip
maitre_lucas_enigmes maitre_lucas_enigmes_fr https://drive.farm.openzim.org/maitrelucas/Enigmes.zip
maitre_lucas_fractions maitre_lucas_fractions_fr https://drive.farm.openzim.org/maitrelucas/fractions.zip
maitre_lucas_grammaire maitre_lucas_grammaire_fr https://drive.farm.openzim.org/maitrelucas/Grammaire.zip
maitre_lucas_lecture maitre_lucas_lecture_fr https://drive.farm.openzim.org/maitrelucas/Lecture.zip
zaya-english-duniya-marathi zaya-english-duniya-marathi_mr https://drive.offspot.it/zaya/zaya-s-english-duniya-marathi.zip
maitre_lucas_compter_par_intervalles maitre_lucas_compter_par_intervalles_fr https://drive.farm.openzim.org/maitrelucas/Compter%20par%20intervalles%20re%CC%81guliers.zip
maitre_lucas_comparer_et_ranger maitre_lucas_comparer_et_ranger_fr https://drive.farm.openzim.org/maitrelucas/Comparer%20et%20ranger.zip
maitre_lucas_grandeur_et_mesures maitre_lucas_grandeur_et_mesures_fr https://drive.farm.openzim.org/maitrelucas/Grandeurs%20et%20mesures.zip
maitre_lucas_dictees_de_nombres maitre_lucas_dictees_de_nombres_fr https://drive.farm.openzim.org/maitrelucas/Dicte%CC%81es%20de%20nombres.zip
maitre_lucas_calculs_et_coloriages maitre_lucas_calculs_et_coloriages_magiques_fr https://drive.farm.openzim.org/maitrelucas/Calculs%20et%20Coloriages%20magiques.zip
prunelle_contes_africains prunelle_contes_africains_fr https://drive.farm.openzim.org/prunelle/contes_africains_a_illustrer.zip
maitre_lucas_corps_humain maitre_lucas_corps_humain_fr https://drive.farm.openzim.org/maitrelucas/Corps%20humain%20et%20activite%CC%81.zip
maitre_lucas_compter_jusque_20 maitre_lucas_compter_jusque_20_fr https://drive.farm.openzim.org/maitrelucas/Compter%20jusqu_a%CC%80%2020.zip
prunelle_interactive_books prunelle_interactive_books_en https://drive.farm.openzim.org/prunelle/prunelle_interactive_books.zip
prunelle_budding_authors Prunelle_budding_authors_en https://drive.farm.openzim.org/prunelle/budding_authors.zip
zimgit-food-preparation_en zimgit-food-preparation_en https://drive.farm.openzim.org/zimgit-food-preparation/zimgit-food-preparation.zip
zimgit-post-disaster_en zimgit-post-disaster_en https://drive.farm.openzim.org/zimgit-post-disaster_en/zimgit.zip
editions-ganndal_fr_fo-livres editions-ganndal_fr_fo-livres https://drive.farm.openzim.org/ganndal/ganndal_2024-03.zip
prunelle_livres_interactifs prunelle_livres_interactifs_fr https://drive.farm.openzim.org/prunelle/livres_interactifs_prunelle.zip
maitre_lucas_apprendre_a_dessiner maitre_lucas_apprendre_a_dessiner_fr https://drive.farm.openzim.org/maitrelucas/Apprendre%20a%CC%80%20dessiner.zip
youscribe_audiobooks youscribe_fr_audiobooks https://drive.farm.openzim.org/youscribe_audiobooks/youscribe_audiobooks.zip
zimgit-knots_en zimgit-knots_en https://drive.farm.openzim.org/zimgit-knots/zimgit-knots.zip
mesptitspourquoi mesptitspourquoi_fr https://drive.farm.openzim.org/mesptitspourquoi/mesptitspourquoi.zip
zimgit-water_en zimgit-water_en https://drive.farm.openzim.org/zimgit-water/zimgit-water.zip
alittlequestionaday alittlequestionaday_en https://drive.farm.openzim.org/alittlequestionaday/alittlequestionaday.zip
diksha-std5ssc-english diksha-std5ssc_en https://drive.farm.openzim.org/zaya/std-5-ssc-english.zip
storybox storybox_en https://drive.farm.openzim.org/storybox/storybox.zip
maitre_lucas_comparer_nombres_100 maitre_lucas_comparer_nombres_100_fr https://drive.farm.openzim.org/maitrelucas/Comparer%20les%20nombres%20jusqu_a%CC%80%20100.zip
Terra_x_de Terra_x_de https://commons.wikimedia.org/wiki/Category:Videos_by_Terra_X
zimgit-medicine_en zimgit-medicine_en https://drive.farm.openzim.org/zimgit-medicine/zimgit-medicine.zip
maitre_lucas_calcul_decimaux maitre_lucas_calcul_decimaux_fr https://drive.farm.openzim.org/maitrelucas/Calculs%20avec%20nombres%20de%CC%81cimaux.zip
disledansmalangue disledansmalangue_fr https://drive.farm.openzim.org/disledansmalangue/disledansmalangue.zip
youscribe_primaire youscribe_fr_primaire https://drive.farm.openzim.org/youscribe_primaire%20/youscribe_primaire.zip
lesptitsphilosophes lesptitsphilosophes_fr http://download.kiwix.org/other/bayard/lesptitsphilosophes.zip
poesies poesies_fr https://drive.farm.openzim.org/poesies/poesies.zip
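
Since zimfarm has no filter for this, the listing above can also be checked with a few lines of Python. This is only a sketch: it assumes the listing is pasted as whitespace-separated text (schedule, library, archive URL), and the function names are made up for the example.

```python
import collections
import urllib.parse


def parse_recipes(listing: str):
    """Yield (schedule, library, archive) tuples from the listing text."""
    for line in listing.splitlines():
        fields = line.split()
        # skip blank lines and the "schedule library archive" header row
        if len(fields) != 3 or "://" not in fields[2]:
            continue
        yield tuple(fields)


def archives_by_host(recipes):
    """Count recipes per archive host."""
    return collections.Counter(
        urllib.parse.urlparse(archive).netloc for _, _, archive in recipes
    )


def not_zip(recipes):
    """Return schedules whose archive URL path does not end in .zip."""
    return [
        schedule
        for schedule, _, archive in recipes
        if not urllib.parse.urlparse(archive).path.endswith(".zip")
    ]


sample = """\
schedule library archive
laboh laboh_fr http://download.kiwix.org/other/bayard/laboh.zip
poesies poesies_fr https://drive.farm.openzim.org/poesies/poesies.zip
Terra_x_de Terra_x_de https://commons.wikimedia.org/wiki/Category:Videos_by_Terra_X
"""
recipes = list(parse_recipes(sample))
print(archives_by_host(recipes))
print(not_zip(recipes))
```

Run against the full listing, this kind of check surfaces the entries that are not on the drive or are not ZIP archives at all.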

benoit74 commented 1 month ago

As discussed live, I agree that we should expand the ZIP on the drive, re-encode the videos, create a JSON listing the URLs of all individual files, and update the recipe. This is a task for a developer (probably me), since it is too cumbersome and error-prone to do by hand.
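
The "JSON with all individual file URLs" step could look roughly like the sketch below. It assumes the ZIP has already been expanded into a local folder mirroring the drive; `build_manifest`, its `base_url` parameter and the `path`/`url`/`size` keys are illustrative only and are not the nautilus collection schema.

```python
import pathlib
import urllib.parse


def build_manifest(root: pathlib.Path, base_url: str) -> list[dict]:
    """List every file under root with the URL it would have on the drive."""
    entries = []
    for fpath in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = fpath.relative_to(root).as_posix()
        entries.append(
            {
                "path": rel,
                # percent-encode spaces and accents, keep "/" separators
                "url": base_url.rstrip("/") + "/" + urllib.parse.quote(rel),
                "size": fpath.stat().st_size,
            }
        )
    return entries
```

The result can be written out with `json.dumps(build_manifest(root, base_url), indent=2, ensure_ascii=False)` and then mapped onto whatever structure the recipe actually expects.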

rgaudin commented 1 month ago

Indeed.

FYI, here is a sample re-encode script that can be applied on the drive root:

import argparse
import logging
import pathlib
import sys

import humanfriendly
from zimscraperlib.video.encoding import reencode
from zimscraperlib.video.presets import VideoWebmLow

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)
ROOT = pathlib.Path(__file__).parent

def disk_usage(folder):
    # count regular files only; directory entries would skew the total
    return sum(file.stat().st_size for file in folder.rglob("*") if file.is_file())

def hsize(size):
    return humanfriendly.format_size(size, binary=True)

def main(root: pathlib.Path):

    du = disk_usage(root)
    logger.info(f"re-encoding videos from {root} ({hsize(du)})")

    ffmpeg_args = VideoWebmLow().to_ffmpeg_args()

    errored = []
    for video_fpath in root.rglob("*.webm"):
        logger.info(f"** {video_fpath}")
        if reencode(
            src_path=video_fpath,
            dst_path=video_fpath,
            ffmpeg_args=ffmpeg_args,
            delete_src=True,
            with_process=False,
            failsafe=True,
        ):
            logger.info("  OK")
        else:
            logger.error("  ERROR")
            errored.append(video_fpath)

    final_du = disk_usage(root)
    logger.info(f"new disk-usage: {hsize(final_du)} (diff: {hsize(final_du - du)})")

    if not errored:
        logger.info("ALL OK")
        return

    # errored holds pathlib.Path objects; str.join() needs strings
    logger.error(
        f"{len(errored)} files failed to re-encode:\n- "
        + "\n- ".join(str(fpath) for fpath in errored)
    )

def entrypoint():
    parser = argparse.ArgumentParser(
        prog="re-encode",
        description="re-encode videos using scraperlib",
    )

    parser.add_argument(
        "src_path",
        help="Root folder to scan recursively for .webm videos",
    )

    args = parser.parse_args()

    try:
        sys.exit(main(pathlib.Path(args.src_path).expanduser().resolve()))
    except Exception as exc:
        logger.error(f"FAILED. An error occurred: {exc}")
        logger.exception(exc)
        raise SystemExit(1) from exc

if __name__ == "__main__":
    entrypoint()

kelson42 commented 1 month ago

Can we please just do that (redoing the ZIM files) programmatically? This is a priority. The rest should be handled separately, and I'm not in favour of rewriting the ZIP unless really necessary; see for example https://github.com/openzim/nautilus/issues/23

rgaudin commented 1 month ago

Following live discussion:

It is understood that we'll re-encode those because we know they are broken WebM files and because we no longer have the source videos. In a normal situation, we'll store the source video on the drive and the (yet to be implemented) nautilus-included encoder will optimize it.

kelson42 commented 1 month ago

I have rescheduled all 47 recipes based on the "nautilus" tag and, after fixing a few of them (Nautilus 1.2 has a better metadata conformity check), they have all passed.