openzim / nautilus

Turns a collection of documents into a browsable ZIM file
GNU General Public License v3.0

Nautilus

nautilus turns a collection of documents into a browsable ZIM file.


It downloads the video (webm or mp4 format, optionally recompressing it to a lower-quality, smaller size), the thumbnails, the subtitles and the authors' profile pictures; then it creates a folder of static HTML files from them before creating a ZIM from that folder.

Preparing the archive

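To be used with nautilus, your archive should be a ZIP file. For example, from the folder holding your content (-r recurses into subfolders, -0 stores files without compression, -T tests the archive's integrity once it is written):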

cd content/path
zip -r -0 -T ../content_name.zip *
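For scripted builds, a rough Python equivalent of the `zip` command above could look like the sketch below. It is an illustration, not part of nautilus: `build_archive` is a hypothetical helper name, directory entries are not written, and unlike the shell `*` glob it also picks up hidden files. As with `../` in the command above, write the ZIP outside the content folder so the archive does not end up containing itself.

```python
import zipfile
from pathlib import Path


def build_archive(content_dir: str, zip_path: str) -> list[str]:
    """Zip every file under content_dir without compression (hypothetical helper).

    Rough equivalent of `cd content_dir && zip -r -0 -T ../out.zip *`:
    member names are relative to content_dir and entries use ZIP_STORED,
    which is what `-0` does. Returns the member names written.
    """
    root = Path(content_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    # Like `zip -T`: reopen the archive and verify every member's CRC.
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
    return names
```

Calling `build_archive("content/path", "content_name.zip")` from the parent folder produces the same layout as the shell command, with `collection.json` (if present) at the archive root.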

JSON collection file

You must supply a JSON file describing your content, either inside the ZIP archive as /collection.json or elsewhere, specified via --collection mycollection.json.

The user interface only gives access to files that are properly referenced in the collection.
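To make the two locations concrete, here is a minimal sketch of how a collection could be located and parsed. It is not nautilus's own code: `load_collection` is a hypothetical name, and the rule that an explicit --collection value wins over the archive's /collection.json is an assumption, since this README does not state the precedence. A --collection value starting with http:// or https:// is fetched over the network, matching the URL form shown under Usage.

```python
import json
import urllib.request
import zipfile
from typing import Optional


def load_collection(archive: Optional[str] = None,
                    collection: Optional[str] = None) -> list:
    """Return the parsed collection, a JSON array of items (hypothetical helper).

    Assumption: an explicit collection path or URL takes precedence over
    /collection.json inside the archive.
    """
    if collection is not None:
        if collection.startswith(("http://", "https://")):
            with urllib.request.urlopen(collection) as resp:
                data = json.load(resp)
        else:
            with open(collection, encoding="utf-8") as fh:
                data = json.load(fh)
    elif archive is not None:
        with zipfile.ZipFile(archive) as zf:
            # /collection.json at the archive root is stored as "collection.json".
            data = json.loads(zf.read("collection.json").decode("utf-8"))
    else:
        raise ValueError("need an archive or a collection file")
    if not isinstance(data, list):
        raise ValueError("collection must be a JSON array of items")
    return data
```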

At the moment, the JSON file must contain an array whose items each provide the following fields:

[
    {
        "title": "...",
        "description": "...",
        "authors": "...",
        "files": ["relative/path/to/file"]
    },
    {
        "title": "...",
        "description": "...",
        "authors": "...",
        "files": [
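            {
                "archive-member": "01 BOOK for printing .pdf",
                "url": "http://books.com/310398120.pdf",
                "filename": "My book.pdf"
            }
        ]
    }
]

Each entry in "files" is either a plain string, the path of a file relative to the archive root, or an object with these keys:
- archive-member (optional): member name inside the archive (same as the simpler string format).
- url (optional): URL to download the file from; takes precedence over archive-member.
- filename (optional): filename to use in the ZIM, regardless of the original one.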
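The `files` rules for the collection format (a plain string is a path inside the archive; in an object, `url` takes precedence over `archive-member`, and `filename` overrides the original name) can be expressed as a small resolver. This is a sketch, not nautilus's implementation: `ResolvedFile` and `resolve_file` are hypothetical names, and the default ZIM filename used when `filename` is absent (the last path component of the source) is an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Union
from urllib.parse import urlsplit


@dataclass
class ResolvedFile:
    source: str     # archive member name, or URL to download
    from_url: bool  # True when the file must be downloaded
    filename: str   # name to use inside the ZIM


def resolve_file(entry: Union[str, dict]) -> ResolvedFile:
    """Apply the collection's `files` rules to one entry (hypothetical helper)."""
    if isinstance(entry, str):
        # Simple format: a path relative to the archive root.
        # Hypothetical default: the ZIM filename is the path's last component.
        return ResolvedFile(entry, False, PurePosixPath(entry).name)
    url = entry.get("url")
    member = entry.get("archive-member")
    if url:
        source, from_url = url, True  # `url` takes precedence over `archive-member`
    elif member:
        source, from_url = member, False
    else:
        raise ValueError(f"file entry has neither url nor archive-member: {entry!r}")
    # Hypothetical default when `filename` is absent: last path component of the source.
    path = urlsplit(source).path if from_url else source
    return ResolvedFile(source, from_url, entry.get("filename") or PurePosixPath(path).name)
```

With the second item of the example, `resolve_file` returns the URL as the source (it wins over `archive-member`) and `My book.pdf` as the ZIM filename.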

About page

You can supply an HTML file for the About page, either inside the ZIP archive as /about.html or elsewhere, specified via --about myabout.html.

Usage

❯ nautiluszim --help
usage: nautiluszim [-h] [-V]

# everything bundled in a ZIP
nautiluszim --archive my-content.zip

# collection file only, no archive: every file entry must have a valid url
nautiluszim --collection https://example.com/to-your-collection-file
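Before running in collection-only mode, it can help to check the collection for entries that nautilus would have no way to fetch. The sketch below is a hypothetical pre-flight check, not a nautilus feature: `missing_urls` is an illustrative name, and it interprets "valid url" loosely as a well-formed http(s) URL without testing that the URL is reachable. Plain-string entries are flagged because they only name a path inside an archive.

```python
from urllib.parse import urlsplit


def missing_urls(collection: list) -> list:
    """Return (item title, entry) pairs that lack a well-formed http(s) url."""
    problems = []
    for item in collection:
        for entry in item.get("files", []):
            url = entry.get("url") if isinstance(entry, dict) else None
            parts = urlsplit(url) if isinstance(url, str) else None
            if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
                problems.append((item.get("title"), entry))
    return problems
```

An empty result means every file entry at least carries a URL that could be downloaded.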

Installation

You'd want to install it in a dedicated virtual environment (python3 -m venv some-env && source ./some-env/bin/activate):

❯ pip install nautiluszim

Contributing

From a clone of the repository, install it in editable mode:

❯ pip install -e .

Notes

LANGUAGE=fr nautiluszim --language fra

Nautilus adheres to openZIM's Contribution Guidelines.

Nautilus has implemented openZIM's Python bootstrap, conventions and policies v1.0.0.