openzim / ifixit

iFixit to ZIM scraper
GNU General Public License v3.0

iFixit

ifixit2zim is an openZIM scraper to create offline versions of the iFixit website, in all its supported languages.

This scraper downloads the iFixit resources (categories, guides, ...) and puts them in a ZIM file, a clean and user-friendly format for storing content for offline use.

Usage

ifixit2zim works off a language version that you must provide via the --language argument. The list of supported languages is visible in the --help message.

Docker

docker run -v my_dir:/output ghcr.io/openzim/ifixit ifixit2zim --help

Python

ifixit2zim is a Python 3 (3.6+) program. If you are not using the Docker image, you are advised to run it in a virtual environment to avoid installing its dependencies system-wide. In addition to Python 3, you also need an up-to-date installation of pip, setuptools and wheel (wheel is important since you will have to build some dependencies).

python3 -m venv .venv
source .venv/bin/activate

# using published version
pip3 install ifixit2zim
ifixit2zim --help

# running from source
pip3 install -r requirements.pip
python3 ifixit2zim/ --help

Call deactivate to quit the virtual environment.

See requirements.txt for the list of Python dependencies.
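
Before installing anything, it can help to confirm that the interpreter in use really is the one from the virtual environment. The following is a minimal sketch using only the standard library; the helper name in_virtualenv is hypothetical and not part of ifixit2zim, and the check assumes an environment created with python3 -m venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Hypothetical helper for illustration; not part of ifixit2zim.
    """
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python version:", sys.version.split()[0])
```

If this reports that no virtual environment is active, run source .venv/bin/activate again before calling pip3.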

Contributing

All contributions are welcome!

Please open an issue on GitHub and/or submit a pull request.

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.0.

Guidelines

Create an appropriate Python environment

First time:

python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.pip

Next times:

source .venv/bin/activate

NOTE: the underlying libzim library has some known bugs on macOS that limit what can be run there. The main one is that the full-text index does not work, which shouldn't be a problem for quick tests. When in doubt, run the scraper in a Docker container as explained below.

Test the scraper in a Docker container

First, build the Docker image (to be run from the main folder of this repo):

docker build -t local-ifixit .

Then run the scraper with the CLI arguments needed for your test (everything after ifixit2zim in the example below).

For instance, if you want to scrape only the Apple_PDA category, including its guides, in French:

docker run -it -v $(pwd)/output:/output --rm local-ifixit ifixit2zim --language fr --output /output --tmp-dir /tmp --category Apple_PDA

This will produce a ZIM in the output folder of your current directory.
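
Since the file name of the ZIM depends on the run, a small script can locate the most recent one in the output folder. This is only a sketch: the helper name newest_zim is hypothetical, and it assumes the ZIM files sit directly in the folder and use the .zim extension.

```python
from pathlib import Path
from typing import Optional


def newest_zim(output_dir: str) -> Optional[Path]:
    """Return the most recently modified .zim file in output_dir, or None.

    Hypothetical helper for illustration; not part of ifixit2zim.
    """
    candidates = list(Path(output_dir).glob("*.zim"))
    if not candidates:
        return None
    # Pick the file with the latest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # "output" is the folder mounted as /output in the docker run command.
    print(newest_zim("output"))
```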

Test the ZIM produced

To test if the ZIM produced is OK, you should run kiwix-serve, once more with Docker.

For instance, if you produced a file named ifixit_fr_selection_2022-04.zim in the output subfolder, and port 1256 is unused on your machine, you might run:

docker run -it --rm -v $(pwd)/output:/data -p 1256:80 ghcr.io/kiwix/kiwix-tools kiwix-serve /data/ifixit_fr_selection_2022-04.zim

And then navigate to http://localhost:1256 in your favorite browser.
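
For a quick check without a browser, the same address can be probed from Python. This is a minimal sketch, assuming kiwix-serve is published on host port 1256 over plain HTTP as in the command above; the helper name is_serving is hypothetical.

```python
import urllib.error
import urllib.request


def is_serving(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on url succeeds with a 2xx status.

    Hypothetical helper for illustration; not part of kiwix-tools.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling is_serving("http://localhost:1256") while the container is running should return True; a False result usually means the container is not up or the port mapping differs.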

Once tests are complete, you can stop the Docker container by pressing Ctrl-C.