openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

zimscraperlib


Collection of Python code to re-use across Python-based scrapers.

Usage

Example usage, as a version-pinned requirement (for instance in a requirements.txt file):

zimscraperlib>=1.1,<1.2
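The constraint above accepts any 1.1.x release and rejects 1.2 and later. The following is only a minimal sketch of that range check using the standard library. The helper names `parse_version` and `satisfies_pin` are hypothetical and not part of zimscraperlib, and the sketch assumes plain numeric versions (no pre-release or local tags).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain numeric version such as "1.1.2" into (1, 1, 2).

    Trailing zeros are dropped so that "1.1" and "1.1.0" compare equal.
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_pin(version: str, lower: str = "1.1", upper: str = "1.2") -> bool:
    """Return True when lower <= version < upper, mirroring `>=1.1,<1.2`."""
    parsed = parse_version(version)
    return parse_version(lower) <= parsed < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("1.0.9", "1.1.0", "1.1.7", "1.2.0"):
        print(candidate, satisfies_pin(candidate))
```

In a real project the installer resolves this constraint for you; the sketch only illustrates which versions fall inside the pinned minor series.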

See functional architecture, software architecture and technical architecture for more details on scraperlib. Not all aspects are covered yet; this is a work in progress.

Dependencies

macOS

brew install libmagic wget libtiff libjpeg webp little-cms2 ffmpeg gifsicle

Linux (Debian/Ubuntu)

sudo apt install libmagic1 wget ffmpeg \
    libtiff5-dev libjpeg8-dev libopenjp2-7-dev zlib1g-dev \
    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \
    libharfbuzz-dev libfribidi-dev libxcb1-dev gifsicle

Alpine

apk add ffmpeg gifsicle libmagic wget libjpeg
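After installing, it can be handy to confirm that the command-line tools are reachable. The sketch below assumes that `wget`, `ffmpeg` and `gifsicle` (the executables named in the install commands above) are expected on `PATH`. The helper `missing_binaries` is hypothetical and not part of zimscraperlib, and shared libraries such as libmagic are not checked.

```python
import shutil

# Executables named in the install commands above. libmagic and the image
# libraries are shared libraries rather than programs, so they are skipped.
REQUIRED_BINARIES = ("wget", "ffmpeg", "gifsicle")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing executables: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
```

`shutil.which` only tells you that an executable exists on `PATH`; it does not verify versions or that the program actually runs.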

Contribution

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.2.

pip install hatch
pip install ".[dev]"
pre-commit install
# For tests
invoke coverage

Users

Non-exhaustive list of scrapers using it (check status when updating API):