===========
scrapy-poet
===========

.. image:: https://img.shields.io/pypi/v/scrapy-poet.svg
   :target: https://pypi.python.org/pypi/scrapy-poet
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/scrapy-poet.svg
   :target: https://pypi.python.org/pypi/scrapy-poet
   :alt: Supported Python Versions

.. image:: https://github.com/scrapinghub/scrapy-poet/workflows/tox/badge.svg
   :target: https://github.com/scrapinghub/scrapy-poet/actions
   :alt: Build Status

.. image:: https://codecov.io/github/scrapinghub/scrapy-poet/coverage.svg?branch=master
   :target: https://codecov.io/gh/scrapinghub/scrapy-poet
   :alt: Coverage report

.. image:: https://readthedocs.org/projects/scrapy-poet/badge/?version=stable
   :target: https://scrapy-poet.readthedocs.io/en/stable/?badge=stable
   :alt: Documentation Status

``scrapy-poet`` is the web-poet_ Page Object pattern implementation for Scrapy. It lets you write spiders in which the extraction logic is separated from the crawling logic. With ``scrapy-poet``, a single spider can support many sites with different layouts.
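
The separation described above can be illustrated with a minimal, library-free sketch. It uses neither the ``scrapy-poet`` nor the ``web-poet`` API: the ``Page`` base class, the two shop layouts, the ``PAGE_OBJECTS`` mapping and the ``parse`` function are all hypothetical names chosen for illustration. The point is only that the crawling side never needs to know how a given layout is parsed.

.. code-block:: python

    import re
    from dataclasses import dataclass
    from urllib.parse import urlparse


    @dataclass
    class Page:
        """Hypothetical base page object: one downloaded page plus its extraction logic."""

        url: str
        html: str

        def first_match(self, pattern):
            # Return the first captured group for the pattern, or None if absent.
            match = re.search(pattern, self.html)
            return match.group(1) if match else None

        def to_item(self):
            raise NotImplementedError


    class ShopAPage(Page):
        # Assumed layout A: the product name lives in an <h1> element.
        def to_item(self):
            return {"url": self.url, "name": self.first_match(r"<h1>(.*?)</h1>")}


    class ShopBPage(Page):
        # Assumed layout B: the product name lives in <span class="title">.
        def to_item(self):
            return {
                "url": self.url,
                "name": self.first_match(r'<span class="title">(.*?)</span>'),
            }


    # Extraction side: one page object per site layout (made-up domains).
    PAGE_OBJECTS = {
        "shop-a.example": ShopAPage,
        "shop-b.example": ShopBPage,
    }


    def parse(url, html):
        """Crawling side: pick the page object by domain and delegate extraction."""
        page_cls = PAGE_OBJECTS[urlparse(url).netloc]
        return page_cls(url=url, html=html).to_item()

Supporting another site in this sketch means adding one more page object class and one mapping entry, while ``parse`` stays unchanged. In a real project, ``scrapy-poet`` takes over the job of building the page object and passing it to the spider callback.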

Read the `documentation <https://scrapy-poet.readthedocs.io>`_ for more information.

License is BSD 3-clause.

.. _web-poet: https://github.com/scrapinghub/web-poet

Quick Start
***********

Installation
============

.. code-block::

    pip install scrapy-poet

Requires Python 3.9+ and Scrapy >= 2.6.0.

Usage in a Scrapy Project
=========================

Add the following to your Scrapy project's ``settings.py`` file:

.. code-block:: python

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_poet.InjectionMiddleware": 543,
        # Disable Scrapy's built-in stats middleware and use the
        # scrapy-poet one in its place.
        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
        "scrapy_poet.DownloaderStatsMiddleware": 850,
    }
    SPIDER_MIDDLEWARES = {
        "scrapy_poet.RetryMiddleware": 275,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
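
To make the role of the numbers and the ``None`` entry above more concrete, here is a simplified model of how such a dictionary is combined with Scrapy's defaults: entries set to ``None`` are dropped and the rest are ordered by priority. This is a sketch, not Scrapy's actual implementation. The ``effective_order`` helper is hypothetical, and ``BASE`` is only an illustrative two-entry subset of Scrapy's default downloader middlewares.

.. code-block:: python

    # Illustrative subset of Scrapy's default downloader middlewares (assumption:
    # only two of the defaults are listed here, to keep the sketch short).
    BASE = {
        "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
        "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
    }

    # The DOWNLOADER_MIDDLEWARES value from the settings above.
    CUSTOM = {
        "scrapy_poet.InjectionMiddleware": 543,
        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
        "scrapy_poet.DownloaderStatsMiddleware": 850,
    }


    def effective_order(base, custom):
        """Hypothetical helper: merge, drop disabled entries, sort by priority."""
        merged = {**base, **custom}  # project settings override the defaults
        enabled = {path: prio for path, prio in merged.items() if prio is not None}
        return sorted(enabled, key=enabled.get)


    for path in effective_order(BASE, CUSTOM):
        print(path)

Under these assumptions, the built-in ``DownloaderStats`` entry disappears from the resulting list and ``scrapy_poet.DownloaderStatsMiddleware`` takes its position at priority 850.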

Developing
**********

Set up your local Python environment via:

1. ``pip install -r requirements-dev.txt``
2. ``pre-commit install``

Now every time you perform a ``git commit``, the configured pre-commit hooks will run against the staged files.

You can also directly invoke ``pre-commit run --all-files`` or ``tox -e linters`` to run them without performing a commit.