howie6879 / ruia

Async Python 3.6+ web scraping micro-framework based on asyncio
https://www.howie6879.com/ruia/
Apache License 2.0
1.75k stars 181 forks source link
aiohttp asyncio asyncio-spider crawler crawling-framework middlewares python python-ruia ruia spider uvloop

Ruia logo

Ruia

🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio.

⚡ Write less, run faster.

travis codecov PyPI - Python Version PyPI Downloads gitter

Overview

Ruia is an async web scraping micro-framework, written with asyncio and aiohttp, aims to make crawling url as convenient as possible.

Write less, run faster:

Features

Installation

# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# New features
pip install git+https://github.com/howie6879/ruia

Tutorials

  1. Overview
  2. Installation
  3. Define Data Items
  4. Spider Control
  5. Request & Response
  6. Customize Middleware
  7. Write a Plugins

TODO

Contribution

Ruia is still under developing, feel free to open issues and pull requests:

!!!Notice: We use black to format the code.

Thanks