book_crawler/
    scrapy.cfg            <-- Configuration file (DO NOT TOUCH!)
    book_crawler/
        __init__.py       <-- Empty file that marks this directory as a Python package
        items.py          <-- Model of the item to scrape
        middlewares.py    <-- Scrapy processing hooks (DO NOT TOUCH)
        pipelines.py      <-- What to do with each scraped item
        settings.py       <-- Project settings file
        spiders/          <-- Directory of our spiders (empty for now)
            __init__.py
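A minimal sketch of what items.py and pipelines.py might contain. The BookItem fields and the PricePipeline class are hypothetical. The sketch assumes Scrapy's support for dataclass instances as items (Scrapy 2.2 and later) and the classic process_item(self, item, spider) pipeline hook, so it runs without importing Scrapy at all.

```python
from dataclasses import dataclass
from typing import Optional


# items.py: hypothetical model of one book scraped from books.toscrape.com.
@dataclass
class BookItem:
    title: str
    price: str                           # raw text as scraped, e.g. "£51.77"
    price_value: Optional[float] = None  # filled in by the pipeline below


# pipelines.py: hypothetical pipeline that turns the price text into a number.
class PricePipeline:
    def process_item(self, item, spider):
        # Keep only digits and the decimal point, dropping the currency sign.
        digits = "".join(ch for ch in item.price if ch.isdigit() or ch == ".")
        item.price_value = float(digits) if digits else None
        # Returning the item hands it on to the next enabled pipeline.
        return item


if __name__ == "__main__":
    book = PricePipeline().process_item(BookItem(title="Example Book", price="£51.77"), spider=None)
    print(book.price_value)  # 51.77
```

A pipeline only runs once it is registered in settings.py, e.g. ITEM_PIPELINES = {"book_crawler.pipelines.PricePipeline": 300}, assuming the package is named book_crawler as created by scrapy startproject book_crawler. Lower numbers run earlier; values are conventionally kept in the 0-1000 range.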
Python environment = Python interpreter + installed packages
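Both halves of that equation can be inspected from Python itself with the standard library: sys.executable is the interpreter, importlib.metadata lists the installed distributions, and comparing sys.prefix with sys.base_prefix is a common way to tell whether a venv is active. A small sketch (Python 3.9+); the helper names are made up for illustration.

```python
import sys
from importlib.metadata import distributions


def in_virtual_env() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_packages() -> list[str]:
    # Sorted, de-duplicated names of the distributions visible to this interpreter.
    names = {dist.metadata.get("Name") for dist in distributions()}
    return sorted(name for name in names if name)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtual_env())
    print("Installed packages:", ", ".join(installed_packages()))
```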
pipenv in detail
Preparation (pipenv: its performance is too poor, so I gave up on it and went back to venv)
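Back on venv: the standard-library venv module does the same job as running python -m venv .venv in a shell. A minimal sketch; create_env and the .venv directory name are arbitrary choices made here.

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    # Same effect as `python -m venv <env_dir>`; with_pip=True also bootstraps pip.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    # The new directory contains pyvenv.cfg, which marks it as a virtual environment.
    return Path(env_dir)


# Usage: create_env(".venv")
```

Afterwards, activate the environment (source .venv/bin/activate on Linux/macOS, .venv\Scripts\activate on Windows) and run pip install scrapy inside it, so Scrapy lands in this environment rather than the system one.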
Create a new project
scrapy startproject book_crawler
Create a new spider
scrapy genspider fiction books.toscrape.com
Run
scrapy crawl fiction
Save
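Scrapy can save items with its built-in feed exports, e.g. scrapy crawl fiction -o books.json (-o appends to an existing file; -O, available since Scrapy 2.4, overwrites it), or through the FEEDS setting. When more control is needed, an item pipeline can write the file itself. A minimal sketch of that pattern writing JSON Lines; the class name and output_path are assumptions, and it expects dict-like items (plain dicts or scrapy.Item).

```python
import json


class JsonLinesWriterPipeline:
    # Hypothetical output location; one JSON object is written per line.
    output_path = "books.jsonl"

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.output_path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps characters such as "£" readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

Like any pipeline, it only runs after being added to ITEM_PIPELINES in settings.py.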
Debug in the console (scrapy shell)
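extract() works like querySelectorAll and returns a list of all matches; extract_first() works like querySelector and returns only the first match (None if nothing matches). Both return strings (the matched HTML, text or attribute values), not element objects. Newer Scrapy versions also provide getall() and get() as the preferred names for the same two methods.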
References