scrapy-plugins / scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

BSD 3-Clause "New" or "Revised" License

Blacklist domains #94

Open whalebot-helmsman opened 3 years ago

whalebot-helmsman commented 3 years ago

I was setting up AutoExtract in Scrapy Cloud on a project with the Crawlera addon, and the AutoExtract queries were routed through Crawlera. The idea is to blacklist the AutoExtract domain by default. It may make sense for other services too, e.g. Splash.

It is possible to implement this without adding new options, e.g. by adding something to https://github.com/scrapy-plugins/scrapy-crawlera/blob/019987f68345079db176405c9f9fbb155ee26f20/scrapy_crawlera/middleware.py#L32

Gallaecio commented 3 years ago

I would also log a warning the first time it happens during a crawl.
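
A rough sketch of how the two suggestions in this thread could fit together: a built-in set of domains that skip the proxy, plus a warning emitted only the first time each domain is matched. This is not the middleware's actual code. `ProxyBypassFilter`, `DEFAULT_BYPASS_DOMAINS`, and `should_bypass` are hypothetical names, and the hostname in the default set is an assumption that would need checking against the real service endpoints.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# Hypothetical built-in default; the real hostnames would need to be verified.
DEFAULT_BYPASS_DOMAINS = frozenset({"autoextract.scrapinghub.com"})


class ProxyBypassFilter:
    """Decide whether a URL should skip the proxy (hypothetical helper)."""

    def __init__(self, domains=DEFAULT_BYPASS_DOMAINS):
        self.domains = {d.lower() for d in domains}
        # Domains that have already triggered a warning in this crawl.
        self._warned = set()

    def should_bypass(self, url):
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or any of its subdomains.
        matched = next(
            (d for d in self.domains if host == d or host.endswith("." + d)),
            None,
        )
        if matched is None:
            return False
        if matched not in self._warned:
            # Warn only on the first match per domain.
            self._warned.add(matched)
            logger.warning(
                "Requests to %s are not routed through the proxy", matched
            )
        return True
```

In a Scrapy downloader middleware, a check like `should_bypass(request.url)` could run at the top of `process_request` so that matching requests are left untouched. Warning once per domain rather than once per crawl is a design choice made here, not something the thread specifies.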