scrapy-plugins / scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
BSD 3-Clause "New" or "Revised" License

Spider attributes for the most common headers #53

Closed: stummjr closed this issue 6 years ago

stummjr commented 6 years ago

Crawlera users often set headers such as X-Crawlera-Profile, X-Crawlera-Debug and X-Crawlera-Cookies. It would be easier if this plugin offered built-in support for them as spider attributes, such as:

import scrapy

class FooSpider(scrapy.Spider):
    name = 'foo'
    crawlera_profile = 'mobile'
    crawlera_debug = 'ua'
    ...

Without them, users have to write their own custom Crawlera middleware or, even worse, pass the headers explicitly in every request the spider issues.

A common use case: you want to quickly debug some Crawlera settings such as user agents, response times, etc. Today you have to build a custom middleware or modify your spiders everywhere they generate requests. With the attributes, it would just be a matter of setting crawlera_debug = 'ua'.
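For illustration, here is a rough sketch of the kind of custom middleware users write today to get this behaviour. It is not the plugin's code: the class name and the mapping are hypothetical, and `crawlera_cookies` is an assumed attribute name following the pattern of the two attributes proposed above. The spider and request are simple stand-ins so the sketch runs without Scrapy installed; it relies only on the `process_request(request, spider)` hook and a dict-like `request.headers`.

```python
from types import SimpleNamespace

# Hypothetical mapping from spider attribute to Crawlera header.
# crawlera_profile and crawlera_debug come from the proposal above;
# crawlera_cookies is an assumed name following the same pattern.
ATTRIBUTE_HEADERS = {
    "crawlera_profile": "X-Crawlera-Profile",
    "crawlera_debug": "X-Crawlera-Debug",
    "crawlera_cookies": "X-Crawlera-Cookies",
}


class SpiderAttributeHeadersMiddleware:
    """Hypothetical downloader middleware, not part of the plugin."""

    def process_request(self, request, spider):
        for attribute, header in ATTRIBUTE_HEADERS.items():
            value = getattr(spider, attribute, None)
            if value is not None:
                # setdefault keeps a header set explicitly on the request,
                # so per-request values win over the spider-wide attribute.
                request.headers.setdefault(header, value)
        return None  # let the request continue through the chain


# Stand-ins for a spider and a request, so the sketch runs without Scrapy.
spider = SimpleNamespace(name="foo", crawlera_profile="mobile", crawlera_debug="ua")
request = SimpleNamespace(headers={"X-Crawlera-Profile": "desktop"})

SpiderAttributeHeadersMiddleware().process_request(request, spider)
print(request.headers)
```

In this sketch the explicit per-request profile is kept and only the missing debug header is filled in from the spider. Whether spider attributes should override or yield to per-request headers is a design choice the proposal leaves open.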

What do you think about adding support for such attributes at the spider level?

stummjr commented 6 years ago

There's an ongoing discussion on https://github.com/scrapy-plugins/scrapy-crawlera/pull/54