Closed stummjr closed 6 years ago
Merging #54 into master will increase coverage by 0.24%. The diff coverage is 100%.
@@ Coverage Diff @@
## master #54 +/- ##
==========================================
+ Coverage 93.65% 93.89% +0.24%
==========================================
Files 2 2
Lines 126 131 +5
==========================================
+ Hits 118 123 +5
Misses 8 8
Impacted Files | Coverage Δ | |
---|---|---|
scrapy_crawlera/middleware.py | 93.79% <100%> (+0.25%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4c0418c...1ef7ac5.
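@stummjr I actually like the idea, but there are a few things to discuss:
We'll have to give support for all crawlera available headers.
What should have precedence here, DEFAULT_REQUEST_HEADERS or these ones?
Should we also give settings-level support?

@stummjr I also like the idea!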
Regarding @eLRuLL's questions:
We'll have to give support for all crawlera available headers. Sure!
What should have precedence here, DEFAULT_REQUEST_HEADERS or these ones? I believe DEFAULT_REQUEST_HEADERS should have precedence, but we can raise a warning on spider_opened when we detect a collision.
Should we also give settings-level support? I believe so! That way we could provide better defaults, as was proposed in #52.
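A minimal sketch of the precedence rule suggested above, assuming both sources of headers are available as plain dicts. The function name `merge_with_precedence` is hypothetical and not part of scrapy-crawlera; in a real middleware this check would presumably run from `spider_opened`.

```python
import logging

logger = logging.getLogger(__name__)


def merge_with_precedence(default_request_headers, spider_headers):
    """Merge two header dicts; DEFAULT_REQUEST_HEADERS wins on collision.

    Hypothetical helper: header names are compared case-insensitively,
    and every collision is reported with a warning.
    Returns a (merged_headers, colliding_names) tuple.
    """
    merged = {}
    for name, value in spider_headers.items():
        merged[name.lower()] = (name, value)

    collisions = []
    for name, value in default_request_headers.items():
        key = name.lower()
        if key in merged:
            collisions.append(name)
        # Overwrite: the DEFAULT_REQUEST_HEADERS value takes precedence.
        merged[key] = (name, value)

    for name in collisions:
        logger.warning(
            "Header %s is set in both places; using the "
            "DEFAULT_REQUEST_HEADERS value", name
        )
    return dict(merged.values()), sorted(collisions)
```

Comparing lowercased names avoids treating `X-Crawlera-Profile` and `x-crawlera-profile` as two different headers.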
After checking #52, I think it would be more interesting to have something like DEFAULT_CRAWLERA_HEADERS, which could help define only the headers that are part of Crawlera.
This doesn't conflict with DEFAULT_REQUEST_HEADERS, as it would only be used if the CrawleraMiddleware is enabled.
We could also enable it at the spider level with a default_crawlera_headers argument.
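To make the proposal concrete, here is a rough sketch of how such a setting could be resolved. It is not the plugin's implementation: the helper names are hypothetical, a plain dict stands in for Scrapy settings, and letting the spider-level value override the settings-level one is an assumption the thread does not settle.

```python
CRAWLERA_PREFIX = "x-crawlera-"


def resolve_crawlera_headers(settings, spider=None):
    """Return the default Crawlera headers for a spider (hypothetical helper).

    `settings` is any mapping. A spider-level `default_crawlera_headers`
    attribute, if present, overrides DEFAULT_CRAWLERA_HEADERS entry by
    entry (assumed precedence).
    """
    headers = dict(settings.get("DEFAULT_CRAWLERA_HEADERS") or {})
    headers.update(getattr(spider, "default_crawlera_headers", None) or {})
    # Keep only headers that belong to Crawlera.
    return {
        name: value
        for name, value in headers.items()
        if name.lower().startswith(CRAWLERA_PREFIX)
    }


def apply_defaults(request_headers, crawlera_headers, enabled):
    """Fill in missing Crawlera headers, but only when the middleware is enabled."""
    if enabled:
        for name, value in crawlera_headers.items():
            # setdefault leaves headers already set on the request untouched.
            request_headers.setdefault(name, value)
    return request_headers
```

With the middleware disabled, `apply_defaults` returns the request headers unchanged, which is the property that keeps `X-Crawlera-*` headers from leaking into requests that do not go through Crawlera.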
So, sorry, but I think the current proposal is not really necessary. We even had this discussion a while ago and decided to stay with DEFAULT_REQUEST_HEADERS only, which is why we emphasize it in the documentation.
@eLRuLL @baitxo: I like the idea of having DEFAULT_CRAWLERA_HEADERS separated from DEFAULT_REQUEST_HEADERS. It is a very good replacement for the spider attributes I proposed here.
Some pros:
custom_settings;
X-Crawlera-* headers when the plugin isn't enabled (comparing with DEFAULT_REQUEST_HEADERS);

tl;dr: +1 to implement DEFAULT_CRAWLERA_HEADERS.
This PR fixes #53 by introducing support for Crawlera settings via spider attributes for some common headers, such as:
I didn't add support for the full list of headers, as we are just starting the discussion here.