Closed VMRuiz closed 5 years ago
I like this, to be honest I thought that this was already in place. Since there are quite a lot of 503's that are not bans (https://doc.scrapinghub.com/crawlera.html#errors)
Thanks, could you please fix the broken tests so I can merge it and schedule a release.
This is great for the Crawlera middleware, but I wonder how important is this semantic and whether most (non-Scrapy) Crawlera customers are aware of it?. I don't see any mention on the Best Practices, should there be?
I was under the impression that 503's were still the main way for Crawlera to signal domains banned / no proxies.
@pablohoffman scrapy-crawlera doesn't do retries. It checks for bans for stats and for killing the spider should too many bans occur.
We have these 503's
client_conn_closed | 503 | Connection closed by client
no_proxies | 503 | No available proxies
banned | 503 | Proxy has been banned
serverbusy | 503 | Server busy: too many outstanding requests
Only the banned
one is actually a ban. So I completely agree with changes proposed by @VMRuiz. Regardless of anything all 503's should be retried like it says on the Best Practices.
This is a fix for #39
Merging #64 into master will increase coverage by
0.07%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #64 +/- ##
==========================================
+ Coverage 94.59% 94.66% +0.07%
==========================================
Files 2 2
Lines 148 150 +2
==========================================
+ Hits 140 142 +2
Misses 8 8
Impacted Files | Coverage Δ | |
---|---|---|
scrapy_crawlera/middleware.py | 94.59% <100%> (+0.07%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 28b38e4...e8b2ee9. Read the comment docs.
Merging #64 into master will increase coverage by
0.07%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #64 +/- ##
==========================================
+ Coverage 94.59% 94.66% +0.07%
==========================================
Files 2 2
Lines 148 150 +2
==========================================
+ Hits 140 142 +2
Misses 8 8
Impacted Files | Coverage Δ | |
---|---|---|
scrapy_crawlera/middleware.py | 94.59% <100%> (+0.07%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 28b38e4...e8b2ee9. Read the comment docs.
Checking only the status code is not enough to know if the response coming from crawlera is a ban or not. I have added extra checks to prevent false positives.