Closed hcoura closed 5 years ago
Merging #70 into master will increase coverage by
0.59%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #70 +/- ##
==========================================
+ Coverage 94.66% 95.26% +0.59%
==========================================
Files 2 3 +1
Lines 150 169 +19
==========================================
+ Hits 142 161 +19
Misses 8 8
Impacted Files | Coverage Δ | |
---|---|---|
scrapy_crawlera/middleware.py | 94.93% <100%> (+0.34%) |
:arrow_up: |
scrapy_crawlera/utils.py | 100% <100%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update e8554a5...673a14f. Read the comment docs.
We have exponential backoff implemented for proxy checks in https://github.com/TeamHG-Memex/scrapy-rotating-proxies. It uses jitter, following analysis in https://aws.amazon.com/ru/blogs/architecture/exponential-backoff-and-jitter/.
No strong opinion from me on large delay vs exponential back-off though :) Backoff looks fine, because with exponential backoff + jitter you should be recovering from "no proxy" errors faster after an error is resolved on Crawlera side.
I think this is the deal exponential backoff + jitter you should be recovering from "no proxy" errors faster after an error is resolved on Crawlera side.
if no one has strong opinions about it I should do a final implementation next week.
Is this error different from noslaves
? If so, shouldn't we includenoslaves
too?
Oh Thanks @jesuslosada the doc on https://doc.scrapinghub.com/crawlera.html is actually outdated, source shows the error is noslaves
, will update the PR and source accordingly.
I like it. Just two things:
Recent internal problems when provisioner restart causing spiders to stop working due to been considered bans were fixed on https://github.com/scrapy-plugins/scrapy-crawlera/pull/64 but another issue is that, at the same time scrapy behavior will be to retry with no delay until proxies are available.
This is supposed to help with that by introducing exponential backoff should it encounter
No Available Proxies
, now that I am writing this I am not sure it makes sense, or if we just rather have a bigish delay when that happens? Please opinions.Also if going with this, I would like opinions on what should be the default values and if we should actually put some as settings.