Closed tomasrinke closed 5 years ago
It would be interesting, but right now Scrapy does not report the status phrase: https://github.com/scrapy/scrapy/blob/f01ae6ffcd431b73f5358f9f876f8e9ee9be0113/scrapy/core/downloader/handlers/http11.py#L360, so the info accompanying the 503 is not available at the middleware level.
My bad. The X-Crawlera-Error header does have some information.
so the info accompanying the 503 is not available at the middleware level.
IIRC it's available via the X-Crawlera-Error header.
As seen here: https://doc.scrapinghub.com/crawlera.html#errors
503 could mean multiple errors, not just a ban:
scrapy-crawlera only checks the status code, which can be misleading.
IMHO it should consider the message of the response as well: HTTP code 503 and "Proxy has been banned".
I discovered that this is the output of scrapy:
and Crawlera stats show only 15 errors with 503 and "Proxy has been banned", which matches this count: 'crawlera/response/error/banned'
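For illustration, here is a rough sketch of the check proposed in this thread: treat a response as a ban only when the status is 503 and the X-Crawlera-Error header says so. The function name `is_ban` and the constants are hypothetical, and the header value `"banned"` is an assumption that should be verified against the Crawlera error table linked above. This is not scrapy-crawlera's actual code.

```python
BAN_STATUS = 503

# Assumption: header value(s) that mark a real ban. Verify against the
# Crawlera error documentation before relying on this set.
BAN_ERRORS = frozenset({"banned"})


def is_ban(status, headers, ban_errors=BAN_ERRORS):
    """Return True only for a 503 whose X-Crawlera-Error marks a ban.

    `headers` is any mapping of header name to str or bytes value.
    """
    if status != BAN_STATUS:
        return False
    error = ""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if isinstance(name, bytes):
            name = name.decode("ascii", "replace")
        if name.lower() == "x-crawlera-error":
            if isinstance(value, bytes):
                value = value.decode("ascii", "replace")
            error = value
            break
    return error.strip().lower() in ban_errors
```

In a downloader middleware this could probably be called as `is_ban(response.status, {"X-Crawlera-Error": response.headers.get("X-Crawlera-Error", b"")})`, so that other 503 errors are not counted as bans.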