scrapy-plugins / scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
BSD 3-Clause "New" or "Revised" License

Failed to access response.meta when receiving Crawlera auth error (407) #91

Closed: heylouiz closed this issue 3 years ago

heylouiz commented 3 years ago

I am getting AttributeError: Response.meta not available, this response is not tied to any request when receiving Crawlera auth error responses.

The problem occurs in this line: https://github.com/scrapy-plugins/scrapy-crawlera/blob/master/scrapy_crawlera/middleware.py#L204

Here is the error trace:

Traceback (most recent call last):
  File "/app/python/lib/python3.8/site-packages/scrapy/http/response/__init__.py", line 44, in meta
    return self.request.meta
AttributeError: 'NoneType' object has no attribute 'meta'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/app/python/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_response
    response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
  File "/usr/local/lib/python3.8/site-packages/scrapy_crawlera/middleware.py", line 204, in process_response
    retries = response.meta.get('crawlera_auth_retry_times', 0)
  File "/app/python/lib/python3.8/site-packages/scrapy/http/response/__init__.py", line 46, in meta
    raise AttributeError(
AttributeError: Response.meta not available, this response is not tied to any request
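The crash comes from reading meta off a response that is not tied to any request. Below is a minimal, self-contained sketch of the failure mode (the Fake* classes are stand-ins for illustration, not Scrapy's real ones) and one hedged fix: read the meta from the request object that process_response() already receives, since that is always available.

```python
class FakeRequest:
    """Stand-in for scrapy.Request: just carries a meta dict."""
    def __init__(self, meta=None):
        self.meta = meta if meta is not None else {}


class FakeResponse:
    """Stand-in for scrapy.http.Response: .meta proxies to .request.meta."""
    def __init__(self, request=None):
        self.request = request  # may be None for an auth-error response

    @property
    def meta(self):
        try:
            return self.request.meta
        except AttributeError:
            raise AttributeError(
                "Response.meta not available, this response is "
                "not tied to any request"
            )


def retries_buggy(request, response):
    # What middleware.py#L204 effectively does; raises AttributeError
    # when response.request is None:
    return response.meta.get('crawlera_auth_retry_times', 0)


def retries_fixed(request, response):
    # Hedged fix: process_response() always receives the request,
    # whose meta is always available.
    return request.meta.get('crawlera_auth_retry_times', 0)


request = FakeRequest(meta={'crawlera_auth_retry_times': 2})
response = FakeResponse(request=None)  # 407 response with no request attached

print(retries_fixed(request, response))  # -> 2, no AttributeError
```

Calling retries_buggy() with the same arguments reproduces the AttributeError from the traceback above, while retries_fixed() reads the counter from the request side.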

How I've tested:

I've started a flask server in a VPS with the following code:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!', 407, {'X-Crawlera-Error': b"bad_proxy_auth"}

Then I created a virtualenv, installed scrapy_crawlera and ran the following spider:

import scrapy
class TestCrawleraSpider(scrapy.Spider):
    name = "test_crawlera"
    start_urls = ["http://MY_VPS_IP:5000"]
    custom_settings = {
        "DEFAULT_REQUEST_HEADERS": {
            "X-Crawlera-No-Bancheck": 1, # I think I need to do this so crawlera does not retry the request
        },
        "DOWNLOADER_MIDDLEWARES": {
            'scrapy_crawlera.CrawleraMiddleware': 610,
        },
        "CRAWLERA_ENABLED": True,
        "CRAWLERA_APIKEY": "MYAPIKEY",
    }
    def parse(self, response):
        print(response.text)

Nothing else is configured; I ran it with the command: scrapy runspider testspider.py

Versions used in this test:

Scrapy==2.4.0
scrapy-crawlera==1.7.1
starrify commented 3 years ago

Hi @heylouiz, thanks for the issue report! The issue appears to come from a mistake in the implementation. I have proposed a pull request to fix it (link above).

Please allow some time for the maintainers to review and merge that PR. If you need an immediate fix in your project, you can temporarily install the branch from the PR (pip install git+https://github.com/org-name/project-name.git@branch-name) as a workaround.

noviluni commented 3 years ago

Today I observed this in a project I'm working on. If it becomes recurrent I will point to @starrify's temporary solution.