jxlil / scrapy-impersonate

Scrapy download handler that can impersonate browsers' TLS signatures or JA3 fingerprints.
MIT License

Proxy Authentication failure #18

Closed oussamadz closed 2 months ago

oussamadz commented 2 months ago

Describe the bug: When using a proxy endpoint with authentication in a middleware, for some reason I get a 407 response: curl_cffi.requests.errors.RequestsError: Failed to perform, curl: (56) CONNECT tunnel failed, response 407. See https://curl.se/libcurl/c/libcurl-errors.html first for more details. But the same authentication works flawlessly with curl_cffi requests when passed as a dict.
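For reference, the dict form that works directly with curl_cffi looks roughly like this. This is a sketch with a placeholder host and placeholder credentials; the commented-out call assumes curl_cffi is installed:

```python
# Placeholder credentials and host; substitute your real proxy endpoint.
user, password, host, port = "user", "pass", "proxy.example.com", 8080
proxies = {
    "http": f"http://{user}:{password}@{host}:{port}",
    "https": f"http://{user}:{password}@{host}:{port}",
}

# With curl_cffi installed, the dict is passed directly:
# from curl_cffi import requests
# resp = requests.get("https://example.com", impersonate="chrome110", proxies=proxies)
print(proxies["http"])
```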

EDIT: I also tried passing auth as a basic header (Proxy-Authorization: Basic [base64 user:pass]) and got the same result (the error above).
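The Basic header value mentioned above is just the base64 encoding of "user:pass"; a minimal sketch of building it (with placeholder credentials):

```python
import base64

# Build a Proxy-Authorization value by hand; "user:pass" is a placeholder.
credentials = base64.b64encode(b"user:pass").decode("ascii")
header_value = f"Basic {credentials}"
print(header_value)  # → Basic dXNlcjpwYXNz
```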

To Reproduce: Steps to reproduce the behavior:

  1. Go to middlewares.py
  2. In the downloader middleware's process_request, add the proxy string: request.meta['proxy'] = "http://[username]:[password]@[host]:[port]"
  3. Add the middleware in settings
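The middleware described in the steps above can be sketched as follows; the class name, project path, priority value, and credentials are all illustrative placeholders:

```python
# middlewares.py -- minimal sketch of the downloader middleware described
# in the steps above. Class name and credentials are illustrative.
class AuthProxyMiddleware:
    def process_request(self, request, spider):
        # Attach the authenticated proxy endpoint to every outgoing request.
        request.meta["proxy"] = "http://user:pass@proxy.example.com:8080"
        return None  # returning None lets Scrapy continue processing

# settings.py -- enable the middleware (priority 350 is arbitrary):
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.AuthProxyMiddleware": 350}
```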

Expected behavior: To use the proxy endpoint and bypass any 429 responses.

dream2333 commented 2 months ago

#17 Fixed

oussamadz commented 2 months ago

Still the same thing, but now it gives both 407 and 429. It's probably not using the proxy on retry, so the retry just gives 429 instead of the curl 407.

oussamadz commented 2 months ago
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/tester/.venv/lib/python3.10/site-packages/twisted/internet/defer.py", line 1999, in _inlineCallbacks
    result = context.run(
  File "/tmp/tester/.venv/lib/python3.10/site-packages/twisted/python/failure.py", line 519, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/tmp/tester/.venv/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/tmp/tester/.venv/lib/python3.10/site-packages/twisted/internet/defer.py", line 1251, in adapt
    extracted: _SelfResultT | Failure = result.result()
  File "/tmp/tester/.venv/lib/python3.10/site-packages/scrapy_impersonate-1.3.1-py3.10.egg/scrapy_impersonate/handler.py", line 46, in _download_request
  File "/tmp/tester/.venv/lib/python3.10/site-packages/curl_cffi-0.7.1-py3.10-linux-x86_64.egg/curl_cffi/requests/session.py", line 1268, in request
    raise RequestsError(str(e), e.code, rsp) from e                                                                                                                                        
curl_cffi.requests.errors.RequestsError: Failed to perform, curl: (56) CONNECT tunnel failed, response 407. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.
dream2333 commented 2 months ago

Are you still using the 1.3.1 version of the package from PyPI? The new version has not been released to PyPI yet. Please pull the master branch directly from Git.

oussamadz commented 2 months ago

Installed from the merged git repo. The password is correct. I understand that some proxies can be detected and therefore return 429, but there shouldn't be a 407 status at all.

On Fri, Aug 9, 2024, 02:38, Dream wrote:

HTTP 407 indicates that the proxy authentication you are using has failed. Please check if the password is correct.

HTTP 429 indicates that your proxy requests are too frequent.

dream2333 commented 2 months ago

Could you please provide a minimal reproducible example? I couldn't reproduce your issue on my end.


oussamadz commented 2 months ago

Here is my start_requests method:

def start_requests(self):
    yield scrapy.Request(
        url=self.start_urls[0],
        headers={"Proxy-Authorization": "Basic [base64 user:pass]"},
        meta={
            "impersonate": "chrome110",
            "proxy": "http://[proxy url]:[port]",
        },
    )
jxlil commented 2 months ago

Hi @oussamadz, please try this with version 1.4.0.

Or you can also try authenticating the proxy as follows:

def start_requests(self):
    yield Request(
        url=self.start_urls[0],
        meta={
            "proxy": "http://[proxy url]:[proxy port]",
            "impersonate": "chrome110",
            "impersonate_args": {
                "proxy_auth": ("[user]", "[pass]"),
            },
        },
    )
oussamadz commented 2 months ago

It works, thank you. Also, passing request.meta['proxy'] = "http://[user]:[pass]@[host]:[port]" in the downloader middleware's process_request seemed to work (with @dream2333's merged fix on 1.3.1).