lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License

[BUG] Disabled proxy resulting in general `RequestsError` 400 #361

Closed vdusek closed 3 months ago

vdusek commented 3 months ago

Describe the bug

To Reproduce

import asyncio
from curl_cffi.requests import AsyncSession
from proxy import Proxy

async def main():
    session = AsyncSession()

    with Proxy(
        [
            '--hostname',
            '127.0.0.1',
            '--port',
            '8899',
            '--basic-auth',
            'username:password',
            '--disable-http-proxy',
        ]
    ):
        response = await session.request(
            'get',
            'https://httpbin.org/get',
            proxy='http://username:password@127.0.0.1:8899',
        )
        print(f'status_code: {response.status_code}')
        print(f'content: {response.content[:1000].decode()}')

if __name__ == '__main__':
    asyncio.run(main())

Resulting in:

$ python run_curl_cffi_bug.py 
2024-08-01 16:45:33,542 - pid:312119 [I] plugins.load:89 - Loaded plugin proxy.http.proxy.auth.AuthPlugin
Traceback (most recent call last):
  File "/home/vdusek/Projects/crawlee-py/.venv/lib64/python3.12/site-packages/curl_cffi/requests/session.py", line 1264, in request
    await task
curl_cffi.curl.CurlError: Failed to perform, curl: (56) CONNECT tunnel failed, response 400. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vdusek/Projects/crawlee-py/run_curl_cffi_bug.py", line 31, in <module>
    asyncio.run(main())
  File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/vdusek/Projects/crawlee-py/run_curl_cffi_bug.py", line 21, in main
    response = await session.request(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vdusek/Projects/crawlee-py/.venv/lib64/python3.12/site-packages/curl_cffi/requests/session.py", line 1268, in request
    raise RequestsError(str(e), e.code, rsp) from e
curl_cffi.requests.errors.RequestsError: Failed to perform, curl: (56) CONNECT tunnel failed, response 400. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.

Expected behavior

Example in HTTPX:

import asyncio
from httpx import AsyncClient
from proxy import Proxy

async def main():
    with Proxy(
        [
            '--hostname',
            '127.0.0.1',
            '--port',
            '8899',
            '--basic-auth',
            'username:password',
            '--disable-http-proxy',
        ]
    ):
        async with AsyncClient(proxy='http://username:password@127.0.0.1:8899') as client:
            response = await client.get(url='https://httpbin.org/get')
            print(f'status_code: {response.status_code}')
            print(f'content: {response.read()[:1000].decode()}')

if __name__ == '__main__':
    asyncio.run(main())

Results in:

httpx.ProxyError: 400 BAD REQUEST

Versions

coletdjnz commented 3 months ago

Currently, curl-cffi has only one general error for all curl errors. Mapping curl errors to specific Python exceptions would be a nice addition.

For now, the way to detect specific errors is to check the curl error code. For this particular error, however, curl does not appear to use a distinct code for the proxy failure, so you have to match on the error message in addition to checking the code.

An example from our project: https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp%2Fnetworking%2F_curlcffi.py#L241
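
The code-plus-message check described above could be sketched roughly as follows. This is a minimal illustration, not the yt-dlp implementation or a curl_cffi API: `ProxyTunnelError` and `classify_curl_error` are hypothetical names, and the sketch assumes the error message keeps the "CONNECT tunnel failed, response NNN" wording seen in the traceback in this issue. The value 56 is libcurl's `CURLE_RECV_ERROR`.

```python
import re

# libcurl's numeric code for CURLE_RECV_ERROR, the "(56)" in the traceback.
CURLE_RECV_ERROR = 56


class ProxyTunnelError(Exception):
    """Hypothetical application-level error; not part of curl_cffi."""

    def __init__(self, message, status=None):
        super().__init__(message)
        # HTTP status the proxy returned to CONNECT, if it could be parsed.
        self.status = status


def classify_curl_error(code, message):
    """Return a ProxyTunnelError if the failure looks like a rejected
    CONNECT request, otherwise None.

    Assumption: the message contains "CONNECT tunnel failed", as in the
    traceback reported in this issue.
    """
    if code != CURLE_RECV_ERROR or "CONNECT tunnel failed" not in message:
        return None
    # Pull the proxy's HTTP status out of "... response 400." when present.
    match = re.search(r"response (\d{3})", message)
    status = int(match.group(1)) if match else None
    return ProxyTunnelError(message, status)


if __name__ == "__main__":
    msg = "Failed to perform, curl: (56) CONNECT tunnel failed, response 400."
    err = classify_curl_error(56, msg)
    print(type(err).__name__, err.status)
```

In real use, one would presumably call this from an `except RequestsError as e:` block with `e.code` and `str(e)` (the traceback shows `RequestsError` being constructed with the message and `e.code`), re-raising the original exception when the function returns `None`.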

perklet commented 3 months ago

Progress is tracked in #250.