lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License
2.49k stars 265 forks

Support response.raw #438

Open xyb opened 3 days ago

xyb commented 3 days ago

Is your feature request related to a problem? Please describe.

While developing a plugin for HTTPie, I noticed that curl_cffi's requests-compatible interface lacks the response.raw attribute, which HTTPie requires.

Describe the solution you'd like

>>> from curl_cffi.requests import Session
>>> with Session(impersonate="chrome") as session:
...   response = session.get("https://httpbin.org/get")
...   print(response.raw)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
AttributeError: 'Response' object has no attribute 'raw'

>>> from requests import Session
>>> with Session() as session:
...   response = session.get("https://httpbin.org/get")
...   print(response.raw)
...
<urllib3.response.HTTPResponse object at 0x103a09c30>

Describe alternatives you've considered

None

Additional context

None

xyb commented 3 days ago

I’ve uploaded my plugin to demonstrate how to reproduce the issue:

❯ httpie --debug plugins install httpie-curl-cffi

❯ http --debug https://httpbin.org/get
...
http: error: AttributeError: 'Response' object has no attribute 'raw'

Traceback (most recent call last):
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/bin/http", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/httpie/__main__.py", line 9, in main
    exit_status = main()
                  ^^^^^^
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/httpie/core.py", line 162, in main
    return raw_main(
           ^^^^^^^^^
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/httpie/core.py", line 140, in raw_main
    handle_generic_error(e)
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/httpie/core.py", line 100, in raw_main
    exit_status = main_program(
                  ^^^^^^^^^^^^^
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/httpie/core.py", line 213, in program
    for message in messages:
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/httpie/client.py", line 114, in collect_messages
    response = requests_session.send(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xyb/.virtualenvs/httpie-curl-cffi/lib/python3.11/site-packages/requests/sessions.py", line 718, in send
    extract_cookies_to_jar(self.cookies, request, r.raw)
                                                  ^^^^^
AttributeError: 'Response' object has no attribute 'raw'

lexiforest commented 3 days ago

Unfortunately, it's not possible to implement this attribute. curl/libcurl will automatically decompress the response in the streaming callback no matter what, whereas response.raw should return the raw, i.e. still-compressed, content as a stream.
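
To make the distinction concrete, here is a minimal standard-library sketch. It makes no network call and does not use curl_cffi; the `body` payload is made up for illustration. `wire_bytes` stands for what a server would send with `Content-Encoding: gzip`, which is what requests' response.raw yields by default, and `decoded` stands for what a decompressing streaming callback hands back.

```python
import gzip

# Hypothetical payload, for illustration only.
body = b'{"url": "https://httpbin.org/get"}'

# Bytes as they would travel on the wire with Content-Encoding: gzip.
# In requests, response.raw.read() returns this form by default.
wire_bytes = gzip.compress(body)

# Bytes after decompression, i.e. what a client sees once the
# transfer layer has already decoded the stream.
decoded = gzip.decompress(wire_bytes)

print(wire_bytes[:2])  # gzip magic number
print(decoded == body)
```

Once only `decoded` is available, the original `wire_bytes` cannot be handed to a caller without re-compressing, which would not be guaranteed to match what the server sent.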

lexiforest commented 3 days ago

It may be possible with curl_easy_recv, but that would take a significant amount of work. Let's keep this open and revisit it in the future.

xyb commented 3 days ago

> Unfortunately, it's not possible to implement this attribute. curl/libcurl will automatically decompress the response in the streaming callback no matter what, whereas response.raw should return the raw, i.e. still-compressed, content as a stream.

As shown in the previous example, in requests response.raw is an instance of urllib3.response.HTTPResponse. However, implementing response.raw directly may not be the best solution. It would be more effective to trace the call stack and find the most appropriate entry point for addressing the issue. Unfortunately, that requires a deeper dive into the requests library, which I cannot dedicate time to at the moment.

lexiforest commented 3 days ago

Hi, I just took another look at your stack trace. It seems that what is missing here is requests.Session.send(), not response.raw, so the problem is simpler to solve.

vevv commented 2 days ago

Here's a wrapper I use to add raw-like reading functionality. I only ever need content that's either decompressed or not compressed to begin with, so this works well for me. Even with requests, though, I've never run into compressed content.

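from __future__ import annotations

from curl_cffi import requests  # assumed: the original snippet did not show its import


class RawReader:
    """Minimal file-like reader over a response's already-decoded body."""

    def __init__(self, response: requests.Response):
        self.response = response
        self._chunks = response.iter_content()  # one iterator shared by all reads
        self._buffer = b""  # bytes fetched but not yet returned

    def read(self, amt: int | None = None) -> bytes:
        # Pull chunks until amt bytes are buffered or the body is exhausted.
        while amt is None or len(self._buffer) < amt:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buffer += chunk
        if amt is None:
            amt = len(self._buffer)
        # Return at most amt bytes and keep the remainder for the next call.
        data, self._buffer = self._buffer[:amt], self._buffer[amt:]
        return data


# resp.raw = RawReader(resp)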
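
As an alternative sketch of the same idea, the standard library's io module can turn any iterator of byte chunks into a buffered file-like object. Nothing here is part of curl_cffi: `IterStream` and `as_file` are hypothetical names, and `fake_chunks` stands in for what `response.iter_content()` might yield. Like the wrapper above, it only exposes already-decoded content.

```python
import io
from typing import Iterable, Iterator


class IterStream(io.RawIOBase):
    """Expose an iterator of byte chunks as a read-only raw stream."""

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks: Iterator[bytes] = iter(chunks)
        self._leftover = b""  # part of the last chunk not yet handed out

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Refill from the iterator only when the previous chunk is used up;
        # empty chunks are skipped so they are not mistaken for end of stream.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # end of stream
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def as_file(chunks: Iterable[bytes]) -> io.BufferedReader:
    """Wrap the chunk iterator so read(amt) returns at most amt bytes."""
    return io.BufferedReader(IterStream(chunks))


# Stand-in for response.iter_content(); a real response is an assumption here.
fake_chunks = [b"hello ", b"", b"world", b"!"]
reader = as_file(fake_chunks)
first = reader.read(4)
rest = reader.read()
print(first, rest)
```

Leaving the buffering to io.BufferedReader means read(amt) never returns more than amt bytes and no data is dropped between calls.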