lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browsers' TLS/JA3/HTTP/2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License

[Feature] Remove big overhead in BaseSession cookies section #285

Closed: deedy5 closed this issue 7 months ago

deedy5 commented 8 months ago

It should be possible to remove the large overhead and reduce cookie setting to a single `c.setopt()` call.

In `BaseSession._set_curl_options()`:

Before:

```python
c.setopt(CurlOpt.COOKIEFILE, b"")  # always enable the curl cookie engine first
c.setopt(CurlOpt.COOKIELIST, "ALL")  # remove all the old cookies first

# one setopt() call for every session cookie returned for this request...
for morsel in self.cookies.get_cookies_for_curl(req):
    curl.setopt(CurlOpt.COOKIELIST, morsel.to_curl_format())
# ...and one more for every cookie passed to this request only
if cookies:
    temp_cookies = Cookies(cookies)
    for morsel in temp_cookies.get_cookies_for_curl(req):
        curl.setopt(CurlOpt.COOKIELIST, morsel.to_curl_format())
```
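For context on what each of those calls carries: libcurl's `CURLOPT_COOKIELIST` takes one cookie string per call, either as a `Set-Cookie:` style header or as a single line in the Netscape cookie-file format. The helper below is hypothetical and only illustrates that Netscape layout (seven tab-separated fields, with an `#HttpOnly_` domain prefix for HttpOnly cookies); it is a sketch, not curl_cffi's own `to_curl_format()`.

```python
def to_netscape_line(name: str, value: str, domain: str, path: str = "/",
                     secure: bool = False, expires: int = 0,
                     include_subdomains: bool = False,
                     http_only: bool = False) -> str:
    """Format one cookie as a tab-separated Netscape cookie-file line (hypothetical helper)."""
    # HttpOnly cookies are marked by prefixing the domain field
    domain_field = f"#HttpOnly_{domain}" if http_only else domain
    fields = [
        domain_field,
        "TRUE" if include_subdomains else "FALSE",  # also match subdomains?
        path,
        "TRUE" if secure else "FALSE",              # send only over HTTPS?
        str(expires),                               # Unix time; 0 means session cookie
        name,
        value,
    ]
    return "\t".join(fields)


print(repr(to_netscape_line("session", "abc", "example.com")))
# 'example.com\tFALSE\t/\tFALSE\t0\tsession\tabc'
```

Each such line corresponds to one FFI call in the loop, which is where the per-cookie overhead comes from.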

Sample code after the improvement:

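```python
from http.cookies import SimpleCookie

cookie_obj = SimpleCookie()
for name, value in self.cookies.items():  # iterate the cookie mapping, not its __dict__
    cookie_obj[name] = value
# output() prefixes every "name=value" with a space, hence the strip()
cookie_header = cookie_obj.output(header="", sep=";").strip()
# HTTPHEADER replaces the whole header list, so this line must be merged
# with the other request headers rather than set on its own
c.setopt(CurlOpt.HTTPHEADER, [f"Cookie: {cookie_header}".encode()])
```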
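Because `CURLOPT_HTTPHEADER` replaces the entire custom header list on each call, the single-call idea only works if the `Cookie` line travels together with the other request headers. A minimal sketch, assuming headers and cookies are plain name-to-value mappings; `build_header_lines` is a hypothetical helper, not part of curl_cffi, and it joins the cookies directly instead of going through `SimpleCookie`, with no quoting or validation of values.

```python
from typing import Mapping


def build_header_lines(headers: Mapping[str, str],
                       cookies: Mapping[str, str]) -> list[bytes]:
    """Return all request headers plus one Cookie line, encoded for libcurl (hypothetical helper)."""
    lines = [f"{k}: {v}".encode() for k, v in headers.items()]
    if cookies:
        # a single flat "a=1; b=2" string, with no domain or path matching
        cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {cookie_header}".encode())
    return lines


lines = build_header_lines({"Accept": "*/*"}, {"a": "1", "b": "2"})
print(lines)  # [b'Accept: */*', b'Cookie: a=1; b=2']
```

The resulting list would then be handed to libcurl in one call, e.g. `c.setopt(CurlOpt.HTTPHEADER, lines)`. Note that it attaches every cookie to every request, whatever the target domain or path.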
perklet commented 8 months ago

That's the solution I went for before v0.5.8, but it does not work well for many edge cases. See the lengthy discussion in #55.

To summarize: we have to enable libcurl's cookie engine for cookies to work correctly.
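One kind of bookkeeping a cookie engine does, and a flat `Cookie` header cannot, is scoping: each cookie belongs to a domain and path, and only the matching cookies should be sent. The sketch below uses Python's standard `http.cookiejar` purely as a stand-in for a cookie engine (curl_cffi relies on libcurl's engine instead), and `make_cookie`, `scoped_header` and `flat_header` are hypothetical helpers. It does not claim to reproduce the specific edge cases discussed in #55.

```python
from http.cookiejar import Cookie, CookieJar
from typing import Optional
from urllib.request import Request


def make_cookie(name: str, value: str, domain: str, path: str = "/") -> Cookie:
    """Build a plain (version 0) host-only cookie for the demo."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def scoped_header(jar: CookieJar, url: str) -> Optional[str]:
    """Cookie header a cookie engine would send for this URL (None if nothing matches)."""
    req = Request(url)
    jar.add_cookie_header(req)  # selects cookies by domain, path, secure flag and expiry
    return req.get_header("Cookie")


def flat_header(jar: CookieJar) -> str:
    """Cookie header built from every stored cookie, ignoring scope."""
    return "; ".join(f"{c.name}={c.value}" for c in jar)


jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc", "example.com"))
jar.set_cookie(make_cookie("pref", "dark", "example.com", path="/account"))
jar.set_cookie(make_cookie("tracker", "123", "other.org"))

print(scoped_header(jar, "https://example.com/"))                  # session=abc
print(scoped_header(jar, "https://example.com/account/settings"))  # pref=dark; session=abc
print(scoped_header(jar, "https://other.org/"))                    # tracker=123
print(flat_header(jar))  # all three cookies, whatever the URL
```

A single precomputed header would leak `tracker` to `example.com` and `session` to `other.org`, and it would also have to be recomputed whenever a response sets new cookies.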

To mimic the cookiejar interface of `requests`, cookies have to be synced between libcurl and Python. The alternative is to interact with libcurl's cookie engine directly, which would add almost no overhead.
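Syncing in the libcurl-to-Python direction means reading libcurl's cookie store back; libcurl can export it (via `CURLINFO_COOKIELIST`) as lines in the Netscape cookie-file format. A minimal parser for one such line might look like the sketch below. `ParsedCookie` and `parse_netscape_line` are hypothetical names, and the code assumes every line carries all seven tab-separated fields.

```python
from dataclasses import dataclass

HTTPONLY_PREFIX = "#HttpOnly_"


@dataclass
class ParsedCookie:
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # Unix time; 0 means session cookie
    name: str
    value: str
    http_only: bool = False


def parse_netscape_line(line: str) -> ParsedCookie:
    """Parse one tab-separated Netscape cookie-file line (hypothetical helper)."""
    http_only = line.startswith(HTTPONLY_PREFIX)
    if http_only:
        line = line[len(HTTPONLY_PREFIX):]
    # maxsplit=6 keeps exactly seven fields, even when the value is empty
    domain, subdomains, path, secure, expires, name, value = line.split("\t", 6)
    return ParsedCookie(
        domain=domain,
        include_subdomains=subdomains == "TRUE",
        path=path,
        secure=secure == "TRUE",
        expires=int(expires),
        name=name,
        value=value,
        http_only=http_only,
    )


cookie = parse_netscape_line("example.com\tFALSE\t/\tFALSE\t0\tsession\tabc")
print(cookie.name, cookie.value)  # session abc
```

Doing this (plus the reverse formatting) around every request is the syncing cost referred to above; letting libcurl keep the cookies avoids it.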