lexiforest / curl_cffi

Python binding for curl-impersonate fork via cffi. A http client that can impersonate browser tls/ja3/http2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License
2.34k stars 255 forks source link

[BUG] Setting CookieMorsel `subdomains` from Cookie `domain_initial_dot` instead of Cookie `domain_specified` #348

Closed coletdjnz closed 3 months ago

coletdjnz commented 3 months ago

Describe the bug

The subdomains attribute of a CurlMorsel is currently being set to the value of Cookie.domain_initial_dot, however I don't believe this is correct. Instead, this should be set from the value of Cookie.domain_specified.

https://github.com/yifeikong/curl_cffi/blob/630a4dcfc24a73ace0e71d364aae1457cebc3fc0/curl_cffi/requests/cookies.py#L85-L95

From my understanding of the standards (RFC6265), if the Domain attribute is not present in the Set-Cookie header, then the cookie domain is the request domain and must only be applied to that domain, and not subdomains (subdomains=False)

If the Domain attribute is present then the domain matches that domain plus any subdomains (subdomains=True)

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#domaindomain-value

Setting the domain will make the cookie available to it, as well as to all its subdomains. If omitted, this attribute defaults to the host of the current document URL, not including subdomains.

The Cookie.domain_specified is used to distinguish this behaviour.

https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.Cookie.domain_specified

Cookie.domain_specified True if a domain was explicitly specified by the server.

Additionally, the parsing of CurlMorsel to Python Cookie also suffers from the same issue in reverse. domain_specified should be mapped to subdomains

https://github.com/yifeikong/curl_cffi/blob/630a4dcfc24a73ace0e71d364aae1457cebc3fc0/curl_cffi/requests/cookies.py#L97-L120

This is how python sets it from a response: https://github.com/python/cpython/blob/d6555abfa7384b5a40435a11bdd2aa6bbf8f5cfc/Lib/http/cookiejar.py#L1526

Whether the Domain contains an initial dot is irrelevant (at least in RFC6256), so domain_initial_dot should be ignored.

For more confidence, curl also maps the second column in a cookiefile to subdomains, of which in Python's version it maps to domain_specified https://curl.se/docs/http-cookies.html#cookie-file-format https://github.com/python/cpython/blob/d6555abfa7384b5a40435a11bdd2aa6bbf8f5cfc/Lib/http/cookiejar.py#L2044

To Reproduce

This is causing an issue on our end where a request made with Urllib to subdomain.example.com gets a response with Set-Cookie: Domain=example.com (domain_initial_dot=False, domain_specified=True). The cookiejar is shared with curl_cffi, of which when a request is subsequently made to subdomain2.example.com with curl_cffi, the cookie is not added because curl_cffi has read the cookie incorrectly and set subdomains=False.

Versions

Additional context https://github.com/yt-dlp/yt-dlp/issues/10438 https://datatracker.ietf.org/doc/html/rfc6265#section-4.1.2.3 https://datatracker.ietf.org/doc/html/rfc6265#section-5.3

perklet commented 3 months ago

Thanks for the detailed investigation. If I understand correctly, are these the changes we need?

- subdomains=cookie.domain_initial_dot,
+ subdomains=cookie.domain_specified,

- domain_specified=True,
+ domain_specified=self.subdomains,
coletdjnz commented 3 months ago

Thanks for the detailed investigation. If I understand correctly, are these the changes we need?

- subdomains=cookie.domain_initial_dot,
+ subdomains=cookie.domain_specified,

- domain_specified=True,
+ domain_specified=self.subdomains,

I believe so

perklet commented 3 months ago

Fixed in 0.7.1.