lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License
2.51k stars 266 forks

Cookies starting with quotation marks get mangled #414

Open vevv opened 1 month ago

vevv commented 1 month ago

v0.7.3: Cookies which start with quotation marks, e.g. "value", get stripped and are sent as value.

lexiforest commented 1 month ago

If I understand correctly, this is what the RFC states:

cookie-pair       = cookie-name "=" cookie-value
cookie-name       = token
cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

DQUOTE(") is not a valid cookie value, actually. If your target does not follow the RFC, you have to find out how it escapes the quotes and follow the same pattern.
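
For reference, the grammar quoted above can be checked mechanically. The following is only a minimal sketch of that grammar as a regular expression; the names `COOKIE_OCTET`, `COOKIE_VALUE` and `is_valid_cookie_value` are made up for illustration and are not part of curl_cffi. Read this way, a DQUOTE is not allowed as a cookie-octet inside a value, but a value wrapped in one pair of DQUOTEs matches the second alternative of `cookie-value`.

```python
import re

# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
COOKIE_VALUE = re.compile(rf'(?:{COOKIE_OCTET}*|"{COOKIE_OCTET}*")')


def is_valid_cookie_value(value: str) -> bool:
    """Return True if `value` matches the cookie-value grammar quoted above."""
    return COOKIE_VALUE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ['value', '"value"', 'val"ue', 'a b', 'a;b']:
        print(repr(candidate), is_valid_cookie_value(candidate))
```

Whether a given server or client treats the wrapping quotes as part of the value is a separate question that this check does not answer.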

vevv commented 1 month ago

Well, I don't really know what to tell you. If I look at the headers requests is sending, the values have quotation marks at the start and end, as do the values shown in browser dev tools.

I found this while investigating a larger issue where a certain POST request just wouldn't go through, and comparing curl_cffi and requests in Fiddler, this was the only difference.

perklet commented 1 month ago

Does your cookie value happen to contain spaces, semicolons, or any of the other characters mentioned above?

Again, please follow the issue template and include a reproducible snippet, otherwise it wastes both your and my time in this Q&A style discussion.

vevv commented 1 month ago

I can't easily provide an example snippet, since the request authenticates with a service. I can confirm that adding the quotes to the curl request in Fiddler makes it work; it's the only difference.

And I apologize for not following the template, but this is the third time today that I've run into an issue I can't easily debug (because I do not control the remote server) where curl_cffi silently modifies requests in a way that prevents them from working. It's extremely frustrating.

vevv commented 1 month ago

from curl_cffi import requests

session = requests.Session()
resp = session.get('https://httpbin.org/cookies/set/test/"quoted"')
print(f'httpbin: {resp.json()} <-- appears quoted correctly')
print(f'resp: {resp.cookies} <-- appears quoted correctly')
print(f'store: {session.cookies} <-- appears quoted correctly')

print()
resp = session.get('https://httpbin.dev')
print(f'store: {session.cookies} <-- after making a request (even to an unrelated site), it gets mangled')

print()
resp = session.get('https://httpbin.org/cookies/set/test2/unquoted')
print(f'resp: {resp.cookies} <-- incorrect')
print(f'store: {session.cookies} <-- incorrect')

print()
resp = session.get('https://httpbin.org/cookies')
print(f'httpbin: {resp.json()} <-- incorrect')

This demonstrates it well.

lexiforest commented 1 month ago

I see. It seems that the escaped quoted cookies got unquoted twice. Something like this is needed.
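
To illustrate what "unquoted twice" could look like, here is a rough sketch. It does not reproduce curl_cffi's actual code path; `unquote_once` is a hypothetical helper, and the escaped wire form is an assumption about how a server might encode a value that itself contains quotes.

```python
import re


def unquote_once(value: str) -> str:
    """Strip one layer of surrounding double quotes and undo backslash escapes.

    Hypothetical helper for illustration only; not curl_cffi's implementation.
    """
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        # Drop the wrapping quotes, then turn \" into " and \\ into \
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


# Assumed wire form of the cookie value "quoted" (quotes included):
# the server wraps it in quotes and escapes the inner ones.
wire = '"\\"quoted\\""'

once = unquote_once(wire)   # the value the server meant, quotes included
twice = unquote_once(once)  # a second pass strips the quotes that belong to the value

print(f"wire:  {wire}")
print(f"once:  {once}")
print(f"twice: {twice}")
```

If something like this happens once when the cookie is received and again when the cookie store is rebuilt after a later request, it would match the output in the snippet above, where the value looks correct at first and is mangled after the next request.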