lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP/2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License

[Feature]cookiejar #358

Closed: a-n-i-m-e-z closed this issue 3 months ago

a-n-i-m-e-z commented 3 months ago

From requests' cookies.py:

def get_cookie_header(jar, request):
    """
    Produce an appropriate Cookie header string to be sent with `request`, or None.

    :rtype: str
    """
    r = MockRequest(request)
    jar.add_cookie_header(r)
    return r.get_new_headers().get("Cookie")

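Based on the request URL, it automatically picks the cookies in the jar that apply to it (by domain and path). Under the hood it uses http.cookiejar's add_cookie_header.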
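The same lookup can be reproduced with the standard library alone. The sketch below is a hypothetical stand-in for requests' `get_cookie_header`: `urllib.request.Request` plays the role of requests' internal `MockRequest`, `http.cookiejar` selects the cookies that match the URL, and the resulting `Cookie` header is read back. The cookie name, value and `example.com` domain are made up for illustration.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar


def get_cookie_header(jar, url):
    """Return the Cookie header `jar` would send for `url`, or None.

    Hypothetical stdlib-only stand-in for requests.cookies.get_cookie_header;
    urllib.request.Request replaces requests' MockRequest.
    """
    req = urllib.request.Request(url)
    # add_cookie_header keeps only cookies whose domain, path, secure flag
    # and expiry match the request, then sets "Cookie: k1=v1; k2=v2".
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


jar = CookieJar()
# Made-up domain cookie (leading dot): valid for example.com and its subdomains.
jar.set_cookie(Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))

print(get_cookie_header(jar, "https://www.example.com/index.html"))  # session=abc123
print(get_cookie_header(jar, "https://example.org/"))                # None
```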

I'd like this library to support this feature as well.

perklet commented 3 months ago

I don't quite understand what you mean.

a-n-i-m-e-z commented 3 months ago

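Domain: the domain a cookie applies to. A cookie only takes effect under its domain (in practice, only pages there can read it), and domains inherit downward, so a subdomain can read its parent domain's cookies. For example, a cookie for inbox.google.com can only be read by pages under inbox.google.com, not by mail.google.com. It usually defaults to the domain that created the cookie.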

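Path: works much like Domain, except the restriction is on the path. It is also inherited downward: a sub-path can read the cookies of its parent path. For example, a cookie on google.com with the path /reader/ can only be read under google.com/reader/ and below. By default, a cookie is assigned the current path.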
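The domain and path rules above can be checked directly against `http.cookiejar`, which requests relies on. This is a sketch with made-up cookie names and values, reusing the google.com hosts and the /reader/ path from the examples; `make_cookie` and `cookies_sent` are hypothetical helpers, and `http.cookiejar` follows the older Netscape rules, so treat the result as an illustration of the inheritance behavior rather than an exact model of a browser.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path):
    """Build a session cookie; a leading dot on `domain` marks a domain cookie."""
    dotted = domain.startswith(".")
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=dotted, domain_initial_dot=dotted,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookies_sent(jar, url):
    """Return the Cookie header http.cookiejar would attach to `url`, or None."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


jar = CookieJar()
# Hypothetical cookies: one bound to inbox.google.com, one shared by every
# *.google.com host, and one limited to the /reader/ path.
jar.set_cookie(make_cookie("inbox_sid", "1", "inbox.google.com", "/"))
jar.set_cookie(make_cookie("pref", "2", ".google.com", "/"))
jar.set_cookie(make_cookie("reader", "3", ".google.com", "/reader/"))

for url in (
    "https://inbox.google.com/",
    "https://mail.google.com/",
    "https://google.com/reader/feed",
    "https://google.com/",
):
    print(url, "->", cookies_sent(jar, url))
```

Under these assumptions inbox.google.com receives `inbox_sid=1; pref=2`, mail.google.com receives only `pref=2`, and `reader=3` is attached only under /reader/ (listed first, since http.cookiejar orders cookies with longer paths first).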

Loading a cookie.txt file:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.

.ad-m.asia	TRUE	/	TRUE	1739458209	uid	hmfmJOJVnv
.addthis.com	TRUE	/	TRUE	1739458209	na_id	2022082703542900016290574942
.addthis.com	TRUE	/	TRUE	1739458209	na_tc	Y
.addthis.com	TRUE	/	TRUE	1739458209	ouid	63099575000163828d3cab2c5ed8e47faa7e9003562abc442193
.addthis.com	TRUE	/	TRUE	1739458209	uid	63099575cf255187
.adform.net	TRUE	/	TRUE	1739458209	uid	6778488699313307676

from http.cookiejar import MozillaCookieJar

cj = MozillaCookieJar()
cj.load("cookie.txt")
for c in cj:
    print(c)

Output:
<Cookie uid=hmfmJOJVnv for .ad-m.asia/>
<Cookie na_id=2022082703542900016290574942 for .addthis.com/>
<Cookie na_tc=Y for .addthis.com/>
<Cookie ouid=63099575000163828d3cab2c5ed8e47faa7e9003562abc442193 for .addthis.com/>
<Cookie uid=63099575cf255187 for .addthis.com/>
<Cookie uid=6778488699313307676 for .adform.net/>

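1. When a URL is requested, requests passes the cookies in the cookiejar to http.cookiejar's add_cookie_header, which works like a browser's cookie manager: it handles cookie path matching, expiry (removal), and so on.

2. It then joins the key-value pairs of the cookies that apply to that URL and adds them to the request headers (headers={'Cookie': 'k1=v1; k2=v2; ...'}).

3. The actual request is sent.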

So when I visit addthis.com, the cookies are handled automatically and added to the request headers:

import requests

headers = {'User-Agent': 'requests1.0'}
requests.get("https://addthis.com", headers=headers, cookies=cj)
# Resulting request headers:
# {'User-Agent': 'requests1.0', 'Cookie': 'na_id=2022082703542900016290574942; na_tc=Y; ouid=63099575000163828d3cab2c5ed8e47faa7e9003562abc442193; uid=63099575cf255187'}
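The three steps above can be reproduced offline with the standard library. This sketch writes a subset of the cookie.txt entries to a temporary file, loads them with `MozillaCookieJar`, and prints the `Cookie` header that `http.cookiejar` builds for a URL. Assumptions: `urllib.request.Request` stands in for the request object, and the expiry column is replaced with a timestamp one hour in the future, because the original 1739458209 falls in February 2025 and expired entries are skipped on load. Every entry in the sample file has its secure column set to TRUE, so these cookies are only attached to https:// URLs.

```python
import os
import tempfile
import time
import urllib.request
from http.cookiejar import MozillaCookieJar

# Subset of the sample in the tab-separated Netscape format:
# domain, include-subdomains, path, secure, expiry, name, value.
# Assumption: expiry moved one hour ahead so the cookies are not discarded.
expires = int(time.time()) + 3600
rows = [
    (".addthis.com", "TRUE", "/", "TRUE", expires, "na_tc", "Y"),
    (".addthis.com", "TRUE", "/", "TRUE", expires, "uid", "63099575cf255187"),
    (".adform.net", "TRUE", "/", "TRUE", expires, "uid", "6778488699313307676"),
]
text = "# Netscape HTTP Cookie File\n" + "".join(
    "\t".join(str(field) for field in row) + "\n" for row in rows
)


def cookie_header(jar, url):
    """Steps 1 and 2: select matching cookies and join them into one header."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cookie.txt")
    with open(path, "w") as f:
        f.write(text)
    jar = MozillaCookieJar()
    jar.load(path)  # raises LoadError if the header line or columns are wrong

print(cookie_header(jar, "https://addthis.com/"))  # na_tc=Y; uid=63099575cf255187
print(cookie_header(jar, "http://addthis.com/"))   # None: secure cookies need https
```

A file whose columns are separated by spaces instead of tabs fails to load here, which is one thing worth checking when an exported cookie file does not behave as expected.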

a-n-i-m-e-z commented 3 months ago

https://hustyichi.github.io/2019/10/07/requests-cookies/

This is an article analyzing how requests handles cookies; I recommend taking a look.

perklet commented 3 months ago

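Please don't just dump a pile of material here; you still need to describe clearly what you need. Since you're willing to dig into source code, you might as well look through curl_cffi's source for what you need.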

a-n-i-m-e-z commented 3 months ago

I found that curl_cffi can take a cookiejar after all. The cookie file I exported yesterday was probably the problem.