http.cookies.BaseCookie[1] can't parse an Expires attribute in a format like Expires=Thu,31 Jan 2019 05:56:00 GMT; (note the missing space after "Thu,").
I ran into this problem in real-world use. Chrome, IE, and Firefox all parse this string normally, and many languages, such as JavaScript, can also parse this date automatically.
I built a test site using Flask: https://paste.ubuntu.com/p/K7Z4K4KH7Z/. curl and requests get the cookies correctly, but aiohttp does not (because it uses http.cookies.BaseCookie).
Looking at MDN[2] and the RFC[3] (thanks, tirkarthi), this doesn't seem to be canonical behavior, but some Java web frameworks produce it (such as the one that led me to find the problem).
This problem can be solved by modifying a regular expression[4], but I don't know whether the module should be compatible with this non-standard format.
English is not my native language; please excuse typing errors.
[1] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L457 [2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives [3] https://tools.ietf.org/html/rfc6265#section-4.1.1 [4] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L444
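To illustrate the kind of regular-expression change suggested above, here is a minimal sketch. The two patterns are simplified stand-ins written for this example, and the names STRICT_EXPIRES and RELAXED_EXPIRES are hypothetical; they are not the actual http.cookies pattern or the proposed patch. The only difference between them is that the whitespace after the weekday comma becomes optional.

```python
import re

# Hypothetical, simplified patterns for an Expires date.
# These are NOT the real http.cookies._CookiePattern.
STRICT_EXPIRES = re.compile(r"\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT")
# Same pattern, but the space after the comma is optional (\s?).
RELAXED_EXPIRES = re.compile(r"\w{3},\s?[\w\d\s-]{9,11}\s[\d:]{8}\sGMT")

with_space = "Thu, 31 Jan 2019 05:56:00 GMT"
without_space = "Thu,31 Jan 2019 05:56:00 GMT"

# Report which pattern accepts which spelling of the date.
for name, pattern in (("strict", STRICT_EXPIRES), ("relaxed", RELAXED_EXPIRES)):
    for value in (with_space, without_space):
        matched = pattern.fullmatch(value) is not None
        print(f"{name:7} {value!r}: {matched}")
```

Under these assumptions, the strict pattern rejects the date without the space, while the relaxed one accepts both spellings.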
Thanks for the MDN cookie directive link. I didn't know it links to the Date link from the GitHub PR. I don't see the space as optional in the sane-date format specified for the Expires attribute, but I could be reading the grammar wrong. I will wait for others' thoughts on this.
I presume MeiK wants to use BaseCookie to parse the Set-Cookie header field, as in
>>> from http.cookies import BaseCookie
>>> BaseCookie('Hello=World; Expires=Thu, 31 Jan 2019 05:56:00 GMT;')
<BaseCookie: Hello='World'>
>>> BaseCookie('Hello=World; Expires=Thu,31 Jan 2019 05:56:00 GMT;')
<BaseCookie: >
Karthikeyan, if you meant the “sane-cookie-date” format (https://tools.ietf.org/html/rfc6265#page-9), that is just the IETF’s recommended date format. I suspect MeiK is trying to _parse_ the date rather than generate it, in which case the procedure in https://tools.ietf.org/html/rfc6265#section-5.1.1 may be more relevant. Spaces and commas are both treated as delimiters, so the problematic Expires attribute should parse fine.
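As a rough illustration of the delimiter point, the sketch below performs only the first step of the RFC 6265 section 5.1.1 procedure: splitting a cookie date into tokens on the delimiter characters from that section's grammar. The helper name date_tokens is hypothetical, and this is not a full implementation of the algorithm (it does not interpret the tokens as day, month, year, or time).

```python
import re

# Delimiter set from the RFC 6265 section 5.1.1 grammar:
#   delimiter = %x09 / %x20-2F / %x3B-40 / %x5B-60 / %x7B-7E
# Note that ":" (0x3A) is not a delimiter, so "05:56:00" stays one token.
_DELIMITERS = re.compile(r"[\x09\x20-\x2f\x3b-\x40\x5b-\x60\x7b-\x7e]+")


def date_tokens(cookie_date):
    """Split a cookie date string into date-tokens (hypothetical helper)."""
    return [token for token in _DELIMITERS.split(cookie_date) if token]


print(date_tokens("Thu, 31 Jan 2019 05:56:00 GMT"))
print(date_tokens("Thu,31 Jan 2019 05:56:00 GMT"))
```

Because both the comma and the space are delimiters, the two spellings produce the same token list.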
BTW, this special handling of Set-Cookie attributes like Expires is not documented, though it does seem intentional. According to the documentation they should be treated as new Morsels.
Yes, sorry, I thought it was the format used for parsing too. Thanks for the example, Martin. I am linking @MeiK's PR to this issue; it is where I asked them to open an issue for this.
Another example of a value that fails to parse is one where "-0000" is used instead of "GMT", as GitHub does:
Set-Cookie: has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000
So using a regular expression here that only parses the sane-cookie-date format (which is recommended for output) is wrong.
The regular expression was last changed back in 2012 (https://github.com/python/cpython/commit/aeeba2629aa52e4e73e19a1502b3d3133ea68dec).
http.cookiejar parses this correctly, using http2time:
>>> import http.cookiejar
>>> http.cookiejar.parse_ns_headers(["has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000"])
[[('has_recent_activity', '1'), ('path', '/'), ('expires', 1555975638), ('version', '0')]]
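A small sketch of calling http2time directly on the date strings mentioned in this issue. http2time lives in http.cookiejar but does not appear to be part of the documented public API, so treat relying on it as an assumption rather than a recommendation.

```python
from http.cookiejar import http2time

# Date strings taken from the examples in this issue.
candidates = [
    "Thu, 31 Jan 2019 05:56:00 GMT",     # recommended format
    "Thu,31 Jan 2019 05:56:00 GMT",      # no space after the comma
    "Mon, 22 Apr 2019 23:27:18 -0000",   # numeric offset instead of GMT
]

for text in candidates:
    # http2time returns seconds since the epoch, or None if it cannot parse.
    print(f"{text!r} -> {http2time(text)}")
```

The first two strings should yield the same timestamp, since the loose parser strips the weekday prefix with or without trailing whitespace.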
You are right. I looked at the parsing algorithm specified in RFC 6265[1], and it seems that regular expressions should not be used.
I used http.cookiejar to update the code, but it fails the test at https://github.com/python/cpython/blob/master/Lib/test/test_http_cookies.py#L19. However, other languages and libraries (JavaScript, Requests, http.cookiejar, etc.) cannot parse that input either. It seems that the contents of the brackets should be escaped. Is this test case wrong?
I updated the code[2] to use http.cookiejar. Is this a good idea?
English is not my native language; please excuse typing errors.
[1] https://tools.ietf.org/html/rfc6265 [2] https://github.com/python/cpython/pull/11665/commits/a03bc75348a4041c7411da3175689c087a98789f
I found that using http.cookiejar.parse_ns_headers causes some of the existing tests to fail. If you think this approach is workable, I can follow it to write a new implementation that passes all the tests.
Line 19 of test_http_cookies has the following test case:
{'data': 'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"', 'dict': {'keebler' : 'E=mc2; L="Loves"; fudge=\012;'}, 'repr': '''<SimpleCookie: keebler='E=mc2; L="Loves"; fudge=\\n;'>''', 'output': 'Set-Cookie: keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"'}
This is similar to an example in the documentation:
>>> C.load('keebler="E=everybody; L=\\"Loves\\"; fudge=\\012;";')
>>> print(C)
Set-Cookie: keebler="E=everybody; L=\"Loves\"; fudge=\012;"
If you break parsing of this string in the “load” method, you break documented behaviour. The “http.cookies” module is documented to follow RFC 2109. I believe the strings are valid by RFC 2109, in which the value is allowed to use the HTTP “quoted-string” format.
It seems like http.cookiejar should be used for clients, since it includes more relaxed parsing of cookies. This is mentioned in the docs at https://github.com/python/cpython/blame/443fe5a52a3d6a101795380227ced38b4b5e0a8b/Doc/library/http.cookies.rst#L63-L65.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['extension-modules', 'type-feature', '3.8']
title = 'http.cookies._CookiePattern modifying regular expressions'
updated_at =
user = 'https://github.com/MeiK2333'
```
bugs.python.org fields:
```python
activity =
actor = 'blueyed'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation =
creator = 'MeiK'
dependencies = []
files = []
hgrepos = []
issue_num = 35824
keywords = ['patch']
message_count = 10.0
messages = ['334338', '334339', '334392', '334393', '340683', '340684', '340688', '340689', '340831', '341321']
nosy_count = 4.0
nosy_names = ['blueyed', 'martin.panter', 'xtreak', 'MeiK']
pr_nums = ['11665']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue35824'
versions = ['Python 3.8']
```