python / cpython

The Python programming language
https://www.python.org
Other
63.41k stars 30.36k forks source link

http.cookies._CookiePattern modifying regular expressions #80005

Open 0b9f8bff-79ac-4298-8eaa-403a903556f3 opened 5 years ago

0b9f8bff-79ac-4298-8eaa-403a903556f3 commented 5 years ago
BPO 35824
Nosy @blueyed, @vadmium, @tirkarthi, @MeiK2333
PRs
  • python/cpython#11665
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['extension-modules', 'type-feature', '3.8'] title = 'http.cookies._CookiePattern modifying regular expressions' updated_at = user = 'https://github.com/MeiK2333' ``` bugs.python.org fields: ```python activity = actor = 'blueyed' assignee = 'none' closed = False closed_date = None closer = None components = ['Extension Modules'] creation = creator = 'MeiK' dependencies = [] files = [] hgrepos = [] issue_num = 35824 keywords = ['patch'] message_count = 10.0 messages = ['334338', '334339', '334392', '334393', '340683', '340684', '340688', '340689', '340831', '341321'] nosy_count = 4.0 nosy_names = ['blueyed', 'martin.panter', 'xtreak', 'MeiK'] pr_nums = ['11665'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue35824' versions = ['Python 3.8'] ```

    0b9f8bff-79ac-4298-8eaa-403a903556f3 commented 5 years ago

    http.cookies.BaseCookie[1] can't parse Expires in this format like Expires=Thu,31 Jan 2019 05:56:00 GMT;(Less space after Thu,).

    I encountered this problem in actual use, Chrome, IE and Firefox can parse this string normally. Many languages, such as JavaScript, can also parse this data automatically.

    I built a test site using Flask: https://paste.ubuntu.com/p/K7Z4K4KH7Z/, Use curl and requests to get cookies correctly, but not with aiohttp (because it uses http.cookies.BaseCookie).

    Looking at MDN[2] and rfc[3](Thanks tirkarthi), this doesn't seem to be a canonical behavior, But some Java WEB frameworks will produce this behavior (such as the one that caused me to find the problem).

    This problem can be solved by modifying a regular expression[4], but I don't know if it should be compatible with this non-standard way of writing.

    English is not my native language; please excuse typing errors.

    [1] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L457 [2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives [3] https://tools.ietf.org/html/rfc6265#section-4.1.1 [4] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L444

    tirkarthi commented 5 years ago

    Thanks for the MDN cookie directive link. I didn't know it links to Date link in the GitHub PR. I don't see space optional in the sane-date format specified for expires attribute. I could be reading the grammar wrong. I will wait for others thoughts on this.

    vadmium commented 5 years ago

    I presume MeiK wants to use BaseCookie to parse the Set-Cookie header field, as in

    >>> BaseCookie('Hello=World; Expires=Thu, 31 Jan 2019 05:56:00 GMT;')
    <BaseCookie: Hello='World'>
    >>> BaseCookie('Hello=World; Expires=Thu,31 Jan 2019 05:56:00 GMT;')
    <BaseCookie: >

    Karthikeyan, if you meant the “sane-cookie-date” format (https://tools.ietf.org/html/rfc6265#page-9), that is just the IETF’s recommended date format. I suspect MeiK is trying to _parse_ the date rather than generate it, in which case the procedure in \https://tools.ietf.org/html/rfc6265#section-5.1.1\ may be more relevant. Spaces and commas are both treated as delimiters, so the problematic Expires attribute should parse fine.

    BTW, this special handling of Set-Cookie attributes like Expires is not documented, though it does seem intentional. According to the documentation they should be treated as new Morsels.

    tirkarthi commented 5 years ago

    Yes, sorry I thought it was the format used for parsing too. Thanks for the example Martin. I am linking @MeiK PR to the issue where I asked them to open an issue for this.

    1f7c169c-a6ad-45ae-b614-0d40d962c776 commented 5 years ago

    Another example of a value that fails to parse is if "-0000" is used instead of "GMT", which is the case with GitHub:

    Set-Cookie: has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000

    So using a regular expression here to only parse the sane-cookie-date format (that is recommended for output) is wrong.

    The last change to it was in 2012 only (https://github.com/python/cpython/commit/aeeba2629aa52e4e73e19a1502b3d3133ea68dec)

    1f7c169c-a6ad-45ae-b614-0d40d962c776 commented 5 years ago

    http.cookiejar parses this correctly, using http2time:

        >>> import http.cookiejar
        >>> http.cookiejar.parse_ns_headers(["has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000"])
        [[('has_recent_activity', '1'), ('path', '/'), ('expires', 1555975638), ('version', '0')]]

    Ref: https://github.com/python/cpython/blob/9f316bd9684d27b7e21fbf43ca86dc5e65dac4af/Lib/http/cookiejar.py#L204-L249

    0b9f8bff-79ac-4298-8eaa-403a903556f3 commented 5 years ago

    You are right, I saw the agreed way of parsing in RFC6265[1], it seems that you should not use regular expressions.

    I used http.cookiejar to update the code, but it failed to pass the test: https://github.com/python/cpython/blob/master/Lib/test/test_http_cookies.py#L19. However, other languages and libraries (JavaScript, Requests, http.cookiejar, etc.) cannot parse it. It seems that the contents of the brackets should be escaped. Is this a wrong test case?

    I updated the code[2] using http.cookiejar. Is this a good idea?

    English is not my native language; please excuse typing errors.

    [1] https://tools.ietf.org/html/rfc6265 [2] https://github.com/python/cpython/pull/11665/commits/a03bc75348a4041c7411da3175689c087a98789f

    0b9f8bff-79ac-4298-8eaa-403a903556f3 commented 5 years ago

    I found that using http.cookiejar.parse_ns_headers would cause some of the previous tests to fail, and if you think this method is workable, I can follow it to write a new one and pass all the tests.

    vadmium commented 5 years ago

    Test_http_cookies line 19 has the following test case:

    {'data': 'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"', 'dict': {'keebler' : 'E=mc2; L="Loves"; fudge=\012;'}, 'repr': '''\<SimpleCookie: keebler='E=mc2; L="Loves"; fudge=\\n;'>''', 'output': 'Set-Cookie: keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"'}

    This is similar to an example in the documentation:

    >>> C.load('keebler="E=everybody; L=\\"Loves\\"; fudge=\\012;";')
    >>> print(C)
    Set-Cookie: keebler="E=everybody; L=\"Loves\"; fudge=\012;"

    If you break parsing of this string in the “load” method, you break documented behaviour. The “http.cookie” module is documented to follow RFC 2109. I believe the strings are valid by RFC 2109, in which the value is allowed to use the HTTP “quoted-string” format.

    1f7c169c-a6ad-45ae-b614-0d40d962c776 commented 5 years ago

    I seems like http.cookiejar should be used for clients, which includes more relaxed parsing of cookies. This is mentioned in the docs at https://github.com/python/cpython/blame/443fe5a52a3d6a101795380227ced38b4b5e0a8b/Doc/library/http.cookies.rst#L63-L65.