python / cpython

The Python programming language
https://www.python.org

`http.cookies.SimpleCookie.load()` fails to consistently handle malformed cookies #127195

Open moonsikpark opened 6 days ago

moonsikpark commented 6 days ago

Bug report

Bug description:

There are several issues with http.cookies.SimpleCookie.load() that deviate from current browser behavior:

  1. A single malformed cookie causes the entire string to be rejected

Consider the cookie string a=b;c=d\x09d;e=f. The value of c contains \x09 (a horizontal tab, i.e. a control character), which is not a valid cookie-octet per RFC 6265, Section 4.1.1.
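For reference, RFC 6265, Section 4.1.1 defines cookie-value as *cookie-octet, optionally wrapped in double quotes, where cookie-octet is %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E. That is printable US-ASCII minus DQUOTE, comma, semicolon and backslash. A minimal sketch of that check follows; the function names are hypothetical and not part of http.cookies.

```python
def _is_cookie_octet(ch: str) -> bool:
    """True if ch is a cookie-octet per RFC 6265, Section 4.1.1.

    Allowed: printable US-ASCII (0x21-0x7E) except DQUOTE (0x22),
    comma (0x2C), semicolon (0x3B) and backslash (0x5C).
    Control characters such as TAB (0x09), space and non-ASCII are rejected.
    """
    return 0x21 <= ord(ch) <= 0x7E and ch not in '",;\\'


def is_rfc6265_cookie_value(value: str) -> bool:
    """Hypothetical helper: validate a cookie-value against RFC 6265 4.1.1."""
    # cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]  # strip the optional DQUOTE wrapping
    return all(_is_cookie_octet(ch) for ch in value)


print(is_rfc6265_cookie_value("b"))          # True
print(is_rfc6265_cookie_value("d\x09d"))     # False: TAB is a control character
print(is_rfc6265_cookie_value('{"d":"e"}'))  # False: DQUOTE inside the value
print(is_rfc6265_cookie_value('"abc"'))      # True: DQUOTE-wrapped value
```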

When these cookies are sent to a browser (Chrome 130) as separate Set-Cookie headers, the browser stores the valid cookies and discards the invalid one:

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: a=b;
Set-Cookie: c=d d;
Set-Cookie: e=f

Resulting behavior:

> document.cookie
< 'a=b; e=f'

However, http.cookies.SimpleCookie.load() rejects the entire cookie string, including the valid a and e pairs:

>>> from http import cookies
>>> C = cookies.SimpleCookie()
>>> C.load("a=b;c=d\x09d;e=f")
>>> C.output()
''
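To see which cookies load() silently discarded, one can compare the names it kept against a naive split of the input. This is a rough diagnostic sketch: find_dropped is a hypothetical helper, and it assumes ";" only ever separates pairs (no quoted values containing ";").

```python
from http import cookies


def find_dropped(header: str) -> list[str]:
    """Hypothetical diagnostic: names present in header but missing after load()."""
    # Naive split; assumes ';' separates pairs and the first '=' ends the name.
    expected = [
        part.split("=", 1)[0].strip()
        for part in header.split(";")
        if part.strip()
    ]
    jar = cookies.SimpleCookie()
    jar.load(header)
    return [name for name in expected if name not in jar]
```

Going by the outputs in this report, on the affected main branch find_dropped("a=b;c=d\x09d;e=f") would list all three names, and find_dropped('a=b; c={"d":"e"}; f=g') would list c and f.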

  2. Malformed cookies are inconsistently processed

Consider the cookie string a=b;c={"d":"e"};f=g. The value of c is invalid per RFC 6265, Section 4.1.1, because it contains double quotes that do not wrap the entire value.

Browsers process this cookie without an issue:

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: a=b;
Set-Cookie: c={"d":"e"};
Set-Cookie: f=g

Resulting behavior:

> document.cookie
< 'a=b; c={"d":"e"}; f=g'

However, http.cookies.SimpleCookie.load() keeps only the cookies that precede the malformed one and silently drops everything from that point on:

>>> from http import cookies
>>> C = cookies.SimpleCookie()
>>> C.load('a=b; c={"d":"e"}; f=g')
>>> C.output()
'Set-Cookie: a=b'

It seems we should make the handling consistent, either by (a) processing all valid cookies and discarding only the invalid ones, or (b) rejecting the entire cookie string if any cookie in it is invalid.
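The two options differ only in what happens once an invalid pair is found. A small sketch makes the contrast concrete. apply_policy and has_no_control_chars are hypothetical names, and the pairs are assumed to be already split out of the header.

```python
from typing import Callable, Iterable


def apply_policy(
    pairs: Iterable[tuple[str, str]],
    is_valid: Callable[[str, str], bool],
    policy: str = "skip",
) -> dict[str, str]:
    """Apply one of the two candidate policies.

    policy="skip":   (a) keep every valid pair, drop only invalid ones.
    policy="reject": (b) return nothing if any pair is invalid.
    """
    if policy not in ("skip", "reject"):
        raise ValueError(f"unknown policy: {policy!r}")
    kept: dict[str, str] = {}
    for name, value in pairs:
        if is_valid(name, value):
            kept[name] = value
        elif policy == "reject":
            return {}  # one bad pair invalidates the whole string
    return kept


def has_no_control_chars(name: str, value: str) -> bool:
    # Example predicate only: TAB and other control characters are not printable.
    return value.isprintable()


pairs = [("a", "b"), ("c", "d\x09d"), ("e", "f")]
print(apply_policy(pairs, has_no_control_chars, "skip"))    # {'a': 'b', 'e': 'f'}
print(apply_policy(pairs, has_no_control_chars, "reject"))  # {}
```

Either policy is consistent. Option (a) matches the Chrome result shown for the first example.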

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

moonsikpark commented 5 days ago

After further investigation, the module appears to behave inconsistently on unexpected input, most likely because its regex-based parser assumes well-formed input.

https://github.com/python/cpython/blob/a4d4c1ede21f9fa72280f4fc0f50212eecfac9ae/Lib/http/cookies.py#L420-L437
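The following is a simplified model of why a regex-driven loop truncates the result. Each match() call is anchored at the current offset, so the first fragment the pattern does not anticipate ends the loop, and everything after it is lost. The pattern is a deliberately reduced stand-in and is not the regular expression used in Lib/http/cookies.py. toy_parse is a hypothetical name.

```python
import re

# Deliberately simplified stand-in for a regex-based cookie parser;
# NOT the pattern from Lib/http/cookies.py.
_PAIR = re.compile(r"\s*(?P<key>[\w.-]+)=(?P<val>[\w.{}:-]*)\s*(?:;|$)")


def toy_parse(header: str) -> list[tuple[str, str]]:
    pairs = []
    pos = 0
    while pos < len(header):
        m = _PAIR.match(header, pos)  # anchored at the current position
        if m is None:
            break  # the first unexpected character stops parsing for the rest
        pairs.append((m["key"], m["val"]))
        pos = m.end()
    return pairs


print(toy_parse("a=b; e=f"))               # [('a', 'b'), ('e', 'f')]
print(toy_parse('a=b; c={"d":"e"}; f=g'))  # [('a', 'b')]
```

The second call mirrors the truncation in the second example: the double quote in c's value matches nothing, so f=g is never reached.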

If the maintainers agree, I'd like to propose a patch that improves cookie parsing as follows:

  1. Split the cookie string into parts.
  2. For each part:
     a. Validate the key. If it is invalid, issue a warning and skip the part.
     b. Validate the value. If it is invalid, issue a warning and skip the part.
  3. Store only valid parts in the cookie jar.
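The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a patch. load_lenient is a hypothetical function. The key check uses the RFC 7230 token characters (which RFC 6265 cookie-name refers to), and the value check uses the strict RFC 6265, Section 4.1.1 cookie-value grammar. SimpleCookie itself still rejects reserved attribute names such as "path" with CookieError, and those are skipped too.

```python
import re
import warnings
from http import cookies

# cookie-name = token; tchar set assumed from RFC 7230, Section 3.2.6.
_TOKEN = re.compile(r"[!#$%&'*+.^_`|~0-9A-Za-z-]+")
# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ), RFC 6265 4.1.1.
_OCTETS = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*"
_COOKIE_VALUE = re.compile(rf'{_OCTETS}|"{_OCTETS}"')


def load_lenient(header: str) -> cookies.SimpleCookie:
    """Hypothetical sketch of the proposed algorithm (not a CPython API)."""
    jar = cookies.SimpleCookie()
    for part in header.split(";"):  # 1. split into parts
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        name, value = name.strip(), value.strip()
        # 2a. validate the key (a part without '=' is also treated as malformed)
        if not sep or not _TOKEN.fullmatch(name):
            warnings.warn(f"skipping malformed cookie part: {part!r}", stacklevel=2)
            continue
        # 2b. validate the value
        if not _COOKIE_VALUE.fullmatch(value):
            warnings.warn(f"skipping cookie {name!r}: invalid value", stacklevel=2)
            continue
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop DQUOTE wrapping (no escapes are possible here)
        # 3. store only valid parts
        try:
            jar[name] = value
        except cookies.CookieError as exc:  # e.g. reserved names like "path"
            warnings.warn(f"skipping cookie {name!r}: {exc}", stacklevel=2)
    return jar
```

With these strict checks, both example strings give consistent results: a and e from the first, a and f from the second, each with one warning for the skipped pair. Whether c={"d":"e"} should be kept instead is the open question raised below.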

Additionally, I'd like to gather opinions on whether we should allow characters in cookie values that are invalid but common in practice, such as those used in JSON. RFC 6265, Section 4.1.1 tells servers not to send these characters, but major browsers accept them and they have become common practice among web developers.
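One possible relaxed rule, offered purely as an assumption to anchor the discussion and not as a documented browser algorithm, is to accept any value that contains no ASCII control characters and no ";". This admits JSON such as {"d":"e"} and still rejects the tab in d\x09d. is_lenient_cookie_value is a hypothetical name.

```python
def is_lenient_cookie_value(value: str) -> bool:
    """Hypothetical relaxed check: reject only ASCII control characters and ';'.

    Unlike RFC 6265 cookie-octets, this admits DQUOTE, comma, backslash,
    space and non-ASCII characters.
    """
    return not any(
        ord(ch) < 0x20 or ord(ch) == 0x7F or ch == ";"  # CTLs and the pair separator
        for ch in value
    )


print(is_lenient_cookie_value('{"d":"e"}'))  # True
print(is_lenient_cookie_value("d\x09d"))     # False: TAB is a control character
```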

@picnixz, it seems you triaged my issue. If possible, could you notify the appropriate core maintainers?

picnixz commented 2 days ago

cc @serhiy-storchaka