python / cpython

The Python programming language
https://www.python.org
Other
63.45k stars 30.38k forks source link

`urllib.parse.parse_qs` behaves differenly in Python 3.9.2 compared to Python 3.9.1 #117109

Closed yozachar closed 7 months ago

yozachar commented 7 months ago

Bug report

Bug description:

Code

from urllib.parse import parse_qs
parse_qs("q=search');alert(document.domain);", strict_parsing=True)

Expected Behavior

ValueError: bad query field: 'alert(document.domain)'

Actual Behavior

Python versions 3.9.2 and above does NOT raise ValueError even when strict_parsing is set True.

CPython versions tested on:

3.8, 3.9, 3.10, 3.11, 3.12

Operating systems tested on:

Linux

Additional Context

from https://github.com/python-validators/validators/issues/338#issuecomment-2011014740

serhiy-storchaka commented 7 months ago

See #87133. Treating ";" as a query separator by default created a security issue.

Use separator='&;' to reproduce the old behavior.

yozachar commented 7 months ago

Hey, thanks for the reply. People running Python >= 3.9.2 and < 3.10 are in kind of a fix then, aren't they?

Also multi-character separator= doesn't seem to replicate the old behaviour.

Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import parse_qs
>>> parse_qs("q=search');alert(document.domain);", strict_parsing=True, separator='&;')
{'q': ["search');alert(document.domain);"]}

instead you've to specify separator=';'

>>> parse_qs("q=search');alert(document.domain);", strict_parsing=True, separator=';')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/urllib/parse.py", line 718, in parse_qs
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/parse.py", line 780, in parse_qsl
    raise ValueError("bad query field: %r" % (name_value,))
ValueError: bad query field: 'alert(document.domain)'

Will a patch be backported?

serhiy-storchaka commented 7 months ago

Ah, right, multicharacter separator does not work, so it is impossible to restore the old behavior exactly.

Is it worth to add support of multicharacter separator? Should it be considered a new feature or a bug fix? I don't know. You are the first person in three years to report the problem. Note that even if we consider this a bug, the change can only be backported to version 3.11.

cc @orsenthil, @Fidget-Spinner, @vstinner, @AdamGold

vstinner commented 7 months ago

The behavior changed on purpose to fix a security issue: https://python-security.readthedocs.io/vuln/urllib-query-string-semicolon-separator.html => issue gh-87133.

vstinner commented 7 months ago

parse_qs("q=search');alert(document.domain);", strict_parsing=True)

The strict behavior is requested, IMO rejecting Javascript sounds like the right behavior. What makes you think that Python should reply something else? What is your expected behavior?

yozachar commented 7 months ago

The strict behavior is requested, IMO rejecting Javascript sounds like the right behavior. What makes you think that Python should reply something else? What is your expected behavior?

Rejecting JavaScript indeed. It does not do so in Python v3.9.2, but I see now that the separator parameter works in the same version even though the docs claim it (separator parameter) has been added only in v3.10.

Python 3.9.1 (default, Mar 21 2024, 06:40:47) 
[GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import parse_qs
>>> parse_qs("q=search');alert(document.domain);", strict_parsing=True, separator=';')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: parse_qs() got an unexpected keyword argument 'separator'
Python 3.9.2 (default, Mar 21 2024, 06:39:21) 
[GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import parse_qs
>>> parse_qs("q=search');alert(document.domain);", strict_parsing=True, separator=';')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/us-er/.local/share/mise/installs/python/3.9.2/lib/python3.9/urllib/parse.py", line 695, in parse_qs
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
  File "/home/us-er/.local/share/mise/installs/python/3.9.2/lib/python3.9/urllib/parse.py", line 756, in parse_qsl
    raise ValueError("bad query field: %r" % (name_value,))
ValueError: bad query field: 'alert(document.domain)'

Also multi-character separator= is nice to have, it'd certainly avoid something like this

    try:
        if (
            query
            and parse_qs(query, strict_parsing=strict_query, separator="&")
            and parse_qs(query, strict_parsing=strict_query, separator=";")
        ):
            optional_segments &= True
    except TypeError:
        # for Python < v3.9.2 (official v3.10)
        if query and parse_qs(query, strict_parsing=strict_query):
            optional_segments &= True

I'll be closing this issue now.