python / cpython

The Python programming language
https://www.python.org
Other
62.59k stars 30.04k forks source link

urllib.parse.urlparse doesn't check port #88037

Open 3960998a-15e7-45ca-91ff-62b29fe7aaba opened 3 years ago

3960998a-15e7-45ca-91ff-62b29fe7aaba commented 3 years ago
BPO 43871
Nosy @orsenthil, @vstinner, @tirkarthi, @p-alik, @miguendes
PRs
  • python/cpython#25774
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/orsenthil' closed_at = None created_at = labels = ['3.10'] title = "urllib.parse.urlparse doesn't check port" updated_at = user = 'https://github.com/p-alik' ``` bugs.python.org fields: ```python activity = actor = 'miguendes' assignee = 'orsenthil' closed = False closed_date = None closer = None components = [] creation = creator = 'palik' dependencies = [] files = [] hgrepos = [] issue_num = 43871 keywords = ['patch'] message_count = 5.0 messages = ['391238', '391242', '391265', '391282', '392577'] nosy_count = 5.0 nosy_names = ['orsenthil', 'vstinner', 'xtreak', 'palik', 'miguendes'] pr_nums = ['25774'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = None url = 'https://bugs.python.org/issue43871' versions = ['Python 3.10'] ```

    3960998a-15e7-45ca-91ff-62b29fe7aaba commented 3 years ago

    It is possible to get valid ParseResult from the urlparse function even for a non-numeric port value. Only by requesting the port it fails[1]. Would it be an improvement if _checknetloc[2] validates the value of port properly?

    // code snippet
    Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
    [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from urllib.parse import urlparse
    >>> uri = 'xx://foo:bar'
    >>> uri_parts = urlparse(uri)
    >>> uri_parts.netloc
    'foo:bar'
    >>> uri_parts.port
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.8/urllib/parse.py", line 174, in port
        raise ValueError(message) from None
    ValueError: Port could not be cast to integer value as 'bar'
    // code snippet

    [1] https://github.com/python/cpython/blob/e1903e11a3d42512effe336026e0c67f602e5848/Lib/urllib/parse.py#L172 [2] https://github.com/python/cpython/blob/e1903e11a3d42512effe336026e0c67f602e5848/Lib/urllib/parse.py#L416

    tirkarthi commented 3 years ago

    I guess moving port validation logic to parsing time is done as part of https://github.com/python/cpython/pull/16780

    orsenthil commented 3 years ago

    Treating this as bug in itself might be a better idea than waiting for a ipv6 scope introduction, which had few caveats.

    Would it be an improvement if _checknetloc[2] validates the value of port properly?

    Yes, we could check if it is an int. That should be sufficient.

    3960998a-15e7-45ca-91ff-62b29fe7aaba commented 3 years ago

    Thank you for your swift response and your willingness to add port validation to _checknetloc.

    I think the validation itself should compound both exceptional branches implemented in port[3]

    [3] https://github.com/python/cpython/blob/e1903e11a3d42512effe336026e0c67f602e5848/Lib/urllib/parse.py#L173-L178

    24007b43-d173-43d4-992a-c33ef8b8e01b commented 3 years ago

    I also think the validation logic should be ran as early as possible.

    I gave it a shot and implemented it.

    I appreciate any reviews: https://github.com/python/cpython/pull/25774

    Got some ideas from https://github.com/python/cpython/pull/16780