python / cpython

The Python programming language
https://www.python.org

Incorrect behaviour for user@password URI pattern in urlparse #81859

Closed bf3cc03a-b188-424e-a24b-559394ef0d28 closed 9 months ago

bf3cc03a-b188-424e-a24b-559394ef0d28 commented 5 years ago
BPO 37678
Nosy @truebit, @potomak

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['3.7', 'type-bug', 'library']
title = 'Incorrect behaviour for user@password URI pattern in urlparse'
updated_at =
user = 'https://github.com/truebit'
```

bugs.python.org fields:

```python
activity =
actor = 'potomak'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Sean.Wang'
dependencies = []
files = []
hgrepos = []
issue_num = 37678
keywords = []
message_count = 2.0
messages = ['348431', '348593']
nosy_count = 2.0
nosy_names = ['Sean.Wang', 'potomak']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue37678'
versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']
```

bf3cc03a-b188-424e-a24b-559394ef0d28 commented 5 years ago

When an IPv4 URL contains 'username:password' and the password contains special characters such as #[]?, urlparse behaves unexpectedly. Example:

urlparse('http://user:pass#?[word@example.com:80/path')
90ca3d02-d916-4a83-a23b-8cc5f491b058 commented 5 years ago

What do you mean when you say urlparse acts unexpectedly?

I tried your example and I think urlparse's behavior is correct.

From RFC 1738:

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.
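Applying this rule to the reporter's password means percent-encoding the userinfo before the URL is assembled. The sketch below assumes the username and password are available separately; the helper name `build_url` is hypothetical and only illustrates the idea using `urllib.parse.quote` with `safe=''`, so that `#`, `?`, `[` (and `:`, `@`, `/`) are encoded rather than treated as delimiters.

```python
from urllib.parse import quote


def build_url(user: str, password: str, host: str, port: int, path: str) -> str:
    """Hypothetical helper: percent-encode the userinfo so reserved
    characters such as '#', '?', '[' and '@' are not read as delimiters."""
    # safe='' also encodes ':' and '@', which would otherwise split userinfo
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"http://{userinfo}@{host}:{port}{path}"


url = build_url("user", "pass#?[word", "example.com", 80, "/path")
print(url)  # http://user:pass%23%3F%5Bword@example.com:80/path
```

Encoding each component separately, rather than quoting the whole URL, keeps the real `:` and `@` separators intact while escaping the ones inside the password.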

Your example:

>>> from urllib.parse import urlparse
>>> urlparse('http://user:pass#?[word@example.com:80/path')
ParseResult(scheme='http', netloc='user:pass', path='', params='', query='', fragment='?[word@example.com:80/path')

Everything after the first # (the rest of the password, the host, the port and the path) is parsed as the URL fragment, because the character # has a special meaning:

The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it.
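To make the consequence concrete, here is a small comparison run against Python 3's `urllib.parse`, contrasting the raw URL from the report with a percent-encoded version of the same credentials. It is a sketch for illustration; note that `.password` is returned still percent-encoded, so it has to be decoded with `unquote` explicitly.

```python
from urllib.parse import unquote, urlsplit

raw = "http://user:pass#?[word@example.com:80/path"
encoded = "http://user:pass%23%3F%5Bword@example.com:80/path"

bad = urlsplit(raw)
# The netloc ends at the first '/', '?' or '#', so 'user:pass' is read as
# host:port and everything after '#' becomes the fragment.
print(bad.netloc, bad.hostname, bad.password, bad.fragment)
# user:pass user None ?[word@example.com:80/path

good = urlsplit(encoded)
print(good.hostname, good.port, good.username)  # example.com 80 user
# .password keeps the percent-encoding; decode it to recover the original.
print(unquote(good.password))  # pass#?[word
```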

saito828koki commented 2 years ago

This is not a bug. This issue can be closed.

davidism commented 9 months ago

@serhiy-storchaka this is the same invalid issue as #110869 which you just closed. This can be closed as well.