Open 0cb61241-3555-4d74-b141-a27d0b92cdce opened 2 years ago
If http:@localhost url is entered as an argument value of the urlpasre() function, the parser cannot parse it properly. Since http:@localhost is a valid URL, the character after the @ character must be parsed as a hostname.
Python 3.9.10 (main, Jan 15 2022, 11:48:04)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> print(urlparse('http:@localhost'))
ParseResult(scheme='http', netloc='', path='@localhost', params='', query='', fragment='')
>>>
>>> print(urlparse('https:\\google.com'))
ParseResult(scheme='https', netloc='', path='\\google.com', params='', query='', fragment='')
>>> print(urlparse('https://google.com@localhost'))
ParseResult(scheme='https', netloc='google.com@localhost', path='', params='', query='', fragment='')
>>>
Perhaps this parser is not able to parse the URL normally.
Here are the results from other languages. Go parses incorrectly at the same time rust does it correctly.
Go- https://go.dev/play/p/nNMhyznuGpn &url.URL{Scheme:"http", Opaque:"@localhost", User:(*url.Userinfo)(nil), Host:"", Path:"", RawPath:"", ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}
Rust- https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=92681b56f7cbd62b7735c962a2f5321e Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: None, path: "/", query: None, fragment: None }
Nice Check. So what do you think about this issue? I want to hear your opinions.
I believe http:@localhost
is a valid URL according to RFC 3986. But the at character (@) is part of the path component. There is no hostname in this URL, because there is no double slash (//) introducing an authority component. ~Similarly, for the case with the backslashes (\\
). Both at (@) and backslash (\
) characters are allowed in path segments in RFC 3986.~
For the google.com@localhost case, the google.com part is the userinfo component.
~In all three cases I believe urlparse is correct, so this can be closed.~
Edit: For the two cases involving at (@) symbols, urlparse is correct. For the case involving backslashes (\\
), the URL is not valid.
I misinterpreted reserved and unreserved in the RFC. Here are the rules:
path-rootless = segment-nz *( "/" segment )
segment-nz = 1*pchar
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['type-security', '3.9']
title = 'Improper Input Validation in urlparse'
updated_at =
user = 'https://github.com/P0cas'
```
bugs.python.org fields:
```python
activity =
actor = 'P0cas'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'P0cas'
dependencies = []
files = []
hgrepos = []
issue_num = 46870
keywords = []
message_count = 4.0
messages = ['414132', '414133', '414171', '414345']
nosy_count = 3.0
nosy_names = ['orsenthil', 'P0cas', 'karanchaudhary']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue46870'
versions = ['Python 3.9']
```