Open dgw opened 1 week ago
PR opened to be more graceful about it, but considering `http://<foo>/` and `http://999.999.999.999/` are parsed successfully, I think this is a urllib bug?
>>> urlparse("http://<test>/")
ParseResult(scheme='http', netloc='<test>', path='/', params='', query='', fragment='')
> but considering `http://<foo>/` and `http://999.999.999.999/` are parsed successfully, I think this is a urllib bug?
Square brackets are special. The reported "error" URL is actually invalid per RFC 3986 § 3.2.2:
> A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.
The only thing that's supposed to go in square brackets in a URI is an IPv6 (or later) literal, full stop. You could probably even get away with not "safeifying" these invalid links at all, since they can't be followed anyway.
And yes, I know that still leaves us with inconsistent behavior between different Python versions. But Python apparently decided urllib should be stricter about following the URI spec. 🤷♂️
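
To make the RFC rule concrete, here is a minimal sketch of the kind of check it implies: a bracketed host is only acceptable if the part inside the brackets is an IPv6 literal or an IPvFuture literal. The function name `bracketed_host_is_valid` is made up for illustration, the IPvFuture pattern is deliberately simplified, and this is not a copy of what urllib does internally.

```python
import ipaddress
import re


def bracketed_host_is_valid(host: str) -> bool:
    """Return True if `host` is a bracketed IPv6 or IPvFuture literal.

    Hypothetical helper for illustration only.
    """
    if not (host.startswith("[") and host.endswith("]")):
        return False
    inner = host[1:-1]
    # Simplified IPvFuture form from RFC 3986: "v" + hex digits + "." + more
    if re.match(r"\Av[a-fA-F0-9]+\..+\Z", inner):
        return True
    try:
        # Only IPv6 may appear in brackets; a bracketed IPv4 address is invalid
        return isinstance(ipaddress.ip_address(inner), ipaddress.IPv6Address)
    except ValueError:
        # Not an IP address at all, e.g. a placeholder word
        return False


print(bracketed_host_is_valid("[::1]"))          # IPv6 literal
print(bracketed_host_is_valid("[placeholder]"))  # placeholder hostname
print(bracketed_host_is_valid("[127.0.0.1]"))    # bracketed IPv4
```

Under this reading, `[placeholder]` and `[127.0.0.1]` both fail, which matches the point that only IPv6 (or later) literals belong in brackets.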
If someone sends a link while `safety` is active in the channel, and the URL contains a placeholder hostname in square brackets, Sopel will spit out an "Unexpected ValueError" message. Note: Seems to happen only on Python 3.11 or higher.
This is from the `safeify_url()` function added in #2279, which uses `urllib.parse.urlparse()` to make sanitizing URL parts easier, which in turn uses `ipaddress.ip_address()` to raise an error for bracketed IPv4 addresses, and trips on this weird edge case:
https://github.com/sopel-irc/sopel/blob/e7d86481d186f2faafa01cc46131485d2b4966d4/sopel/builtins/safety.py#L127-L133
Simple examples using the Python console:
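
A version-independent way to try this yourself, as a rough sketch: the `try_parse` helper below is hypothetical and only reports whether `urlparse()` raised, so the same script can be run on different interpreters and compared. The bracketed placeholder is expected to raise on newer Pythons and parse on older ones; no output is shown here because it depends on the version you run.

```python
import sys
from urllib.parse import urlparse


def try_parse(url: str) -> str:
    """Parse `url` and describe the outcome instead of raising."""
    try:
        return f"ok: netloc={urlparse(url).netloc!r}"
    except ValueError as exc:
        return f"ValueError: {exc}"


# Square-bracket placeholder, angle-bracket placeholder, real IPv6 literal
for url in ("http://[placeholder]/", "http://<placeholder>/", "http://[::1]/"):
    print(sys.version_info[:2], url, "->", try_parse(url))
```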
The version inconsistency is going to be the worst part of designing a "correct" fix for this. A simple fallback approach (such as `return url.replace('http', 'hxxp', 1) if url.startswith('http') else url`) will miss more complicated cases that are still handled fine in older Python versions (output using 3.10 shown here):
_Do note though that all this is an edge case of an edge case. People must intentionally construct these invalid URLs, and can be trained to simply use another type of bracket for placeholders instead, such as `http://<Target-IP>/cgi-bin/account_mgr.cgi`._
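
For discussion, here is one possible shape for a more graceful handler: parse normally when that works, and only drop to the simple scheme replacement when `urlparse()` raises. This is a sketch under assumptions, not the actual `safeify_url()` code or the contents of the PR; `defang_url` is a hypothetical name, and the `[.]` / `[:]` substitutions are just one common defanging convention.

```python
from urllib.parse import urlparse, urlunparse


def defang_url(url: str) -> str:
    """Hypothetical helper: make `url` hard to follow by accident."""
    try:
        parts = urlparse(url)
    except ValueError:
        # Newer Pythons reject hosts like "[placeholder]"; fall back to
        # neutralising only the scheme so the caller never sees the error.
        return url.replace("http", "hxxp", 1) if url.startswith("http") else url
    # http -> hxxp, https -> hxxps; other schemes are left alone
    scheme = "hxx" + parts.scheme[3:] if parts.scheme.startswith("htt") else parts.scheme
    # Break up dots and colons so the host is no longer resolvable as written
    netloc = parts.netloc.replace(".", "[.]").replace(":", "[:]")
    return urlunparse((scheme, netloc) + tuple(parts[2:]))


print(defang_url("http://example.com/a?b=1"))
print(defang_url("http://[placeholder]/x"))
```

The fallback branch still has the weakness described above: when parsing fails, only the scheme is touched and the rest of the URL is passed through unchanged.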