Open llchry opened 1 year ago
Can you please update your example to be actually runnable? A link to the exact place (where you think the problem is) is also highly appreciated! 👍
From code inspection there might be a problem here: https://github.com/python/cpython/blob/810d365b5eb2cf3043957ca2971f6e7a7cd87d0d/Lib/urllib/request.py#L2768
The code in the loop converts entries in the system proxy override list into python regular expressions, and does not special case '[' which is a special character in regular expressions but not in the system settings.
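A minimal sketch of the failure described above, runnable without Windows. The helper `naive_to_re` is a hypothetical stand-in that assumes the loop only rewrites '.', '*' and '?'; it is not copied from `urllib/request.py`.

```python
import re

def naive_to_re(entry):
    # Assumption: only these three characters get special treatment.
    entry = entry.replace(".", r"\.")   # literal dot
    entry = entry.replace("*", r".*")   # glob sequence
    entry = entry.replace("?", r".")    # glob single character
    return entry

pattern = naive_to_re("[*")  # '[' is passed through unescaped
print(repr(pattern))

try:
    re.match(pattern, "[fa::175:21:1:187]", re.I)
except re.error as exc:
    # '[' opens a character set that is never closed.
    print(f"re.error: {exc}")
```

If the real loop behaves like this stand-in, any override entry containing an unbalanced '[' would raise before any host is compared.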
Yes, I'm sure that's the problem; I just haven't pointed to the code yet. I'd like to see special treatment for '[' added here, but I'm not sure whether there are other issues. I see there is already special treatment for '.', '*' and '?'.
You can use the following code to reproduce the problem, provided that the address needs to be accessed through a proxy and the exception (bypass) list in your computer's proxy configuration contains '[*'.
from urllib import request
import ssl

if __name__ == "__main__":
    CTX = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
    CTX.load_cert_chain("D:\\Dev\\tools\\key\\kubecfg.crt", "D:\\Dev\\tools\\key\\kubecfg.key")
    req = request.Request(url="https://[fa::175:21:1:187]:9543/api/v1/namespaces/test/configmaps/test", method="GET", headers={}, data=None)
    resp = request.urlopen(req, context=CTX)
    print(resp)
The error stack information is as follows:
D:\temp\venv\Scripts\python.exe C:/Users/llchry/PycharmProjects/untitled/test.py
Traceback (most recent call last):
  File "C:\Users\llchry\PycharmProjects\untitled\test.py", line 8, in <module>
    resp = request.urlopen(req, context=CTX)
  File "D:\programs\python3\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "D:\programs\python3\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "D:\programs\python3\lib\urllib\request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "D:\programs\python3\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "D:\programs\python3\lib\urllib\request.py", line 802, in <lambda>
    meth(r, proxy, type))
  File "D:\programs\python3\lib\urllib\request.py", line 810, in proxy_open
    if req.host and proxy_bypass(req.host):
  File "D:\programs\python3\lib\urllib\request.py", line 2773, in proxy_bypass
    return proxy_bypass_registry(host)
  File "D:\programs\python3\lib\urllib\request.py", line 2758, in proxy_bypass_registry
    if re.match(test, val, re.I):
  File "D:\programs\python3\lib\re.py", line 191, in match
    return _compile(pattern, flags).match(string)
  File "D:\programs\python3\lib\re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "D:\programs\python3\lib\sre_compile.py", line 788, in compile
    p = sre_parse.parse(p, flags)
  File "D:\programs\python3\lib\sre_parse.py", line 955, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "D:\programs\python3\lib\sre_parse.py", line 444, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "D:\programs\python3\lib\sre_parse.py", line 550, in _parse
    raise source.error("unterminated character set",
re.error: unterminated character set at position 0
In my case, adding the following line below this would solve my problem. Are there any better suggestions?
test = test.replace("[", r"\[")
That would fix this particular issue, but I'm a bit worried about other special characters in regular expressions, even if those shouldn't be in a proxy exclude list in practice. That said, I'm not a urllib or Windows expert (I don't even have access to Windows to test any changes on).
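The concern about other special characters can be illustrated with a small sketch. `convert_with_bracket_fix` is a hypothetical stand-in for the loop body with the proposed line added, assuming the existing code only rewrites '.', '*' and '?'; the sample entries are made up for illustration.

```python
import re

def convert_with_bracket_fix(entry):
    # Hypothetical stand-in for the conversion, plus the proposed extra line.
    entry = entry.replace(".", r"\.")
    entry = entry.replace("*", r".*")
    entry = entry.replace("?", r".")
    entry = entry.replace("[", r"\[")  # the proposed fix
    return entry

def compiles(pattern):
    # True if the pattern is a valid Python regular expression.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed = convert_with_bracket_fix("[*")
print(repr(fixed), bool(re.match(fixed, "[fa::175:21:1:187]", re.I)))

# Other regex metacharacters are still passed through unescaped.
for entry in ["(*", "+*", "a|b"]:
    print(repr(entry), compiles(convert_with_bracket_fix(entry)))
```

Under these assumptions '[*' now compiles and matches the bracketed IPv6 host, while entries such as '(*' still fail to compile, and 'a|b' compiles but is read as an alternation rather than literal text.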
The code block below defines a function to_re that could replace this problematic loop, but it is not tested beyond the two print statements at the end. I'm not convinced yet that using this would be an improvement to the code though, especially for a bug fix that will be backported to stable releases.
Btw. I initially looked at fnmatch to replace the loop, but that supports character classes as well, which aren't wanted here.
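import re

# Wildcards allowed in a hostname pattern and their regex equivalents.
_SPECIAL = {
    '*': '.*',
    '?': '.',
}

def to_re(hostname_pattern):
    # Split on the wildcards (keeping them), translate those, escape the rest.
    return "".join(
        _SPECIAL[x] if x in _SPECIAL else re.escape(x)
        for x in re.split(r"([*?])", hostname_pattern))

print(f'{to_re("*.python.org")=}')
print(f'{to_re("[*")=}')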
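To illustrate the remark about fnmatch above, here is a small sketch of why its character-class support is a poor fit. The entry '[fa::1]' is a hypothetical override value meant as a literal bracketed IPv6 host, not one taken from the report.

```python
import fnmatch

# Hypothetical override entry, intended to match a bracketed IPv6 host literally.
entry = "[fa::1]"

# fnmatch reads the brackets as a character class, not as literal text.
print(fnmatch.translate(entry))

# The literal host does not match its own entry...
print(fnmatch.fnmatchcase("[fa::1]", entry))

# ...but a single character taken from inside the brackets does.
print(fnmatch.fnmatchcase("f", entry))
```

So a loop built on fnmatch would silently change what a bracketed entry means, which is presumably why a dedicated conversion like to_re was considered instead.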
Bug report
Prerequisites
Using the following code to request an IPv6 address such as https://[fa::01]:80/ results in the following error: re.error: unterminated character set at position 0
I noticed that the problem comes from the proxy_bypass_registry method in urllib/request.py, which reads the system's proxy configuration and checks whether the host needs a proxy. The pattern '[*' works fine on Windows, but fails to compile as a regular expression in Python. I thought about configuring '*]' on Windows, but this won't save, so I'm guessing there's a difference in how the two systems interpret the pattern. So can we add escaping in the code and replace '[' with '\['?
Your environment