scrapy / w3lib

Python library of web-related functions
BSD 3-Clause "New" or "Revised" License

test_safe_url_string_url regressed on 3.11.4 #212

Closed: mweinelt closed this issue 1 year ago

mweinelt commented 1 year ago

We are seeing the following test regression on w3lib 2.1.1 after updating Python from 3.11.3 to 3.11.4. The tests pass on Python 3.10.12.


w3lib-aarch64-linux> =================================== FAILURES ===================================
w3lib-aarch64-linux> _ test_safe_url_string_url[https://"%;<=>@[]^`{|}\x7f:"%;<=>@[]^`{|}\x7f:@example.com-https://%22%25%3B%3C%3D%3E%40%5B%5D%5E%60%7B%7C%7D%7F:%22%25%3B%3C%3D%3E%40%5B%5D%5E%60%7B%7C%7D%7F%3A@example.com] _
w3lib-aarch64-linux> 
w3lib-aarch64-linux> url = 'https://"%;<=>@[]^`{|}\x7f:"%;<=>@[]^`{|}\x7f:@example.com'
w3lib-aarch64-linux> output = 'https://%22%25%3B%3C%3D%3E%40%5B%5D%5E%60%7B%7C%7D%7F:%22%25%3B%3C%3D%3E%40%5B%5D%5E%60%7B%7C%7D%7F%3A@example.com'
w3lib-aarch64-linux> 
w3lib-aarch64-linux>     @pytest.mark.parametrize(
w3lib-aarch64-linux>         "url,output",
w3lib-aarch64-linux>         tuple(
w3lib-aarch64-linux>             case
w3lib-aarch64-linux>             if case[0] not in KNOWN_SAFE_URL_STRING_URL_ISSUES
w3lib-aarch64-linux>             else pytest.param(*case, marks=pytest.mark.xfail(strict=True))
w3lib-aarch64-linux>             for case in SAFE_URL_URL_CASES
w3lib-aarch64-linux>         ),
w3lib-aarch64-linux>     )
w3lib-aarch64-linux>     def test_safe_url_string_url(
w3lib-aarch64-linux>         url: StrOrBytes, output: Union[str, Type[Exception]]
w3lib-aarch64-linux>     ) -> None:
w3lib-aarch64-linux> >       _test_safe_url_string(url, output=output)
w3lib-aarch64-linux> 
w3lib-aarch64-linux> tests/test_url.py:435: 
w3lib-aarch64-linux> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
w3lib-aarch64-linux> tests/test_url.py:343: in _test_safe_url_string
w3lib-aarch64-linux>     return _test_safe_url_func(
w3lib-aarch64-linux> tests/test_url.py:332: in _test_safe_url_func
w3lib-aarch64-linux>     actual = func(url, **kwargs)
w3lib-aarch64-linux> w3lib/url.py:142: in safe_url_string
w3lib-aarch64-linux>     parts = urlsplit(_strip(decoded))
w3lib-aarch64-linux> /nix/store/6n1jdnpxqndjdg6x6g25gnrghiqqhlp4-python3-3.11.4/lib/python3.11/urllib/parse.py:500: in urlsplit
w3lib-aarch64-linux>     _check_bracketed_host(bracketed_host)
w3lib-aarch64-linux> /nix/store/6n1jdnpxqndjdg6x6g25gnrghiqqhlp4-python3-3.11.4/lib/python3.11/urllib/parse.py:446: in _check_bracketed_host
w3lib-aarch64-linux>     ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
w3lib-aarch64-linux> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
w3lib-aarch64-linux> 
w3lib-aarch64-linux> address = ''
w3lib-aarch64-linux> 
w3lib-aarch64-linux>     def ip_address(address):
w3lib-aarch64-linux>         """Take an IP string/int and return an object of the correct type.
w3lib-aarch64-linux>     
w3lib-aarch64-linux>         Args:
w3lib-aarch64-linux>             address: A string or integer, the IP address.  Either IPv4 or
w3lib-aarch64-linux>               IPv6 addresses may be supplied; integers less than 2**32 will
w3lib-aarch64-linux>               be considered to be IPv4 by default.
w3lib-aarch64-linux>     
w3lib-aarch64-linux>         Returns:
w3lib-aarch64-linux>             An IPv4Address or IPv6Address object.
w3lib-aarch64-linux>     
w3lib-aarch64-linux>         Raises:
w3lib-aarch64-linux>             ValueError: if the *address* passed isn't either a v4 or a v6
w3lib-aarch64-linux>               address
w3lib-aarch64-linux>     
w3lib-aarch64-linux>         """
w3lib-aarch64-linux>         try:
w3lib-aarch64-linux>             return IPv4Address(address)
w3lib-aarch64-linux>         except (AddressValueError, NetmaskValueError):
w3lib-aarch64-linux>             pass
w3lib-aarch64-linux>     
w3lib-aarch64-linux>         try:
w3lib-aarch64-linux>             return IPv6Address(address)
w3lib-aarch64-linux>         except (AddressValueError, NetmaskValueError):
w3lib-aarch64-linux>             pass
w3lib-aarch64-linux>     
w3lib-aarch64-linux> >       raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
w3lib-aarch64-linux> E       ValueError: '' does not appear to be an IPv4 or IPv6 address
w3lib-aarch64-linux> 
w3lib-aarch64-linux> /nix/store/6n1jdnpxqndjdg6x6g25gnrghiqqhlp4-python3-3.11.4/lib/python3.11/ipaddress.py:54: ValueError
w3lib-aarch64-linux> _______ test_safe_url_string_url[http://[2a01:5cc0:1:2:3:4]-ValueError] ________
w3lib-aarch64-linux> [XPASS(strict)] 
w3lib-aarch64-linux> =========================== short test summary info ============================
w3lib-aarch64-linux> FAILED tests/test_url.py::test_safe_url_string_url[https://"%;<=>@[]^`{|}\x7f:"%;<=>@[]^`{|}\x7f:@example.com-https://%22%25%3B%3C%3D%3E%40%5B%5D%5E%60%7B%7C%7D%7F:%22%25%3B%3C%3D%3E%40%5B%5D%5E%60%7B%7C%7D%7F%3A@example.com] - ValueError: '' does not appear to be an IPv4 or IPv6 address
w3lib-aarch64-linux> FAILED tests/test_url.py::test_safe_url_string_url[http://[2a01:5cc0:1:2:3:4]-ValueError]

wRAR commented 1 year ago

Looks like it's https://github.com/python/cpython/issues/103848
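
For anyone who wants to check this outside the w3lib test suite, here is a minimal sketch that feeds the two URLs from the failing test IDs straight to `urllib.parse.urlsplit`, which is where the traceback above ends up. The helper name `urlsplit_outcome` is made up for illustration and is not part of w3lib or the standard library. The sketch also skips w3lib's own `_strip(decoded)` step, so it only approximates what `safe_url_string` does. Going by the log, both URLs should raise `ValueError` on 3.11.4 and be accepted on 3.11.3, but the result depends on the interpreter, so no expected output is shown.

```python
import sys
from urllib.parse import urlsplit

# URLs copied from the two failing test IDs in the log above.
URLS = [
    'https://"%;<=>@[]^`{|}\x7f:"%;<=>@[]^`{|}\x7f:@example.com',
    "http://[2a01:5cc0:1:2:3:4]",
]


def urlsplit_outcome(url):
    """Hypothetical helper: report whether urlsplit accepts or rejects a URL.

    Returns ("ok", netloc) when parsing succeeds, or
    ("error", message) when urlsplit raises ValueError.
    """
    try:
        parts = urlsplit(url)
    except ValueError as exc:
        return ("error", str(exc))
    return ("ok", parts.netloc)


print("Python", sys.version.split()[0])
for url in URLS:
    status, detail = urlsplit_outcome(url)
    print(f"{status:5} {url!r}: {detail!r}")
```

Running the same script under 3.11.3 and 3.11.4 should show whether the change comes from the interpreter rather than from w3lib itself.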