Open cpina opened 2 years ago
In case it helps, here is the backtrace captured just before execution reaches this line in Objects/unicodeobject.c, function unicode_decode_utf8:
errmsg = "invalid continuation byte";
Backtrace:
#0 unicode_decode_utf8 (s=0x555555a2e8e0 "����������� ��� ��� ������", size=26, error_handler=_Py_ERROR_UNKNOWN, errors=0x0, consumed=0x0)
at Objects/unicodeobject.c:5069
#1 0x00005555556348c4 in PyUnicode_DecodeUTF8Stateful (s=0x555555a2e8e0 "����������� ��� ��� ������", size=26, errors=0x0, consumed=0x0)
at Objects/unicodeobject.c:5141
#2 0x0000555555629dae in PyUnicode_FromStringAndSize (u=0x555555a2e8e0 "����������� ��� ��� ������", size=26) at Objects/unicodeobject.c:2267
#3 0x00005555556a0064 in do_mkvalue (p_format=0x7fffffff73b8, p_va=0x7fffffff73a0, flags=1) at Python/modsupport.c:423
#4 0x000055555569f5cd in do_mktuple (p_format=0x7fffffff73b8, p_va=0x7fffffff73a0, endchar=41 ')', n=2, flags=1) at Python/modsupport.c:264
#5 0x000055555569f737 in do_mkvalue (p_format=0x7fffffff73b8, p_va=0x7fffffff73a0, flags=1) at Python/modsupport.c:289
#6 0x00005555556a06ac in va_build_value (format=0x7ffff79bf942 "(is)", va=0x7fffffff73f0, flags=1) at Python/modsupport.c:562
#7 0x00005555556a05b0 in _Py_BuildValue_SizeT (format=0x7ffff79bf942 "(is)") at Python/modsupport.c:530
#8 0x00007ffff79b3a91 in set_gaierror (error=-2) at /root/python/Python-3.9.2/Modules/socketmodule.c:680
#9 0x00007ffff79b43b2 in setipaddr (name=0x7ffff7b6bb90 "reprotest-capture-hostname", addr_ret=0x7fffffffb600, addr_ret_size=128, af=0)
at /root/python/Python-3.9.2/Modules/socketmodule.c:1211
#10 0x00007ffff79bada7 in socket_gethostbyaddr (self=0x7ffff79de220, args=0x7ffff7b64940) at /root/python/Python-3.9.2/Modules/socketmodule.c:5822
Ignore the line numbers: I had added some debug output to some of the files.
I wonder (but I cannot reproduce outside Python) if the handling of the result of set_gaierror
is what is causing errors depending on the locale settings.
If it helps, gai_strerror is called (in set_gaierror) and might return a localised error:
root@reprotest-capture-hostname:~/t# cat bug.py
import locale
import socket
locale.setlocale(locale.LC_ALL, '')
print('test')
socket.getfqdn()
root@reprotest-capture-hostname:~/t# ./a.out
test
gai_strerror: Name or service not known
root@reprotest-capture-hostname:~/t# LANG=ru_RU.CP1251 ./a.out
test
gai_strerror: ����������� ��� ��� ������
root@reprotest-capture-hostname:~/t#
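The source of a.out is not shown in this thread. As a rough sketch of an equivalent check, assuming a Linux system where the C library is reachable through ctypes, the raw bytes returned by gai_strerror can be inspected from Python without any decoding. The helper name gai_message is hypothetical.

```python
import ctypes
import locale
import socket

# Load the C library already linked into the interpreter (assumes Linux).
libc = ctypes.CDLL(None)
libc.gai_strerror.argtypes = [ctypes.c_int]
libc.gai_strerror.restype = ctypes.c_char_p  # raw bytes, no decoding


def gai_message(code: int) -> bytes:
    """Return the raw bytes gai_strerror() produces for an EAI_* code."""
    return libc.gai_strerror(code)


if __name__ == "__main__":
    try:
        # Pick up LANG / LC_* from the environment, as bug.py does.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # requested locale is not installed; the C locale stays active
    # EAI_NONAME is the "name not known" code (error=-2 in the backtrace).
    print("gai_strerror:", gai_message(socket.EAI_NONAME))
```

Run under LANG=ru_RU.CP1251, this should print a bytes object containing non-UTF-8 bytes rather than failing, since nothing attempts to decode it.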
In set_gaierror there is:
v = Py_BuildValue("(is)", error, gai_strerror(error));
With the Russian locale (and, I suspect, other locales) it seems that Py_BuildValue, which goes through PyUnicode_FromString, cannot create the PyUnicode object from the localised message (see the original post), and the whole call fails.
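The decoding failure can be illustrated in pure Python. This is a minimal sketch that assumes the ru_RU message is "Неизвестное имя или служба"; that text is not confirmed anywhere in this thread, but its CP1251 encoding is 26 bytes long, which matches size=26 in the backtrace.

```python
# Assumption: this is the ru_RU translation returned by gai_strerror.
message = "Неизвестное имя или служба"

# The bytes a CP1251 locale would hand to Py_BuildValue("(is)", ...).
raw = message.encode("cp1251")
print(len(raw))

decode_error = None
try:
    # The "s" format unit decodes the C string as UTF-8.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("UTF-8 decode failed:", exc.reason)

# Decoding with the locale's own encoding recovers the message.
decoded = raw.decode("cp1251")
print(decoded)
```

The first byte (0xCD) looks like the start of a two-byte UTF-8 sequence, but the byte after it is not a continuation byte, which is consistent with the errmsg line shown at the top of the backtrace.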
Hopefully this helps to find the error.
The problem is still present in the Python 3.13 prerelease currently shipped with Fedora 41 beta.
We just hit this using the Latin-1 Swedish locale sv_SE and a call to socket.gethostbyaddr with an argument for which the lookup returns an error code. That results in a UnicodeDecodeError exception rather than the expected socket.herror exception.
Our analysis came to the same conclusion as above, with one added detail: according to the documentation, the format unit s does indeed interpret the C string as UTF-8, but that is not the encoding of the string gai_strerror returns.
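Until the interpreter is fixed, calling code could shield itself along these lines. This is only a sketch of a caller-side workaround, not the fix for CPython itself; the wrapper name gethostbyaddr_or_herror and its resolver parameter are hypothetical, and the original numeric error code is lost because the failure happens before the exception object is built.

```python
import locale
import socket


def gethostbyaddr_or_herror(address, resolver=socket.gethostbyaddr):
    """Call resolver(address); turn a stray UnicodeDecodeError into socket.herror."""
    try:
        return resolver(address)
    except UnicodeDecodeError as exc:
        # exc.object holds the undecodable message bytes. Decode them with
        # the locale's encoding, replacing anything that still does not fit.
        encoding = locale.getpreferredencoding(False)
        text = exc.object.decode(encoding, errors="replace")
        raise socket.herror(text) from exc
```

Callers can then keep a single except socket.herror (or except OSError) branch regardless of the active locale.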
Bug report
This code:
raises an exception when run like this:
Note the LANG. I haven't checked for which "LANG" this works or fails.
:warning: To exercise the problematic code (see the comments for details on the code path), the hostname must not be resolvable: it must not be in /etc/hosts and must not resolve via DNS or any other method configured in the hosts setting of /etc/nsswitch.conf. To reproduce the problem on Linux, the hostname can be changed with sudo hostname something-that-does-not-exist.
Your environment
Tested this on Debian 11 bullseye with the following Python interpreters:
I've encountered this bug on two independent Debian installations (with different locale settings) and in a CI system (also Debian-based, but with unrelated settings).
Only tested on x64 systems.