python / cpython


Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined #66936

Closed 6cde08a7-edea-48aa-abe4-0dd06e8ad12b closed 5 years ago

6cde08a7-edea-48aa-abe4-0dd06e8ad12b commented 9 years ago
BPO 22747
Nosy @malemburg, @loewis, @pitrou, @vstinner, @skrah, @xdegaye, @Fak3
Files
  • no_langinfo_during_init.patch
  • locale.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['interpreter-core', 'type-crash']
    title = 'Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined'
    updated_at =
    user = 'https://bugs.python.org/WanderingLogic'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation =
    creator = 'WanderingLogic'
    dependencies = []
    files = ['37046', '42585']
    hgrepos = []
    issue_num = 22747
    keywords = ['patch']
    message_count = 11.0
    messages = ['230106', '230111', '230385', '230391', '230393', '230394', '230407', '264160', '264202', '264203', '342542']
    nosy_count = 10.0
    nosy_names = ['lemburg', 'loewis', 'pitrou', 'vstinner', 'Arfrever', 'skrah', 'xdegaye', 'python-dev', 'Roman.Evstifeev', 'WanderingLogic']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue22747'
    versions = ['Python 3.4']
    ```

    6cde08a7-edea-48aa-abe4-0dd06e8ad12b commented 9 years ago

    On systems where configure cannot find langinfo.h (or where nl_langinfo() is not defined), configure leaves HAVE_LANGINFO_H undefined in pyconfig.h. In pythonrun.c, get_locale_encoding() wraps its call to nl_langinfo() in an #ifdef, but the #else branch calls PyErr_SetNone(PyExc_NotImplementedError) and returns NULL. That makes initfsencoding() fail with the message "Py_Initialize: Unable to get the locale encoding", and the interpreter aborts.
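The failure chain described above can be modeled in a few lines of Python. This is a hypothetical sketch, not CPython code. The function names mirror the C functions in pythonrun.c. `have_langinfo` stands in for whether HAVE_LANGINFO_H was defined at build time, and `FatalError` stands in for the C-level abort. The `fallback` parameter shows how option (4) from the list that follows would change the outcome.

```python
# Hypothetical Python model of the C startup path in pythonrun.c.
# Names mirror the C functions; this is an illustration only.


class FatalError(Exception):
    """Stands in for the fatal error that aborts Py_Initialize() in C."""


def get_locale_encoding(have_langinfo, codeset="UTF-8", fallback=None):
    """Model of get_locale_encoding().

    have_langinfo: whether HAVE_LANGINFO_H was defined at build time.
    codeset: what nl_langinfo(CODESET) would return if it existed.
    fallback: None reproduces the #else branch that raises; a string
    such as "ASCII" models returning a fixed encoding instead (option 4).
    """
    if have_langinfo:
        return codeset
    if fallback is None:
        # C equivalent: PyErr_SetNone(PyExc_NotImplementedError); return NULL;
        raise NotImplementedError
    return fallback


def initfsencoding(have_langinfo, fallback=None):
    """Model of initfsencoding(): any error from the lookup is fatal."""
    try:
        return get_locale_encoding(have_langinfo, fallback=fallback)
    except NotImplementedError:
        raise FatalError("Py_Initialize: Unable to get the locale encoding")
```

With `have_langinfo=False` and no fallback, `initfsencoding()` raises `FatalError`. Passing `fallback="ASCII"` makes startup succeed, which is what the attached patch does in C.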

    I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8" (I'm not sure which). But maybe that was for a different part of the interpreter?

    In any case, there are four options here, all of which are preferable to the current behavior.

    1. Fail during configure. If we can't even start the interpreter, why waste the user's time with the build?
    2. Fail during compilation. The #else path could contain #error "Python only works on systems where nl_langinfo() is correctly implemented." Again, this would be far preferable to failing only once the user has finished the install and tries to get the interpreter prompt.
    3. Implement our own python_nl_langinfo() that we fall back on when the system one doesn't exist. (It could, for example, return "ASCII" (or "ANSI_X3.4-1968") to start with, and "UTF-8" after we see a call to setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").)
    4. Just return the string "ASCII".
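Option (3) amounts to a small state machine: report ASCII in the initial "C" locale, and report UTF-8 once the program asks for the user's locale. The following is a minimal Python sketch of that behavior, not the proposed C implementation. It assumes that the "user's environment" always resolves to "C.UTF-8" and that only a few locale names are accepted. `FallbackLocale`, `KNOWN_LOCALES`, and the string constants are hypothetical names.

```python
from typing import Optional

# Hypothetical sketch of option (3): a fallback for a libc without
# nl_langinfo(). Only the LC_CTYPE state is modeled.

LC_CTYPE = "LC_CTYPE"
LC_ALL = "LC_ALL"

# Locales this fallback accepts; anything else fails the way
# setlocale() does when it returns NULL.
KNOWN_LOCALES = {"C", "POSIX", "C.UTF-8"}


class FallbackLocale:
    def __init__(self) -> None:
        # Every C program starts in the "C" locale.
        self.ctype = "C"

    def setlocale(self, category: str, locale: Optional[str] = None) -> Optional[str]:
        """Model of setlocale(): None queries, "" selects the user's locale."""
        if category not in (LC_CTYPE, LC_ALL):
            return None  # other categories are not modeled
        if locale is None:
            return self.ctype
        if locale == "":
            # Assumption: pretend the user's environment says C.UTF-8.
            locale = "C.UTF-8"
        if locale not in KNOWN_LOCALES:
            return None
        self.ctype = locale
        return self.ctype

    def nl_langinfo_codeset(self) -> str:
        """Model of python_nl_langinfo(CODESET)."""
        if self.ctype == "C.UTF-8":
            return "UTF-8"
        return "ANSI_X3.4-1968"  # the canonical name for ASCII
```

A fresh `FallbackLocale()` reports `"ANSI_X3.4-1968"`. After `setlocale(LC_CTYPE, "")` it reports `"UTF-8"`, which matches the transition described in option (3).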

    The attached patch does the last. I'm willing to try to write the patch for choice (3) if that's what you'd prefer. (I have an implementation that does (3) for systems that also don't have setlocale() implemented, but I don't yet know how to do it if nl_langinfo() doesn't exist but setlocale() does.)

    vstinner commented 9 years ago

    > I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8"

    It's very important that Py_DecodeLocale and Py_EncodeLocale use the same encoding as sys.getfilesystemencoding().

    What is your platform? Which encoding is used by these functions?
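To see why Py_DecodeLocale and Py_EncodeLocale must agree with sys.getfilesystemencoding(), consider a filename that is decoded with one codec and encoded with another. It does not map back to the same bytes. The sketch below is a hypothetical illustration in Python. `round_trips` and `fs_round_trips` are made-up helpers, and Latin-1 versus UTF-8 is just an example mismatch. `os.fsdecode()` and `os.fsencode()` are the real standard-library functions that always use the filesystem encoding and error handler together.

```python
import os


def round_trips(raw: bytes, decode_enc: str, encode_enc: str) -> bool:
    """Decode bytes with one codec and re-encode them with another.

    If decode_enc and encode_enc differ, a filename read from the OS may
    not map back to the bytes the OS actually stores.
    """
    text = raw.decode(decode_enc, "surrogateescape")
    return text.encode(encode_enc, "surrogateescape") == raw


def fs_round_trips(raw: bytes) -> bool:
    """os.fsdecode()/os.fsencode() share one codec and error handler."""
    return os.fsencode(os.fsdecode(raw)) == raw
```

For `name = "café.txt".encode("utf-8")`, `round_trips(name, "utf-8", "utf-8")` is true. With Latin-1 on the decode side, "é" turns into two characters and re-encodes to different bytes, so `round_trips(name, "latin-1", "utf-8")` is false.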

    6cde08a7-edea-48aa-abe4-0dd06e8ad12b commented 9 years ago

    My platform is the Android command-line shell. Essentially it is like an embedded Linux platform with a quirky, partially implemented libc (not glibc). It has no langinfo.h, and while it has locale.h, the implementations of setlocale() and localeconv() do nothing (and return NULL). The wcstombs() and mbstowcs() functions are both mapped to strncpy().
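An mbstowcs() that is really strncpy() performs no multibyte decoding. Each byte becomes one wide character with the same numeric value, and copying stops at the first NUL. The following hypothetical Python model (`strncpy_mbstowcs` is a made-up name) shows the consequence: the result is identical to decoding the bytes as Latin-1, so a UTF-8 "é" comes out as two characters.

```python
def strncpy_mbstowcs(data: bytes) -> list[int]:
    """Hypothetical model of an mbstowcs() that just copies bytes.

    Each input byte becomes one wchar_t with the same value, and copying
    stops at the first NUL, as with strncpy(). No multibyte decoding
    happens, so the result equals decoding the bytes as Latin-1.
    """
    out = []
    for byte in data:
        if byte == 0:
            break
        out.append(byte)
    return out
```

For example, `strncpy_mbstowcs("é".encode("utf-8"))` yields `[0xC3, 0xA9]`, while a real UTF-8 mbstowcs() would yield the single code point `0xE9`.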

    This works as UTF-8 was originally intended to. The Linux kernel (and most supported file systems) store filenames as null-terminated byte strings, so UTF-8-encoded filenames "work" with any software that assumes UTF-8. Examples are the xterm program I use to ssh into the machine and the Dalvik JVM that runs user apps.
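The property that makes this work is that UTF-8 encodes every non-ASCII code point with bytes at or above 0x80. A NUL byte or a "/" can therefore appear in the encoded name only where the text itself contains one, and byte-oriented kernel and libc code can split and terminate names safely. Here is a small Python check of that property. `utf8_safe_for_byte_apis` is a hypothetical helper, and it assumes the text contains no lone surrogates, which cannot be encoded as UTF-8.

```python
def utf8_safe_for_byte_apis(text: str) -> bool:
    """Check the UTF-8 property that byte-oriented filename code relies on.

    Assumes text contains no lone surrogates (they cannot be UTF-8 encoded).
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII characters encode to themselves, one byte each.
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            # A multibyte sequence must never contain an ASCII byte
            # such as NUL or "/".
            return False
    return True
```

For example, `"日本語/файл.txt".encode("utf-8").split(b"/")` yields exactly two components, and each one decodes back to the original text.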

    My intent with this issue is to make it slightly easier for people with a libc like Android's, where locale support is completely broken and only 8-bit "ASCII" is really supported, to get something reasonable to compile and run, without breaking the supported platforms.

    If you look at what Kivy and Py4A have done, they have patches all over the main interpreter that, once applied, break the interpreter on every supported platform. I'm trying to avoid that approach. Two possibilities for this particular part of the interpreter are option (3) and option (4) above. Option (3) is preferable in the long run, but option (4) is a much smaller change (as long as it is consistent with the decision in issue 8610).

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 9 years ago

    Has anyone made an effort to get this fixed in Android? I find it strange that hundreds of projects now work around Android bugs instead of putting (friendly) pressure on the Android maintainers.

    Minimal langinfo.h and locale.h support should be trivial to implement.

    6cde08a7-edea-48aa-abe4-0dd06e8ad12b commented 9 years ago

    I am working on using my resources at Intel to put some pressure on Google to fix some of the (many) problems in the Bionic libc.

    I have a sort of "polyfill" library that implements locale.h and langinfo.h, as well as the structure definitions for wchar.h, and it borrows the UTF-8 mbs*towcs() and wcs*tombs() implementations from FreeBSD. Its setlocale() and nl_langinfo() start in the "C" locale and behave as though the user's environment variables were set to "C.UTF-8" (so calling setlocale(LC_ALL, "") switches the encoding to UTF-8).

    But Bionic has been broken for many years, and it will most likely take many more years before I (or somebody) can arrange the right set of things to get it fixed. It is not really in Google's interest to have people writing non-JVM code, so they seem to support it only grudgingly. Their JVM APIs are the "walled garden" that keeps apps tied to their platform while letting them switch quickly to new processor architectures if they need to.

    But all of that is not really germane to this bug. The fact is that CPython, when compiled for a system without langinfo.h, produces an executable that does nothing but crash.

    What systems other than Android lack langinfo.h? (Alternatively, why has this feature test been in configure.ac for many years?) If the answer for Android is "it's Android's bug and they should fix it," then shouldn't we remove all the #ifdef HAVE_LANGINFO_H tests from the code and just let compilation fail on systems that don't have langinfo.h? That is option (1) or (2) that I suggested above.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 9 years ago

    To expand a little, here ...

    https://code.google.com/p/android/issues/list

    ... I cannot find either a localeconv() or an nl_langinfo() issue.

    Perhaps the maintainers would be willing to add minimal versions?

    vstinner commented 9 years ago

    If the platform doesn't provide anything, we can maybe adopt the same approach as Mac OS X: force the encoding to UTF-8 and just don't use the C library.
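The "force UTF-8" approach can be expressed as a platform check that runs before any C locale query. This Python sketch is hypothetical and is not how CPython chooses its encoding internally. `choose_fs_encoding` and `FORCE_UTF8_PLATFORMS` are made-up names. The string `"android"` assumes a Python version where `sys.platform` reports Android separately (3.13 and later; older Android builds report `"linux"`). `locale.nl_langinfo` and `locale.CODESET` are real, but they are absent on platforms whose C library lacks them.

```python
import locale
import sys

# Hypothetical set of platforms that skip the C library and use UTF-8.
FORCE_UTF8_PLATFORMS = {"darwin", "android"}


def choose_fs_encoding(platform: str = sys.platform) -> str:
    """Pick a filesystem encoding, preferring a fixed UTF-8 where forced."""
    if platform in FORCE_UTF8_PLATFORMS:
        return "utf-8"
    try:
        codeset = locale.nl_langinfo(locale.CODESET)
    except AttributeError:
        # nl_langinfo/CODESET are not exposed when the libc lacks them.
        return "utf-8"
    # An empty answer is treated as "no information" and also mapped to UTF-8.
    return codeset or "utf-8"
```

The design choice here is that forced platforms never call into the C library, so a broken libc cannot influence the result.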

    490c593f-f636-409f-bb35-6abeb38a4595 commented 8 years ago

    Android default system encoding is UTF-8 as specified at http://developer.android.com/reference/java/nio/charset/Charset.html

    > The platform's default charset is UTF-8. (This is in contrast to some older implementations, where the default charset depended on the user's locale.)

    > If the platform doesn't provide anything, we can maybe adopt the same approach as Mac OS X: force the encoding to UTF-8 and just don't use the C library.

    The attached patch does what Victor proposed, but emphasizes that Android defines neither HAVE_LANGINFO_H nor CODESET. The absence of HAVE_LANGINFO_H and CODESET causes other problems too (possibly on Mac OS X as well). In that case, PyCursesWindow_New() in _cursesmodule.c falls back nicely to "utf-8", but _Py_device_encoding() in fileutils.c does a Py_RETURN_NONE instead. This seems to affect _io_TextIOWrapper___init___impl() in textio.c and os_device_encoding_impl() in posixmodule.c. And indeed, os.device_encoding(0) returns None on Android.
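Callers of `os.device_encoding()` have to handle a `None` result. The function returns `None` whenever the descriptor is not a terminal, and this comment reports it returning `None` for fd 0 on Android. The sketch below shows a caller-side fallback in the spirit of what PyCursesWindow_New() does. `device_encoding_or_default` is a hypothetical helper, and `"utf-8"` as the default is an assumption that matches Android's documented charset.

```python
import os


def device_encoding_or_default(fd: int, default: str = "utf-8") -> str:
    """Return os.device_encoding(fd), falling back when it returns None.

    os.device_encoding() returns None when fd is not a terminal; on a
    platform without CODESET it may return None even for a terminal.
    """
    encoding = os.device_encoding(fd)
    return encoding if encoding is not None else default
```

A pipe is never a terminal, so `device_encoding_or_default()` on either end of `os.pipe()` always returns the default.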

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 8 years ago

    New changeset ad6be34ce8c9 by Stefan Krah in branch 'default': Issue bpo-22747: Workaround for systems without langinfo.h. https://hg.python.org/cpython/rev/ad6be34ce8c9

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 8 years ago

    We don't support Android officially yet, but I think until bpo-8610 is resolved something must be done here.

    vstinner commented 5 years ago

    Python 3 (I don't recall exactly which version) has been fixed to always use UTF-8 on Android for the filesystem encoding, and even for the locale encoding in most places. I'm closing the issue.