python / cpython

The Python programming language
https://www.python.org

curses crash on FreeBSD #51633

Closed mdickinson closed 14 years ago

mdickinson commented 14 years ago
BPO 7384
Nosy @akuchling, @mdickinson, @vstinner, @ashemedai, @bitdancer, @skrah
Files
  • freebsd-curses.diff: Possible fix
  • issue7384.patch
  • issue7384-2.patch
  • issue7384-3-py3k.patch
  • issue7384-4-py3k.patch
  • issue7384-5-py3k.patch
  • issue7384-5-trunk.patch
  • ldd-retval-py3k.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['extension-modules', 'type-bug']
    title = 'curses crash on FreeBSD'
    updated_at =
    user = 'https://github.com/mdickinson'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'skrah'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'skrah'
    components = ['Extension Modules']
    creation =
    creator = 'mark.dickinson'
    dependencies = []
    files = ['16935', '16963', '16973', '17023', '17050', '17064', '17528', '17997']
    hgrepos = []
    issue_num = 7384
    keywords = ['patch', 'buildbot']
    message_count = 52.0
    messages = ['95652', '97709', '97722', '99657', '99658', '99659', '103231', '103256', '103261', '103263', '103264', '103265', '103267', '103295', '103307', '103308', '103393', '103394', '103395', '103429', '103432', '103497', '103503', '103828', '103838', '103980', '103996', '103997', '104000', '104002', '104054', '104057', '104070', '104071', '104074', '104283', '104302', '104311', '104315', '106199', '106939', '106940', '106948', '107323', '107999', '110222', '110224', '110225', '110238', '110271', '110378', '110550']
    nosy_count = 8.0
    nosy_names = ['akuchling', 'mark.dickinson', 'vstinner', 'asmodai', 'rpetrov', 'Arfrever', 'r.david.murray', 'skrah']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7384'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']
    ```

    mdickinson commented 14 years ago

    test_curses is currently causing the test runs to abort on the FreeBSD 6.4 and 7.2 buildbots.

    I can reproduce this on a FreeBSD 7.2 /amd64 machine by doing

    ./python Lib/test/regrtest.py -uall test_all test_curses

    This dumps core, and the traceback points at the call to delwin() in PyCursesWindow_Dealloc, but it's far from obvious (to me) what's going wrong. wo->win is not NULL here, and appears to point to a valid WINDOW.
    However, stdscr is NULL! As far as I can tell, this shouldn't happen.

    test_curses by itself doesn't crash, unless I add an 'import readline' or 'import rlcompleter' to the top of test_curses.py.

    I expect to have access to the FreeBSD machine for a few more days. Any hints about what to try next would be appreciated.

    mdickinson commented 14 years ago

    I've not had any success tracking the cause of this failure down, and no longer have the resources to do so. It does appear that curses itself is broken on FreeBSD: it's not just a problem with the tests.

    Adding Andrew Kuchling to the nosy in case he has any ideas what's wrong here.

    Since the test_curses crash is currently aborting the test run, and so preventing us from getting feedback from the other tests on the FreeBSD buildbots, I propose that test_curses be skipped with a "the curses module is broken on FreeBSD" message.

    bitdancer commented 14 years ago

    Given your diagnosis so far, +1 on the skip.

    mdickinson commented 14 years ago

    > It does appear that curses itself is broken on FreeBSD

    Rereading this, it doesn't say what I meant it to say: I meant that the Python curses module seems to be broken, not that the system-level curses library is broken (though that seems possible too).

    mdickinson commented 14 years ago

    Applied the test_curses skip in r78281 (trunk); will merge to the other branches.

    Leaving this issue open, since the root cause isn't fixed.

    mdickinson commented 14 years ago

    Merged to the other 3 branches in revisions r78282 (release26-maint), r78283 (py3k), r78284 (release31-maint).

    mdickinson commented 14 years ago

    I'm looking at this again, after installing FreeBSD 8.0/amd64 in a VM.

    I've reduced Lib/test/test_curses.py to the following 9 lines:

    import rlcompleter
    import curses
    f = open('mytempfile', 'w+b')
    stdscr = curses.initscr()
    stdscr.putwin(f)
    f.seek(0)
    curses.getwin(f)
    f.close()
    curses.endwin()

    I then get:

    $ ./python Lib/test/regrtest.py test_curses
    test_curses
    Bus error (core dumped)

    From looking at the core dump, and tracing through with gdb, the core dump occurs when delwin is called (from PyCursesWindow_Dealloc) on the result of curses.getwin(f), as a result of garbage collection.

    The 'import rlcompleter' line appears to be necessary to cause this; I've no idea why.

    mdickinson commented 14 years ago

    Here's the top of the backtrace. (Thanks asmodai for helping me out with working out how to build a FreeBSD system ncurses with debugging information.)

    #0  0x0000000801460714 in cannot_delete (win=0x80116b1d0)
        at /usr/src/lib/ncurses/ncursesw/../../../contrib/ncurses/ncurses/base/lib_delwin.c:54
            p = (struct _win_list *) 0xdbdbdbdbdbdbdbdb
            result = false
    #1  0x0000000801460773 in delwin (win=0x80116b1d0)
        at /usr/src/lib/ncurses/ncursesw/../../../contrib/ncurses/ncurses/base/lib_delwin.c:71
            result = -1
    #2  0x000000080170d140 in PyCursesWindow_Dealloc (wo=0x800eb74c0)
        at /usr/home/dickinsm/python/svn/trunk/Modules/_cursesmodule.c:357
    No locals.
    #3  0x000000000046325f in _Py_Dealloc (op=0x800eb74c0) at Objects/object.c:2211
            dealloc = 0x80170d110 <PyCursesWindow_Dealloc>
    #4  0x00000000004578d8 in PyDict_DelItem (op=0x800f121b0, key=0x8011062e0)
        at Objects/dictobject.c:829
            mp = (PyDictObject *) 0x800f121b0
            hash = -3668919459648339544
            ep = (PyDictEntry *) 0x8010cb5a8
            old_value = (PyObject *) 0x800eb74c0
            old_key = (PyObject *) 0x8011062e0
            __func__ = "PyDict_DelItem"
    #5  0x0000000000458a48 in dict_ass_sub (mp=0x800f121b0, v=0x8011062e0, w=0x0)
    ---Type <return> to continue, or q <return> to quit---
        at Objects/dictobject.c:1184
    No locals.
    #6  0x000000000041aadd in PyObject_DelItem (o=0x800f121b0, key=0x8011062e0)
        at Objects/abstract.c:205
            m = (PyMappingMethods *) 0x6c2960

    akuchling commented 14 years ago

    Could I get a login on the buildbot to make a fix?

    I bet the problem is with the stdscr object. PyCurses_InitScr() does 'return (PyObject *)PyCursesWindow_New(stdscr);'.

    PyCursesWindow_Dealloc() does: if (wo->win != stdscr) delwin(wo->win);

    I bet FreeBSD is clearing contents of the stdscr global variable. The condition in PyCursesWindow_Dealloc() is then true, and it tries to delwin() the old value, which is in wo->win.

    One fix might be to keep a reference to that PyCursesWindow object holding stdscr, and change dealloc to 'if (wo != saved_stdscr_object)'. Or maybe, since multiple calls to initscr() will create multiple window objects holding the value of stdscr, window objects should have a 'do_not_delwin' flag.

    akuchling commented 14 years ago

    Here's a possible patch; it at least doesn't seem to break the module on MacOS, though MacOS doesn't crash with the current code either.

    mdickinson commented 14 years ago

    > Could I get a login on the buildbot to make a fix?

    I think David Bolen (db3l) is the maintainer. David?

    mdickinson commented 14 years ago

    > Here's a possible patch

    Thanks. I'll give it a try on my FreeBSD VM and report back. BTW, did you mean to include the threading change in that patch?

    mdickinson commented 14 years ago

    With that patch, I'm still getting the core dump (with the traceback looking pretty much as it did before).

    When I traced through this with gdb, I didn't see stdscr getting set to 0 at any point. Unless I missed any, the only curses library calls made (in sequence) were:

    1. initscr() -> new window win (=stdscr, presumably)
    2. putwin(file, win)
    3. getwin(file) -> new window win2, with win2 != win
    4. delwin(win2) -> segfault (and presumably, without the segfault, there would have been calls to delwin(win) and endwin() too).

    And I'm at a complete loss to explain why importing rlcompleter makes a difference. (importing readline also causes the segfault). I don't think it's just to do with random memory changes, since if I replace the readline or rlcompleter import by any other randomly chosen python module then there's no segfault.

    c379e6df-6daa-45d5-a8ee-f828d234d3ca commented 14 years ago

    For the record, this happens on FreeBSD 8 as well.

    It seems it is still the same bug as what I reported back in March 2009 on the Python-dev list.

    If you run the test stand-alone with ./python Lib/test/regrtest.py -uall test_curses it passes and prints "1 test OK".

    If you add something like testall before it, it will crash with a SIGSEGV: segmentation fault (core dumped).

    Mark's condensed test case switches to a SIGBUS, which is a bit different.

    Mark, did your initial backtrace look like this:

    #0  0x282e115e in memcpy () from /lib/libc.so.7
    #1  0x282de375 in fwrite () from /lib/libc.so.7
    #2  0x282de132 in fwrite () from /lib/libc.so.7
    #3  0x28b7a1ca in putwin (win=0x28409640, filep=0x282f39f8)
        at /newusr/src/lib/ncurses/ncursesw/../../../contrib/ncurses/ncurses/base/lib_screen.c:132
    #4  0x28d9b361 in PyCursesWindow_PutWin (self=0x28442ef0, args=0x2867f80c)
        at /home/asmodai/projects/python/Modules/_cursesmodule.c:1351
    #5  0x080da60d in PyEval_EvalFrameEx (f=0x296d760c, throwflag=0)
        at Python/ceval.c:4013
    #6  0x080db10e in PyEval_EvalFrameEx (f=0x296a948c, throwflag=0)
        at Python/ceval.c:4099
    #7  0x080db10e in PyEval_EvalFrameEx (f=0x29692d8c, throwflag=0)
        at Python/ceval.c:4099
    #8  0x080dc68b in PyEval_EvalCodeEx (co=0x297675c0, globals=0x2866bbdc,
        locals=0x2866bbdc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0,
        defcount=0, closure=0x0) at Python/ceval.c:3253
    #9  0x080dc7d7 in PyEval_EvalCode (co=0x297675c0, globals=0x2866bbdc,
        locals=0x2866bbdc) at Python/ceval.c:666
    #10 0x080ef70c in PyImport_ExecCodeModuleEx (
        name=0xbfbfd683 "test.test_curses", co=0x297675c0,
        pathname=0xbfbfd223 "/home/asmodai/projects/python/Lib/test/test_curses.py")

    mdickinson commented 14 years ago

    > Mark, did your initial backtrace look like this:

    No; the segfault was definitely happening in delwin rather than putwin. But I did see something like your backtrace when I tried to use ncurses from ports (installed in /usr/local) rather than the system ncurses. This was all on FreeBSD 8.0/amd64, by the way, running in a VM on Parallels. I got the same results both when working directly within the VM terminal, and when ssh'ing to the VM from an OS X Terminal.

    Maybe running this through Valgrind or something similar might show what's going on. (Though it's not clear from a quick google whether Valgrind works on FreeBSD.)

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Valgrind can be installed by:

    cd /usr/ports/devel/valgrind && make install

    Then you can do (curses_test.py is your short test program):

    1) valgrind --db-attach=yes --suppressions=Misc/valgrind-python.supp ./python curses_test.py

    2) valgrind --suppressions=Misc/valgrind-python.supp ./python curses_test.py

    Valgrind finds invalid writes. The problem with 1) is that the terminal is in an unusable state, so controlling gdb isn't possible.

    The best thing is probably to use 2) and wade through the unformatted output starting here:

    ==12043== Invalid write of size 8
    ==12043==    at 0x27A71B7: getwin (in /lib/libncursesw.so.8)
    ==12043==    by 0x2A3EAAB: PyCurses_GetWin (_cursesmodule.c:1902)
    ==12043==    by 0x4573FB: PyEval_EvalFrameEx (ceval.c:3833)
    ==12043==    by 0x457DF9: PyEval_EvalCodeEx (ceval.c:3282)

    (I don't have time to do that right now, I might do it later.)

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    One oddity: In Mark's test case, the error only shows if readline is imported _before_ curses. The other way around it's fine.

    On FreeBSD 8.0 amd64, with the _default_ libcurses, the Valgrind output for py3k looks like this:

    [...]
    ==31089== Invalid write of size 8
    ==31089==    at 0x284F1AE: getwin (in /lib/libncursesw.so.8)
    ==31089==    by 0x2AE8532: PyCurses_GetWin (_cursesmodule.c:1903)
    ==31089==    by 0x47FBC7: call_function (ceval.c:3833)
    ==31089==    by 0x47AAC0: PyEval_EvalFrameEx (ceval.c:2645)
    ==31089==    by 0x47DF41: PyEval_EvalCodeEx (ceval.c:3282)
    ==31089==    by 0x47189F: PyEval_EvalCode (ceval.c:721)
    ==31089==    by 0x4B31AA: run_mod (pythonrun.c:1692)
    ==31089==    by 0x4B2FC3: PyRun_FileExFlags (pythonrun.c:1649)
    ==31089==    by 0x4B1734: PyRun_SimpleFileExFlags (pythonrun.c:1177)
    ==31089==    by 0x4B0C75: PyRun_AnyFileExFlags (pythonrun.c:963)
    ==31089==    by 0x4CB029: Py_Main (main.c:650)
    ==31089==    by 0x4150E4: main (python.c:152)
    ==31089==  Address 0x25c71e0 is 0 bytes after a block of size 112 alloc'd
    ==31089==    at 0x25A8AE: calloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==31089==    by 0x29C518A: _nc_makenew (in /lib/libncurses.so.8)
    ==31089==    by 0x29C569F: newwin (in /lib/libncurses.so.8)
    ==31089==    by 0x284F2EE: getwin (in /lib/libncursesw.so.8)
    ==31089==    by 0x2AE8532: PyCurses_GetWin (_cursesmodule.c:1903)
    ==31089==    by 0x47FBC7: call_function (ceval.c:3833)
    ==31089==    by 0x47AAC0: PyEval_EvalFrameEx (ceval.c:2645)
    ==31089==    by 0x47DF41: PyEval_EvalCodeEx (ceval.c:3282)
    ==31089==    by 0x47189F: PyEval_EvalCode (ceval.c:721)
    ==31089==    by 0x4B31AA: run_mod (pythonrun.c:1692)
    ==31089==    by 0x4B2FC3: PyRun_FileExFlags (pythonrun.c:1649)
    ==31089==    by 0x4B1734: PyRun_SimpleFileExFlags (pythonrun.c:1177)
    ==31089== [...]

    Then I installed the curses from /usr/ports/devel/ncurses, and the error didn't show up any more. I'm inclined to think that the bug is in the system ncurses. Still, it would be nice to know why the import order matters.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    I take that back. With the curses from /usr/ports/devel/ncurses, Mark's test case is fine, but

    ./python Lib/test/regrtest.py -uall test_curses

    fails again.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Alas, after installing curses from /usr/ports/devel/ncurses I did not recompile Modules/_curses_panel.c.

    So, after a proper build

    ./python Lib/test/regrtest.py -uall test_curses

    shows no errors.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    It seems that FreeBSD has problems with the fact that readline.so is linked with -lreadline and -lncursesw (why?).

    With bpo-7384.patch I get no more errors using either Mark's test case or test_curses.py.

    mdickinson commented 14 years ago

    That patch works for me, too. Nice!

    > It seems that FreeBSD has problems with the fact that readline.so is linked with -lreadline and -lncursesw (why?).

    Good question...

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    To clarify a couple of things:

    On some systems (Redhat?) readline is not linked against ncurses in order to give the user the possibility to choose. This is why setup.py has to select an ncurses version.

    However, things can go wrong if readline is already linked against a specific ncurses version. On FreeBSD-8.0 this version is ncurses, but setup.py selects ncursesw:

    stefan@freebsd-amd64:~> ldd /lib/libreadline.so.8
    /lib/libreadline.so.8:
            libncurses.so.8 => /lib/libncurses.so.8 (0x800b3e000)
            libc.so.7 => /lib/libc.so.7 (0x800648000)
    stefan@freebsd-amd64:~> ls /lib/libncurses*
    /lib/libncurses.so.8  /lib/libncursesw.so.8

    bpo-7384.patch suppresses the selection, but is a little primitive.

    I've created a new patch, which does the following:

    1) Detect if readline is already linked against ncurses and if so, skip any further selection. This must be done.

    2) Use the same version of ncurses for readline.so and _curses.so.

    I'm not sure if 2) is necessary. With the previous patch, readline.so was linked against ncurses and _curses.so against ncursesw. All tests were passed though.

    Any thoughts whether readline.so and _curses.so should link against the same curses library?
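
Step 1) above could be sketched roughly as follows. This is a minimal illustration and not the attached patch: the function name `readline_curses_lib` and the regular expression are hypothetical, and the function works on already-captured `ldd` output so the parsing can be looked at without running `ldd`. The sample text is the FreeBSD 8.0 output quoted in this comment.

```python
import re

# Hypothetical helper (not from the patch): given the text printed by
# `ldd <libreadline>`, return the curses library that readline is linked
# against ("curses", "ncurses" or "ncursesw"), or None if there is none.
_CURSES_RE = re.compile(r"lib(n?cursesw?)\.so")


def readline_curses_lib(ldd_output):
    for line in ldd_output.splitlines():
        # Only dependency lines of the form "name => path (addr)" matter.
        if "=>" not in line:
            continue
        match = _CURSES_RE.search(line)
        if match:
            return match.group(1)
    return None


freebsd_ldd = """\
/lib/libreadline.so.8:
        libncurses.so.8 => /lib/libncurses.so.8 (0x800b3e000)
        libc.so.7 => /lib/libc.so.7 (0x800648000)
"""

print(readline_curses_lib(freebsd_ldd))
```

For the FreeBSD output the function reports `ncurses`, which is the case where no further selection should happen.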

    c379e6df-6daa-45d5-a8ee-f828d234d3ca commented 14 years ago

    Just to state the obvious: ncursesw is needed for wide character support (i.e. Unicode).

    Also, have you tried asking Thomas Dickey (dickey@invisible-island.net) about this? He might be able to give some clue about it since he's the main curses maintainer.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Jeroen, thanks for the idea. I asked Thomas Dickey and he said that one should not load both libncurses.so and libncursesw.so.

    I think this means that if libreadline.so is already linked against libncurses.so, we are stuck with libncurses.so for the curses module.

    If this affects users who want the wide character version, they could file a bug report with their distro:

    Thomas Dickey pointed out that there are two ways for a distro to deal with this problem:

    1) Link libreadline against ncursesw.

    2) Split out the termcap interface (which readline uses) as libtinfo. This is a configure option for ncurses and SuSE and Redhat are doing this.

    I'm attaching a new patch against py3k that makes sure that the readline and curses modules use the same curses library.

    (This does not apply to Darwin, but I don't want to touch that logic.)

    I'm going to test the patch on py3k-cdecimal to see if it works on the buildbots.
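
The selection rule described in this comment (reuse whatever curses library readline already pulls in, and only choose freely otherwise) might look something like the sketch below. The function name, its arguments, and the preference order used when readline imposes no constraint (ncursesw, then ncurses, then curses) are assumptions made for illustration; this is not a copy of the patch.

```python
def choose_curses_lib(readline_termcap_lib, available):
    """Pick the library to link the _curses module against.

    readline_termcap_lib: name of the library readline is linked against
        ("ncurses", "ncursesw", "tinfo", ...) or "" if none was detected.
    available: set of curses library names found on the system.
    """
    # If readline already drags in a curses library, use the same one so
    # that libncurses and libncursesw never end up in the same process.
    if "curses" in readline_termcap_lib:
        return readline_termcap_lib
    # Otherwise (no dependency detected, or a separate libtinfo) there is
    # no conflict, so prefer the wide-character build when it exists.
    for candidate in ("ncursesw", "ncurses", "curses"):
        if candidate in available:
            return candidate
    return None
```

With the FreeBSD 8.0 situation from this thread, `choose_curses_lib("ncurses", {"ncurses", "ncursesw"})` gives `"ncurses"`, while a system that splits out libtinfo would still get `"ncursesw"`.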

    mdickinson commented 14 years ago

    This patch looks good to me, assuming that the buildbots are happy. I agree that this seems like a sensible solution for now, even if it means limiting users to ncurses rather than ncursesw.

    I was initially a bit surprised that it works on OS X, since OS X doesn't have 'ldd'; but in that case the os.system call simply outputs "sh: ldd: command not found" to stderr and (presumably) nothing to stdout; no Python exception is raised, so it's all okay. It might be worth adding code to avoid the os.system('ldd ...') call on OS X, just to avoid the unnecessary error message on the console. Apart from this, I say +1 to applying the patch.
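
A guard along the lines suggested here could be as small as the following sketch. The function name is hypothetical, and `shutil.which` is a modern standard-library call that would not have been available to setup.py at the time; it is used here only to show the idea of skipping the `ldd` probe on OS X and on systems where no `ldd` is on the PATH.

```python
import shutil
import sys


def should_run_ldd(platform=sys.platform, which=shutil.which):
    """Return True if probing readline with ldd makes sense here."""
    # OS X has no ldd, so skip the call and avoid the shell's
    # "command not found" message on the console.
    if platform == "darwin":
        return False
    # Elsewhere, only proceed if an ldd executable is actually on PATH.
    return which("ldd") is not None
```

The `platform` and `which` parameters exist only so the decision can be exercised without depending on the machine the code runs on.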

    Many thanks for all the detective work!

    d8d5aad8-e55b-4500-a3a0-9ea982d771ff commented 14 years ago

    Instead of testing in setup.py, we could use the result from the configure script: just uncomment the line and use it.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Mark, thanks for reviewing the patch. In the new patch, I added a skip for OS X.

    Buildbot testing looks good. In particular, one FreeBSD bot passes test_curses now (the other one is hanging in multiprocessing).

    For most bots nothing changes. The solaris bot has the same unrelated failures as before. Ubuntu sparc previously did the same weird linking (readline already linked with ncurses, but using -lncursesw) and now uses ncurses throughout. Tests pass. Debian sparc did the same, tests give the same failures as before ("getmouse returned ERR", almost certainly unrelated.)

    Roumen, I do not see a line in configure.in that tests for the libraries that readline is linked against.

    c379e6df-6daa-45d5-a8ee-f828d234d3ca commented 14 years ago

    I did some digging on my side, the fact you see ncurses referenced from readline is due to the build linking readline to libtermcap:

    cc -fstack-protector -shared -Wl,-x -o libreadline.so.8 -Wl,-soname,libreadline.so.8 `lorder readline.So vi_mode.So funmap.So keymaps.So parens.So search.So rltty.So complete.So bind.So isearch.So display.So signals.So util.So kill.So undo.So macro.So input.So callback.So terminal.So text.So nls.So misc.So compat.So xmalloc.So history.So histexpand.So histfile.So histsearch.So shell.So mbutil.So tilde.So | tsort -q` -ltermcap

    And libtermcap is:

    % ll /usr/lib/libtermcap.so*
    0 lrwxr-xr-x 1 root wheel - 13B 18 apr 08:29 /usr/lib/libtermcap.so@ -> libncurses.so

    That configuration option you referenced, Stefan, is that --with-termlib (generate separate terminfo library)?

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Yes, readline uses only the termcap part of ncurses. I think that --with-termlib is the correct option, see:

    http://www.mail-archive.com/util-linux-ng@vger.kernel.org/msg00273.html

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Actually this means that we should also look for -ltinfo in the ldd check (A Redhat buildbot would be nice).

    d8d5aad8-e55b-4500-a3a0-9ea982d771ff commented 14 years ago

    > Roumen, I do not see a line in configure.in that tests for the libraries that readline is linked against.

    The test in configure is about how to link an application to the readline libs.

    Platforms that support linking shared libraries with unresolved symbols cannot link readline to a termcap-compatible library if they offer more than one. I think this is a bug in the package build on those systems, as it limits applications' ability to use other termcap libraries.

    Not all linux link readline to termcap compatible library:

    As configure detects how to link readline, we could uncomment READLINE_LIBS, add it as a makefile macro, and use it from setup.py. If READLINE_LIBS contains only -lreadline, then on this platform readline is already linked to a termcap-compatible library.

    Also, detection of dependent libraries using ldd is limited to platforms that have this command, i.e. it is not portable. If distutils supported a method that returns dependency libraries, we could use that.

    I'm not familiar enough with the Python curses module to propose a patch. Is it possible to run a sample program to detect the readline curses library?

    Or maybe try to link a sample "int main() { readline(); }" and ask the compiler/linker to warn about duplicate symbols. Something like:

    $ gcc -Wl,--warn-common test-readline.c -lreadline -lncursesw -lncursesw
    $ gcc -Wl,--warn-common test-readline.c -lreadline -ltermcap -lncurses
    .../libncurses.so: warning: common of `ospeed' overridden by larger common
    .../libtermcap.so: warning: larger common is here
    $ gcc -Wl,--warn-common test-readline.c -lreadline -ltermcap -lncursesw
    ..../libncursesw.so: warning: common of `ospeed' overridden by larger common
    ..../../libtermcap.so: warning: larger common is here

    FIXME with a more portable and more correct command.

    Roumen

    d8d5aad8-e55b-4500-a3a0-9ea982d771ff commented 14 years ago

    Stefan Krah wrote:

    > Stefan Krah <stefan-usenet@bytereef.org> added the comment:
    >
    > Actually this means that we should also look for -ltinfo in the ldd check (A Redhat buildbot would be nice).

    Or maybe this means that configure should add a test with -ltinfo, and if the readline link succeeds then it is safe to link the Python curses module with the first curses library found.

    ldd: what about platforms without GNU libc?

    Roumen

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    I included the test for libtinfo in the latest patch. The patch is tested on Fedora and correctly links the curses module with -lncursesw.

    This means that the ldd method works on all buildbots, OpenBSD, OpenSolaris and Fedora.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    I'm not against sorting things out in configure.in, but I'm not quite sure that it will be more portable than ldd:

    On FreeBSD (the problem system!) I can't get this to work:

    [stefan@freebsd-i386 ~]$ echo 'int main() { readline(); }' > test_readline.c
    [stefan@freebsd-i386 ~]$ gcc -Wl,--warn-common xxx.c -lreadline -ltermcap -lncurses -lncursesw
    [stefan@freebsd-i386 ~]$ gcc -Wl,--warn-common xxx.c -lreadline -lncurses -lncursesw
    [stefan@freebsd-i386 ~]$ gcc -Wl,--warn-common xxx.c -lreadline -lncursesw

    On OpenSolaris with suncc, ld does not have -warn-common.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Sigh. xxx.c == test_readline.c in the previous comment.

    d8d5aad8-e55b-4500-a3a0-9ea982d771ff commented 14 years ago

    Yes, I understand. For the protocol, did gcc on FreeBSD warn if the library order is -lncursesw -lreadline? Forget for

    Also, I'm not able to write a C test case, similar to the Python one in msg103231 by Mark Dickinson, that fails on a system where the readline library is not linked to ncurses. The program always works and doesn't dump core (= bus error), regardless of the order of the ncurses (with and without the w suffix) and readline libraries.

    So if there is no way to write a C test program that fails, I can't see any other way to detect the issue except to parse the output of programs that list library dependencies. I also expect this to fail for a static build (--disable-shared). I'm not sure that the readline library works well with static builds, but this is another issue and my time machine has stopped working :) .

    Writing a script that checks the platform (if it is FreeBSD or SuSE, link with a, b, c; if the OS is XX, link with d, e, f) would work with shared and static builds, but it is not a reasonable solution :(

    P.S. The issue with the readline library being linked to a termcap-compatible library on systems that distribute more than one termcap-compatible library is about 10 years old.

    Roumen

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Roumen Petrov <report@bugs.python.org> wrote:

    > Yes, I understand. For the protocol, did gcc on FreeBSD warn if the library order is -lncursesw -lreadline?

    No.

    > P.S. The issue with the readline library being linked to a termcap-compatible library on systems that distribute more than one termcap-compatible library is about 10 years old.

    I didn't want to touch the termcap logic. There's potential for breakage, and a real investigation would be time consuming.

    (There's a needless warning on Tiger about /usr/lib/termcap that could be fixed in another issue.)

    c379e6df-6daa-45d5-a8ee-f828d234d3ca commented 14 years ago

    Stefan, I was emailing with Rong-En Fan, a FreeBSD committer, about this issue and he asked:

    "Basically, this is caused by

    a) our readline.so is linked against ncurses.so (via -ltermcap which is the same lib)
    b) wide-character enabled ncurses, ncursesw.so, is also loaded in the same process

    To solve that, we need to have a separate termcap.so, do I understand the issue correctly?"

    He also mentioned that "[a]nother more aggressive way is to make only ncursesw installed into the system which requires a recompilation of all ports that use ncurses (ncurses and ncursesw are source compatible, but in most cases they are binary compatible as long as application don't assume size of ncurses structures)."

    Which I fully support, it's something that I did on DragonFly BSD a long time ago already (for all I can remember).

    Your opinion?

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Jeroen Ruigrok van der Werven <report@bugs.python.org> wrote:

    > Stefan, I was emailing with Rong-En Fan, a FreeBSD committer, about this issue and he asked:
    >
    > "Basically, this is caused by
    >
    > a) our readline.so is linked against ncurses.so (via -ltermcap which is the same lib)
    > b) wide-character enabled ncurses, ncursesw.so, is also loaded in the same process
    >
    > To solve that, we need to have a separate termcap.so, do I understand the issue correctly?"

    Yes, only that the separate termcap is called libtinfo.so. The approach of splitting out libtinfo from ncurses (used by Fedora) is the most flexible and allows the user to choose ncurses or ncursesw.

    [stefan@fedora-amd64 ~]$ ldd /lib64/libreadline.so.6.0
            linux-vdso.so.1 =>  (0x00007fff725ff000)
            libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00000036e4a00000)
            libc.so.6 => /lib64/libc.so.6 (0x00000036d9600000)
            /lib64/ld-linux-x86-64.so.2 (0x00000036d9200000)

    > [...] ports that use ncurses (ncurses and ncursesw are source compatible, but in most cases they are binary compatible as long as application don't assume size of ncurses structures)."
    >
    > Which I fully support, it's something that I did on DragonFly BSD a long time ago already (for all I can remember).
    >
    > Your opinion?

    I think the libtinfo approach is more flexible, and I'm not aware of any drawbacks. So, for FreeBSD, I'd use it.

    Stefan Krah

    vstinner commented 14 years ago

    I tested bpo-7384-5-py3k.patch on FreeBSD 8.0: it fixes the crash.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    I think it would be nice to get this into 2.7. I don't expect buildbot failures, since the 2.7 patch is essentially the same as the py3k version, which has been tested extensively.

    mdickinson commented 14 years ago

    > I think it would be nice to get this into 2.7.

    Agreed. I think you should go ahead and commit it.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Mark, thanks. Committed in r81669; I'll keep an eye on the buildbots.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Committed in r81669,r81672,r81683 (trunk) and r81830,81831 (py3k).

    What to do with the releases? To recap, the fix is:

    1) Detect if readline is already linked against ncurses and if so, skip any further selection. This must be done.

    2) Use the same version of ncurses for readline.so and _curses.so.

    1) should be done in any case. 2) could change the behavior for users who previously had readline/ncurses, cursesmodule/ncursesw, but only use the cursesmodule in an application.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Committed a conservative version implementing part 1) in r82017 (2.6) and r82019 (3.1). Part 2) can be enabled by uncommenting a couple of lines in setup.py.

    The buildbots look good, but I'm setting this to 'pending' in case someone would like part 2) of the fix in the releases.

    80036ac5-bb84-4d39-8416-02cd8e51707d commented 14 years ago

    These changes break building of Python 3.* in some locales in Gentoo.

    running build
    running build_ext
    Traceback (most recent call last):
      File "./setup.py", line 1812, in <module>
        main()
      File "./setup.py", line 1807, in main
        "Tools/scripts/2to3"]
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/core.py", line 152, in setup
        dist.run_commands()
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 946, in run_commands
        self.run_command(cmd)
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 965, in run_command
        cmd_obj.run()
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/command/build.py", line 127, in run
        self.run_command(cmd_name)
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/cmd.py", line 315, in run_command
        self.distribution.run_command(command)
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 965, in run_command
        cmd_obj.run()
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/command/build_ext.py", line 393, in run
        self.build_extensions()
      File "./setup.py", line 151, in build_extensions
        missing = self.detect_modules()
      File "./setup.py", line 539, in detect_modules
        for ln in fp:
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 20: ordinal not in range(128)
    make: *** [sharedmods] Error 1

    In lt_LT.UTF-8 locale, readline_termcap_lib file contains: ne dinaminis paleidžiamasis failas

    In en_US.UTF-8 locale, this file would contain: not a dynamic executable

    do_readline is "/usr/lib64/libreadline.so".

    /usr/lib64/libreadline.so is a linker script with the following content:

    /* GNU ld script
       Since Gentoo has critical dynamic libraries in /lib, and the static versions
       in /usr/lib, we need to have a "fake" dynamic lib in /usr/lib, otherwise we
       run into linking problems.  This "fake" dynamic lib is a linker script that
       redirects the linker to the real lib.  And yes, this works in the
       cross-compiling scenario as the sysroot-ed linker will prepend the real path.

       See bug http://bugs.gentoo.org/4411 for more info.
     */
    OUTPUT_FORMAT ( elf64-x86-64 )
    GROUP ( /lib64/libreadline.so.6 )

    I think that using ldd is a wrong idea.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    In Ubuntu I can build just fine with lt_LT.UTF-8. So perhaps this problem should be addressed in Gentoo.

    80036ac5-bb84-4d39-8416-02cd8e51707d commented 14 years ago

    You shouldn't use ldd. I suggest that setup.py try to link a small executable, which would use a function from libcurses and would be linked against libreadline, but not libcurses. If linking succeeds, then your libreadline is linked against libcurses. If linking fails, then repeat this procedure with libcursesw, libncurses, libncursesw, libtinfo.

    vstinner commented 14 years ago

    "In lt_LT.UTF-8 locale, readline_termcap_lib file contains: ne dinaminis paleidžiamasis failas"

    You can run ldd without LANG variable to get the original (english, ascii only) message.
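
One way to get locale-independent `ldd` output from Python is sketched below. Note the assumptions: instead of removing `LANG` as suggested above, this variant sets `LC_ALL=C`, which overrides `LANG` and the other `LC_*` variables; the helper names are hypothetical; and the lenient decode is an extra precaution that is not part of the suggestion.

```python
import os
import subprocess


def c_locale_env(base=None):
    """Return a copy of the environment that forces the C locale."""
    env = dict(os.environ if base is None else base)
    # LC_ALL takes precedence over LANG and the individual LC_* settings,
    # so tool messages come out in plain English ASCII.
    env["LC_ALL"] = "C"
    return env


def run_ldd(path):
    """Run `ldd path` in the C locale; return (returncode, stdout text)."""
    proc = subprocess.run(
        ["ldd", path],
        env=c_locale_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # Decode leniently so stray non-ASCII bytes cannot raise an exception.
    return proc.returncode, proc.stdout.decode("ascii", errors="replace")
```

`run_ldd` raises `FileNotFoundError` when no `ldd` executable exists, so a caller would still need to handle that case.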

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    So you have garbage from stderr in readline_termcap_lib. Since that's useless anyway (no matter what locale is set), let's check the return value of os.system().

    The attached patch skips readline linkage detection if ldd fails. In that case, linking will be done in the same manner as before r81830.
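
The idea of treating a failed `ldd` as "no information" could be sketched as below. This is an illustration under stated assumptions, not the attached patch: the helper name `output_or_none` is hypothetical, and it uses `subprocess.run` rather than the `os.system` call discussed in this thread.

```python
import subprocess
import sys


def output_or_none(argv):
    """Return the command's stdout lines, or None if the command could
    not be started or exited with a nonzero status."""
    try:
        proc = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
        )
    except OSError:
        # For example, the executable does not exist on this system.
        return None
    if proc.returncode != 0:
        # The tool ran but reported failure; its output is not trustworthy.
        return None
    return proc.stdout.decode("ascii", errors="replace").splitlines()


# Stand-in commands so the behavior can be seen without ldd installed.
ok = output_or_none([sys.executable, "-c", "print('libncurses.so.8 => x')"])
failed = output_or_none([sys.executable, "-c", "import sys; sys.exit(1)"])
```

A caller that gets `None` back would skip the readline linkage detection and link as it did before the check was introduced.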

    Please report if the patch allows you to build py3k in the problematic locale.

    Your method of detecting readline linkage looks interesting, but I doubt that I'm going to implement it: These cross platform issues take an *immense* amount of time, since you have to test on all buildbot platforms (+ OpenBSD and OpenSolaris), with different compilers (icc, suncc).

    If you want that done, the best way is to open another issue, submit a patch (probably for configure.in) _and_ do all the testing.