python / cpython

The Python programming language
https://www.python.org

Bus error on Debian sparc #59794

Closed 5531d0d8-2a9c-46ba-8b8b-ef76132a492c closed 12 years ago

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago
BPO 15589
Nosy @loewis, @birkenfeld, @vstinner, @larryhastings, @ned-deily, @skrah
Files
  • larry.force.alignment.in.capi.test.1.diff
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-crash']
    title = 'Bus error on Debian sparc'
    updated_at =
    user = 'https://github.com/skrah'
    ```

    bugs.python.org fields:
    ```python
    activity =
    actor = 'skrah'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'skrah'
    components = []
    creation =
    creator = 'skrah'
    dependencies = []
    files = ['26727']
    hgrepos = []
    issue_num = 15589
    keywords = []
    message_count = 21.0
    messages = ['167678', '167679', '167701', '167706', '167713', '167714', '167715', '167716', '167717', '167718', '167723', '167724', '167725', '167728', '167733', '167735', '167736', '167737', '167777', '167805', '168030']
    nosy_count = 8.0
    nosy_names = ['loewis', 'georg.brandl', 'vstinner', 'larry', 'flub', 'ned.deily', 'skrah', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue15589'
    versions = ['Python 3.3']
    ```

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Running *any* test of the test suite currently produces a bus error on Debian sparc [http://people.debian.org/~aurel32/qemu/sparc/].

    After the bus error, the tests seem to proceed normally though.

    This is definitely new. I tested memoryview for bus errors a couple of months ago without problems.

    Georg, I'm provisionally setting this to release blocker. The qemu-sparc image is quite old though (Debian Etch). It's a pity we don't have a sparc buildbot any more.

    Example:

    user@debian-sparc:~/cpython$ ./python -m test -uall -v test_flufl
    == CPython 3.3.0b1 (default:67d36e8ddcfc+, Aug 7 2012, 23:49:57) [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)]
    Fatal Python error: Bus error

    Current thread 0x00004000:
      File "/home/user/cpython/Lib/subprocess.py", line 1363 in _execute_child
      File "/home/user/cpython/Lib/subprocess.py", line 818 in __init__
      File "/home/user/cpython/Lib/os.py", line 995 in popen
      File "/home/user/cpython/Lib/platform.py", line 903 in _syscmd_uname
      File "/home/user/cpython/Lib/platform.py", line 1147 in uname
      File "/home/user/cpython/Lib/platform.py", line 1452 in platform
      File "/home/user/cpython/Lib/test/regrtest.py", line 537 in main
      File "/home/user/cpython/Lib/test/__main__.py", line 13 in <module>
      File "/home/user/cpython/Lib/runpy.py", line 73 in _run_code
      File "/home/user/cpython/Lib/runpy.py", line 160 in _run_module_as_main
    == Linux-2.6.18-6-sparc32-sparc-with-debian-4.0 big-endian
    == /home/user/cpython/build/test_python_3262
    Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1)
    [1/1] test_flufl
    test_barry_as_bdfl (test.test_flufl.FLUFLTests) ... ok
    test_guido_as_bdfl (test.test_flufl.FLUFLTests) ... ok

    ----------------------------------------------------------------------
    Ran 2 tests in 0.053s

    OK
    1 test OK.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Setting to critical: debian-sparc 32-bit has apparently been deprecated since Lenny and still uses linuxthreads.

    Tracking down the failure could end up in finding a platform bug like in bpo-12936.

    birkenfeld commented 12 years ago

    From the position of the bus error, it would seem that calling a subprocess during platform.platform() is the culprit.

    But if test_subprocess passes without any bus errors, that would be strange.

    ned-deily commented 12 years ago

    Is it by any chance a --shared build being run from the build directory without having been installed (and without a LD_LIBRARY_PATH and with an older version already installed)?
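The question above can be checked from inside the interpreter being tested. The following is only a sketch: `shared_build_info` is a hypothetical helper name, and it assumes a POSIX build where `sysconfig` exposes the `Py_ENABLE_SHARED` config variable (set when Python is configured with `--enable-shared`).

```python
import os
import sysconfig


def shared_build_info():
    """Report whether this interpreter was built against a shared libpython.

    Hypothetical helper for illustration; not part of CPython.
    """
    return {
        # Py_ENABLE_SHARED is 1 for --enable-shared builds, 0 or None otherwise.
        "shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # If unset, the dynamic loader may pick up an older installed libpython.
        "ld_library_path": os.environ.get("LD_LIBRARY_PATH"),
    }


info = shared_build_info()
print(info)
```

If `shared` is true and `ld_library_path` is `None` while running from the build directory, the scenario described in the comment is at least possible.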

    92935ae4-c5d3-4cd3-81e6-25bec3013308 commented 12 years ago

    Running on Solaris 10 (T1000, OpenCSW toolchain, gcc 4.6.3) I also get a bus error, with added coredump:

    $ ./python Lib/test/regrtest.py 
    == CPython 3.3.0b1 (default:67a994d5657d, Aug 8 2012, 21:43:48) [GCC 4.6.3]
    ==   Solaris-2.10-sun4v-sparc-32bit big-endian
    ==   /export/home/flub/python/cpython/build/test_python_7320
    Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1)
    [  1/369] test_grammar
    [  2/369] test_opcodes
    [  3/369] test_dict
    [  4/369] test_builtin
    [  5/369] test_exceptions
    test test_exceptions failed -- Traceback (most recent call last):
      File "/export/home/flub/python/cpython/Lib/test/test_exceptions.py", line 432, in testChainingDescriptors
        self.assertTrue(e.__suppress_context__)
    AssertionError: False is not true

    [  6/369/1] test_types
    [  7/369/1] test_unittest
    [  8/369/1] test_doctest
    [  9/369/1] test_doctest2
    [ 10/369/1] test_support
    [ 11/369/1] test___all__
    [ 12/369/1] test___future__
    [ 13/369/1] test__locale
    [ 14/369/1] test__osx_support
    [ 15/369/1] test_abc
    [ 16/369/1] test_abstract_numbers
    [ 17/369/1] test_aifc
    [ 18/369/1] test_argparse
    [ 19/369/1] test_array
    [ 20/369/1] test_ast
    [ 21/369/1] test_asynchat
    [ 22/369/1] test_asyncore
    [ 23/369/1] test_atexit
    [ 24/369/1] test_audioop
    [ 25/369/1] test_augassign
    [ 26/369/1] test_base64
    [ 27/369/1] test_bigaddrspace
    [ 28/369/1] test_bigmem
    [ 29/369/1] test_binascii
    [ 30/369/1] test_binhex
    [ 31/369/1] test_binop
    [ 32/369/1] test_bisect
    [ 33/369/1] test_bool
    [ 34/369/1] test_buffer
    [ 35/369/1] test_bufio
    [ 36/369/1] test_bytes
    [ 37/369/1] test_bz2
    [ 38/369/1] test_calendar
    [ 39/369/1] test_call
    [ 40/369/1] test_capi
    Fatal Python error: Bus error

    Current thread 0x00000001:
      File "/export/home/flub/python/cpython/Lib/test/test_capi.py", line 264 in test_skipitem
      File "/export/home/flub/python/cpython/Lib/unittest/case.py", line 385 in _executeTestPart
      File "/export/home/flub/python/cpython/Lib/unittest/case.py", line 440 in run
      File "/export/home/flub/python/cpython/Lib/unittest/case.py", line 492 in __call__
      File "/export/home/flub/python/cpython/Lib/unittest/suite.py", line 105 in run
      File "/export/home/flub/python/cpython/Lib/unittest/suite.py", line 67 in __call__
      File "/export/home/flub/python/cpython/Lib/unittest/suite.py", line 105 in run
      File "/export/home/flub/python/cpython/Lib/unittest/suite.py", line 67 in __call__
      File "/export/home/flub/python/cpython/Lib/test/support.py", line 1312 in run
      File "/export/home/flub/python/cpython/Lib/test/support.py", line 1413 in _run_suite
      File "/export/home/flub/python/cpython/Lib/test/support.py", line 1447 in run_unittest
      File "/export/home/flub/python/cpython/Lib/test/test_capi.py", line 290 in test_main
      File "Lib/test/regrtest.py", line 1219 in runtest_inner
      File "Lib/test/regrtest.py", line 941 in runtest
      File "Lib/test/regrtest.py", line 714 in main
      File "Lib/test/regrtest.py", line 1810 in <module>
    Bus Error (core dumped)

    Not sure if this should be tracked in the same issue or not?

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    I think I've identified one legit Python bug. This is from a *different* traceback, i.e. the traceback in my first message is still unresolved.

    A bus error occurs in test_capi, test_skipitem with format 'D':

    Python/getargs.c:782

            Py_complex *p = va_arg(*p_va, Py_complex *);
            Py_complex cval;
            cval = PyComplex_AsCComplex(arg);
            if (PyErr_Occurred())
                RETURN_ERR_OCCURRED;
            else
                *p = cval;  <-  bus error
            break;

    The pointer p has value 0xefbfb1fc, with 0xefbfb1fc % 8 == 4. It originates from a somewhat creatively allocated memory region in _testcapi:parse_tuple_and_keywords. :)
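The alignment arithmetic behind this diagnosis can be reproduced in a few lines. This is a minimal sketch, not the fix that was later committed: `misalignment` and `align_up` are hypothetical helper names, and the 8-byte requirement is taken from the discussion (a `Py_complex` holds doubles, which SPARC expects on 8-byte boundaries).

```python
def misalignment(address, alignment):
    """Return how many bytes `address` sits past the previous aligned boundary."""
    return address % alignment


def align_up(address, alignment):
    """Round `address` up to the next multiple of `alignment` (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


p = 0xefbfb1fc                 # pointer value reported above
print(misalignment(p, 8))      # non-zero: an 8-byte store through p can fault
print(hex(align_up(p, 8)))     # nearest 8-byte-aligned address at or above p
```

An address that is only 4-byte aligned is harmless on platforms that tolerate unaligned stores, which is one plausible reason the problem surfaced only on SPARC.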

    larryhastings commented 12 years ago

    This platform is 8-byte aligned?

    larryhastings commented 12 years ago

    nm, I get it, doubles are 8-bytes and should be 8-byte aligned. Let me stare at it some more.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Floris, the traceback in my first message only occurs in the optimized regular build with -O3. Did you try that, too?

    larryhastings commented 12 years ago

    Attached is a patch attempting to force double alignment. Stefan: please apply and try it. Does this help?

    92935ae4-c5d3-4cd3-81e6-25bec3013308 commented 12 years ago

    I compiled with a simple "./configure" which I think is what you mean (it defaults to -O3). But when executing your test it doesn't give a bus error.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Larry Hastings <report@bugs.python.org> wrote:

    > Attached is a patch attempting to force double alignment. Stefan: please apply and try it. Does this help?

    Yes, this works nicely.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 12 years ago

    New changeset efb30bdcfa1e by Larry Hastings in branch 'default': Issue bpo-15589: Ensure double-alignment for brute-force capi argument parser test http://hg.python.org/cpython/rev/efb30bdcfa1e

    92935ae4-c5d3-4cd3-81e6-25bec3013308 commented 12 years ago

    I think I can confirm this fixes the BusError. The test suite got past test_capi on my machine as well. Unfortunately I killed the ssh session by accident before the testsuite completed so I had to restart it.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    As for the original error: in test_subprocess basically every test fails. With the standard regrtest.py (faulthandler enabled), most tests generate a bus error in subprocess_fork_exec():

    621         cwd_obj2 = NULL;
    (gdb)
    624         pid = fork();    <- bus error
    (gdb)
    Fatal Python error: Bus error

    Current thread 0x00004000:
      File "/home/user/cpython/Lib/subprocess.py", line 1363 in _execute_child
      File "/home/user/cpython/Lib/subprocess.py", line 818 in __init__
      File "/home/user/cpython/Lib/test/test_subprocess.py", line 728 in test_bufsize_is_none

    With all faulthandler references removed from regrtest.py no bus errors happen, but most tests fail anyway. As I said, I'm NOT blaming faulthandler, but suspect some strange platform bug that perhaps involves linuxthreads.

    Since Floris can't reproduce this error, I'm setting the priority to normal.

    92935ae4-c5d3-4cd3-81e6-25bec3013308 commented 12 years ago

    I can now confirm the whole testsuite runs, so the BusError part seems fixed on my host:

    329 tests OK.
    7 tests failed:
        test_cmd_line test_exceptions test_ipaddress test_os test_raise test_socket test_traceback
    1 test altered the execution environment:
        test_site
    32 tests skipped:
        test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses test_dbm_gnu test_epoll test_gdb test_kqueue test_lzma test_msilib test_ossaudiodev test_pep277 test_readline test_smtpnet test_socketserver test_sqlite test_ssl test_startfile test_tcl test_timeout test_tk test_ttk_guionly test_ttk_textonly test_unicode_file test_urllib2net test_urllibnet test_winreg test_winsound test_xmlrpc_net test_zipfile64
    8 skips unexpected on sunos5:
        test_lzma test_readline test_smtpnet test_ssl test_tcl test_tk test_ttk_guionly test_ttk_textonly

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    > 329 tests OK. 7 tests failed: test_cmd_line test_exceptions test_ipaddress test_os test_raise test_socket test_traceback

    Thanks. A lot of these appear to be big-endian related, see bpo-15597.

    vstinner commented 12 years ago

    > With all faulthandler references removed from regrtest.py no bus errors happen, but most tests fail anyway. As I said, I'm NOT blaming faulthandler, but suspect some strange platform bug that perhaps involves linuxthreads.

    Threads + signal is a very complex problem. It is not solved yet in OpenBSD for example. There were a lot of such issues on old versions of FreeBSD. An extract from the Wikipedia article on LinuxThreads:

    "LinuxThreads had a number of problems, mainly owing to the implementation, which used the clone system call to create a new process sharing the parent's address space. For example, threads had distinct process identifiers, causing problems for signal handling; (...)"

    If disabling faulthandler avoids new issues, you can add 'if sys.thread_info.version.startswith("linuxthreads"):' on the line:

    faulthandler.enable(all_threads=True)

    in regrtest.py.

    I added sys.thread_info to be able to skip some tests only failing on LinuxThreads...
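A guard along these lines might look as follows. This is a sketch under assumptions, not the change made to regrtest.py: the helper names are hypothetical, the condition includes the `not` that the follow-up comment adds, and it allows for `sys.thread_info.version` being `None`, which the `sys` documentation permits.

```python
import faulthandler
import sys


def uses_linuxthreads(version):
    """True if a sys.thread_info.version string names the LinuxThreads library."""
    # version may be None when the threading library version is unknown.
    return bool(version) and version.lower().startswith("linuxthreads")


def enable_faulthandler_if_safe():
    """Enable faulthandler unless the platform uses LinuxThreads (hypothetical helper)."""
    if uses_linuxthreads(sys.thread_info.version):
        return False
    faulthandler.enable(all_threads=True)
    return True
```

Calling `enable_faulthandler_if_safe()` once at startup would replace the unconditional `faulthandler.enable(all_threads=True)` call; the lowercase comparison is an extra precaution, not something the thread specifies.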

    --

    > but most tests fail anyway

    Ah? With which message? Can you get more information in gdb?

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    > If disabling faulthandler avoids new issues, you can add 'if [not] sys.thread_info.version.startswith("linuxthreads")'

    That suppresses some bus errors. However, bus errors still occur without being reported, as shown by some print statements and a WIFSIGNALED test inserted in posix_waitpid:

    
    >>> import subprocess, os
    >>> p = subprocess.Popen(["/bin/true"])
    >>> os.waitpid(p.pid, os.WNOHANG)
    pid: 4461   options: 1
    signo: 10
    (4461, 10)
    >>>

    So a bus error occurs in waitpid(pid, &status, options). WAIT_TYPE is int; perhaps that's incorrect for the platform, but I can't get hold of the POSIX man pages for debian-etch-sparc.
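The raw status printed above can also be decoded from Python without patching posix_waitpid. The sketch below assumes a POSIX platform where the `os.WIF*` macros are available; `describe_wait_status` is a hypothetical helper, and signal numbers are platform-specific (the thread reads signal 10 as a bus error on this SPARC system).

```python
import os


def describe_wait_status(status):
    """Turn a raw waitpid() status into a short human-readable description."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; WTERMSIG extracts its number.
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        # The child exited normally; WEXITSTATUS extracts the exit code.
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "other status %d" % status


print(describe_wait_status(10))  # the status value from the session above
```

Because `os.waitpid` with `os.WNOHANG` returns `(0, 0)` while the child is still running, a real check would decode the status only when the returned pid is non-zero.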

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 12 years ago

    I'd like to urge everybody to focus on one issue at a time. This issue is about Python crashing on a SparcLinux qemu image, so I think it should have priority "low" - there is absolutely no requirement that this needs to work.

    As for the test failures on Solaris - please report them as separate issues (one per failure, "normal" priority seems right).

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Closing since the remaining issue is almost certainly a platform bug.