python / cpython

The Python programming language
https://www.python.org

Crash with mmap and sparse files on Mac OS X #55486

Closed pitrou closed 13 years ago

pitrou commented 13 years ago
BPO 11277
Nosy @ronaldoussoren, @pitrou, @vstinner, @ned-deily, @skrah
Files
  • 11277.5.diff
  • 11277-test_mmap.1.py
  • 11277-test_mmap-27.1.py
  • 11277.apple-fix-3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-crash']
    title = 'Crash with mmap and sparse files on Mac OS X'
    updated_at =
    user = 'https://github.com/pitrou'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'sdaoden'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'nadeem.vawda'
    components = []
    creation =
    creator = 'pitrou'
    dependencies = []
    files = ['21798', '21909', '21910', '22593']
    hgrepos = []
    issue_num = 11277
    keywords = ['patch']
    message_count = 108.0
    messages = ['129002', '129003', '129004', '129006', '129011', '129023', '129029', '129034', '129050', '129052', '129053', '129054', '129056', '129057', '129058', '129061', '129063', '129066', '129067', '129069', '129071', '129072', '129073', '129086', '129087', '129090', '129091', '129093', '129107', '129120', '129124', '129125', '129126', '129133', '129140', '129177', '129184', '129391', '129520', '129531', '132938', '132940', '132941', '132983', '132984', '132985', '133154', '133677', '133687', '133689', '133697', '133741', '133764', '133837', '133860', '133892', '133894', '133896', '134032', '134033', '134035', '134036', '134038', '134039', '134040', '134041', '134044', '134045', '134047', '134566', '134943', '134945', '134974', '134977', '135030', '135031', '135037', '135123', '135124', '135125', '135129', '135150', '135151', '135152', '135193', '135203', '135239', '135255', '135308', '135376', '135417', '135429', '135445', '135446', '135448', '135450', '135452', '135455', '137817', '137868', '137889', '137891', '137892', '137901', '137907', '137964', '137967', '139931']
    nosy_count = 10.0
    nosy_names = ['ixokai', 'ronaldoussoren', 'pitrou', 'vstinner', 'nadeem.vawda', 'ned.deily', 'skrah', 'neologix', 'sdaoden', 'python-dev']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue11277'
    versions = ['Python 3.2', 'Python 3.3']
    ```

    pitrou commented 13 years ago

    Following r88460 (bpo-10276), test_zlib crashes on the Snow Leopard buildbot (apparently in the new "test_big_buffer" test case).

    vstinner commented 13 years ago

    Do adler32() and crc32() support length up to UINT32_MAX? Or should we maybe limit the length to INT32_MAX?

    pitrou commented 13 years ago

    I've tried INT_MAX and it didn't change anything.

    ned-deily commented 13 years ago

    Current OS X zlib is 1.2.3. Test crashes with most recently released zlib, 1.2.5, as well.

    ned-deily commented 13 years ago

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: 10 at address: 0x000000010170e000
    0x00000001016eeaa0 in crc32 ()

    (gdb) backtrace

    #0 0x00000001016eeaa0 in crc32 ()

    #1 0x00000001016e806d in PyZlib_crc32 (self=0x1016aa588, args=0x1016bf220) at /private/tmp/a/py3k/Modules/zlibmodule.c:993

    PyZlib_crc32(PyObject *self, PyObject *args)
    ...
            while (len > (size_t) UINT_MAX) {
                crc32val = crc32(crc32val, buf, UINT_MAX);
    ...

    brettcannon commented 13 years ago

    So on my system, that 'while' loop is executed once (put a printf() after the buf and len adjustments and it was never hit).

    ned-deily commented 13 years ago

    >>> from test.support import _4G
    >>> _4G
    4294967296
    >>> mapping.size()
    4294967300
    pbuf.len = 4294967300, len = 4294967300
    UINT_MAX = 4294967295
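For readers following the numbers, here is a minimal Python sketch of what the C `while` loop quoted above appears to do: peel off `UINT_MAX`-sized pieces and let the final call handle the remainder. The helper names `chunk_lengths` and `chunked_crc32` are made up for illustration and are not part of zlibmodule.c.

```python
import zlib

UINT_MAX = 4294967295  # value printed in the session above


def chunk_lengths(total_len, max_chunk=UINT_MAX):
    """Mirror the C loop: peel off max_chunk bytes while more than max_chunk remain."""
    lengths = []
    while total_len > max_chunk:
        lengths.append(max_chunk)
        total_len -= max_chunk
    lengths.append(total_len)  # the final call handles whatever is left
    return lengths


def chunked_crc32(data, max_chunk=UINT_MAX):
    """Compute crc32 piecewise, feeding the running value back in each time."""
    view = memoryview(data)
    crc = 0
    offset = 0
    for n in chunk_lengths(len(view), max_chunk):
        crc = zlib.crc32(view[offset:offset + n], crc)
        offset += n
    return crc
```

With the reported length, `chunk_lengths(4294967300)` gives `[4294967295, 5]`, which is consistent with the loop body running exactly once before the final call.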
    brettcannon commented 13 years ago

    Does it matter that _4G < UINT_MAX?

    pitrou commented 13 years ago

    Does it matter that _4G < UINT_MAX?

    You mean _4G > UINT_MAX, right? Yes, it matters, otherwise that defeats the point of the test :)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    'Have no clue, but Ned Deily's patch (msg129011) seems to be required for adler, too. (You know...)

    pitrou commented 13 years ago

    Well, it's not a patch, just a traceback :)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    Wait a few minutes, i'll write this simple patch for adler and crc. But excessive testing and such is beyond my current capabilities.

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    File: bpo-11277.patch. Hmm. Two non-register constants and equal code on 32 and 64 bit. Does Python have a '64 bit' switch or the like - PY_SSIZE_T_MAX is not preprocessor-clean, i would guess.

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    Sorry - that was a mess.

    pitrou commented 13 years ago

    File: bpo-11277.patch. Hmm. Two non-register constants and equal code on 32 and 64 bit. Does Python have a '64 bit' switch or the like - PY_SSIZE_T_MAX is not preprocessor-clean, i would guess.

    Er, how is this patch different from r88460?

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    I guess not at all. Well.

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    test_zlib.py (with my patch but that's somewhat identical in the end, say) does

    .............................s.......
    ----------------------------------------------------------------------
    Ran 37 tests in 1.809s

    OK (skipped=1)

    This is on Snow Leopard 64 bit, 02b70cb59701 (r88451) -> Python 3.3a0. Is there a switch i must trigger? Just pulled 24 changesets, recompiling and trying again with r88460.

    pitrou commented 13 years ago

    This is on Snow Leopard 64 bit, 02b70cb59701 (r88451) -> Python 3.3a0. Is there a switch i must trigger? Just pulled 24 changesets, recompiling and trying again with r88460.

    Have you tried "./python -m test -v -uall test_zlib" ?

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    No, i've got no idea of this framework... Just did 'python3 test_zlib.py' directly. Thanks for the switch. But i can't test your thing due to bpo-11285, so this may take a while (others have more knowledge anyway)..

    (P.S.: your constant-folding stack patch is a great thing, just wanted to say this once..)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    So here is this (with my patch, but this is for real: bpo-11277.2.patch):

    == CPython 3.3a0 (py3k, Feb 22 2011, 14:00:52) [GCC 4.2.1 (Apple Inc. build 5664)]
    == Darwin-10.6.0-i386-64bit little-endian
    == /private/var/folders/Da/DaZX3-k5G8a57zw6MSmjJ++++TM/-Tmp-/test_python_89365
    Testing with flags: sys.flags(debug=0, division_warning=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
    [1/1] test_zlib
    test_adler32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_adler32start (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32_adler32_unsigned (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32start (test.test_zlib.ChecksumTestCase) ... ok
    test_penguins (test.test_zlib.ChecksumTestCase) ... ok
    test_same_as_binascii_crc32 (test.test_zlib.ChecksumTestCase) ... ok
    test_badargs (test.test_zlib.ExceptionTestCase) ... ok
    test_badcompressobj (test.test_zlib.ExceptionTestCase) ... ok
    test_baddecompressobj (test.test_zlib.ExceptionTestCase) ... ok
    test_badlevel (test.test_zlib.ExceptionTestCase) ... ok
    test_decompressobj_badflush (test.test_zlib.ExceptionTestCase) ... ok
    test_big_compress_buffer (test.test_zlib.CompressTestCase) ... ok
    test_big_decompress_buffer (test.test_zlib.CompressTestCase) ... ok
    test_incomplete_stream (test.test_zlib.CompressTestCase) ... ok
    test_length_overflow (test.test_zlib.CompressTestCase) ... skipped 'not enough free memory, need at least 4 GB'
    test_speech (test.test_zlib.CompressTestCase) ... ok
    test_speech128 (test.test_zlib.CompressTestCase) ... ok
    test_badcompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_baddecompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_big_compress_buffer (test.test_zlib.CompressObjectTestCase) ... ok
    test_big_decompress_buffer (test.test_zlib.CompressObjectTestCase) ... ok
    test_compresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_compressincremental (test.test_zlib.CompressObjectTestCase) ... ok
    test_compressoptions (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompimax (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompinc (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompincflush (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompress_incomplete_stream (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompressmaxlen (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompressmaxlenflush (test.test_zlib.CompressObjectTestCase) ... ok
    test_empty_flush (test.test_zlib.CompressObjectTestCase) ... ok
    test_flushes (test.test_zlib.CompressObjectTestCase) ... ok
    test_maxlenmisc (test.test_zlib.CompressObjectTestCase) ... ok
    test_odd_flush (test.test_zlib.CompressObjectTestCase) ... ok
    test_pair (test.test_zlib.CompressObjectTestCase) ... ok

    ----------------------------------------------------------------------
    Ran 37 tests in 1.789s

    OK (skipped=1)
    1 test OK.

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    (Is not that much help for a >4GB error, huh?)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    Just stepping ... with c8d1f99f25eb/r88476:

    == CPython 3.3a0 (py3k, Feb 22 2011, 14:18:19) [GCC 4.2.1 (Apple Inc. build 5664)]
    == Darwin-10.6.0-i386-64bit little-endian
    == /private/var/folders/Da/DaZX3-k5G8a57zw6MSmjJ++++TM/-Tmp-/test_python_5126
    Testing with flags: sys.flags(debug=0, division_warning=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
    [1/1] test_zlib
    test_adler32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_adler32start (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32_adler32_unsigned (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32start (test.test_zlib.ChecksumTestCase) ... ok
    test_penguins (test.test_zlib.ChecksumTestCase) ... ok
    test_same_as_binascii_crc32 (test.test_zlib.ChecksumTestCase) ... ok
    test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase) ... ^C ^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
    Bus error

    pitrou commented 13 years ago

    Just stepping ... with c8d1f99f25eb/r88476:

    Right, that's what we should investigate :) Could you try to diagnose the crash?

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing. The problem is not crc32(), but the buffer itself:

        if (pbuf.len > 1024*5) {
            unsigned char *buf = pbuf.buf;
            Py_ssize_t len = pbuf.len;
            Py_ssize_t i;
            fprintf(stderr, "CRC 32 2.1\n");
            for (i = 0; (size_t)i < (size_t)len; ++i)
                *buf++ = 1;
            fprintf(stderr, "CRC 32 2.2\n");

    2.2 is never reached (in fact accessing buf[1] already causes fault). Thus the problem is not zlib, but PyArg_ParseTuple(). But just don't ask me more on that!

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    (P.S.: of course talking about ChecksumBigBufferTestCase and the 4GB, say.)

    pitrou commented 13 years ago

    .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing. The problem is not crc32(), but the buffer itself:

        if (pbuf.len > 1024*5) {
            unsigned char *buf = pbuf.buf;
            Py_ssize_t len = pbuf.len;
            Py_ssize_t i;
            fprintf(stderr, "CRC 32 2.1\n");
            for (i = 0; (size_t)i < (size_t)len; ++i)
                *buf++ = 1;
            fprintf(stderr, "CRC 32 2.2\n");

    Thank you! So it's perhaps a bug in mmap on Snow Leopard. Could you try to debug a bit more precisely and see at which buffer offset (from the start) the fault occurs?

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    Snippet

        if (pbuf.len > 1024*5) {
            volatile unsigned char *buf = pbuf.buf;
            Py_ssize_t len = pbuf.len;
            Py_ssize_t i = 0;
            volatile unsigned char au[100];
            volatile unsigned char *x = au;
            fprintf(stderr, "CRC ENTER, buffer=%p\n", buf);
            for (i = 0; (size_t)i < (size_t)len; ++i) {
                fprintf(stderr, "%ld, buf=%p\n", (signed long)i, buf);
                *x = *buf++;
            }

    results in

    test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase) ... CRC ENTER, buffer=0x1014ab000
    0, buf=0x1014ab000

    pitrou commented 13 years ago

    Out of curiosity, could you try the following patch?

    Index: Lib/test/test_zlib.py
    ===================================================================
    --- Lib/test/test_zlib.py   (revision 88500)
    +++ Lib/test/test_zlib.py   (working copy)
    @@ -70,7 +70,7 @@
             with open(support.TESTFN, "wb+") as f:
                 f.seek(_4G)
                 f.write(b"asdf")
    -            f.flush()
    +        with open(support.TESTFN, "rb") as f:
                 self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    
         def tearDown(self):
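To make the difference between the two `setUp` variants concrete, here is a small sketch that builds the same kind of seek-and-write file and maps it read-only, either from the still-open write handle (the original test) or after reopening it with `"rb"` (the patch above). The `map_sparse_file` and `demo` helpers, the temporary path, and the 1 MiB hole size are all assumptions chosen so the snippet runs quickly; it illustrates the pattern and makes no claim about whether reopening avoids the crash on any particular OS.

```python
import mmap
import os
import tempfile


def map_sparse_file(path, hole_size, reopen=True):
    """Write b"asdf" after a hole of hole_size bytes, then map the file read-only."""
    if reopen:
        # Patched variant: close the writer first, then map from a fresh handle.
        with open(path, "wb+") as f:
            f.seek(hole_size)
            f.write(b"asdf")
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Original variant: flush and map from the handle used for writing.
    with open(path, "wb+") as f:
        f.seek(hole_size)
        f.write(b"asdf")
        f.flush()
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo(hole_size=1 << 20, reopen=True):
    """Return (mapping size, first byte, last four bytes) for a throwaway file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        mapping = map_sparse_file(path, hole_size, reopen=reopen)
        try:
            return mapping.size(), mapping[0], bytes(mapping[-4:])
        finally:
            mapping.close()
    finally:
        os.unlink(path)
```

Reading `mapping[0]` is the access that faulted in the C experiment above, so it is the first thing worth checking on an affected machine.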
    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

    .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing. The problem is not crc32(), but the buffer itself:

        if (pbuf.len > 1024*5) {
            unsigned char *buf = pbuf.buf;
            Py_ssize_t len = pbuf.len;
            Py_ssize_t i;
            fprintf(stderr, "CRC 32 2.1\n");
            for (i = 0; (size_t)i < (size_t)len; ++i)
                *buf++ = 1;
            fprintf(stderr, "CRC 32 2.2\n");

    Unless I'm mistaken, in the test the file is mapped with PROT_READ, so it's normal to get SIGSEGV when writing to it:

        def setUp(self):
            with open(support.TESTFN, "wb+") as f:
                f.seek(_4G)
                f.write(b"asdf")
                f.flush()
                self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    for(i=0; (size_t)i < (size_t)len;++i) *buf++ = 1;

    But it seems you're also getting segfaults when only reading it, right ?

    I've got a stupid question: how much memory do you have ? Cause there seems to be some issues with page cache when reading mmaped files on OS-X: http://lists.apple.com/archives/darwin-development/2003/Jun/msg00141.html

    On Linux, the page cache won't fill forever, so you don't need to have enough free memory to accommodate the whole file (the page cache should grow, but not forever). But on OS-X, the page replacement algorithm seems to retain mmaped pages in the page cache much longer, which could potentially trigger an OOM later (because of overcommitting, mmap can very well return a valid address range which leads to a segfault when accessed later). I'm not sure why it would segfault on the first page, though.

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    I have a MacBook with 2 GB RAM. Of course i'm a little bit messy, so an entry is half written before it comes to an ... end. msg129091 is real life, though.

    Antoine, your msg129093 patch of test_zlib.py does it (with and without fprintf(3)s). CRC ok etc., it just works. (Seems mmap(2) has a problem here, i would say; the mentioned bug report is from 2003, so the golden sunset watchers may need some more additional time, if you allow me that comment.)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    (That is to say: i think it's better not to assume that these boys plan to *ever* fix it. (Though mmap(2) is not CoreAudio/AudioUnit.))

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    (neologix: SIGBUS is not the same as SIGSEGV. You know. Thanks for this nice bug report. Eight years is a .. time in computer programming - unbelievable, thinking of all these nervous wrecks who ever reported a bug to Apple! Man!!!)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    neologix: even with 2 GB RAM top(1) shows more than 600 MB free memory with the 4 GB test up and running ... in an Mac OS X environment ... Lucky me, i don't believe them a single word...

    pitrou commented 13 years ago

    Antoine, your msg129093 patch of test_zlib.py does it (with and without fprintf(3)s). CRC ok etc., it just works.

    Indeed, and it also seems to work on the buildbot. I will commit the patch soon. Thanks for your help!

    (Seems mmap(2) has a problem here, i would say; the mentioned bug report is from 2003, so the golden sunset watchers may need some more additional time, if you allow me that comment.)

    pitrou commented 13 years ago

    Committed in r88511 (3.3) and r88514 (3.2).

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    I append a doc_lib_mmap.patch which may be helpful for those poor creatures who plan to write Python scripts for Mac OS X. (It may be a useful add-on anyway.)

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    Sorry, i've got that kid running around which sometimes doesn't know what it is doing. But this documentation patch may really be a help. It's my first doc-patch, so it surely needs to be revised, if interest exists in such a patch for mmap at all, say. Thanks for your understanding.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

    Could you try with this:

     def setUp(self):
         with open(support.TESTFN, "wb+") as f:
             f.seek(_4G)
             f.write(b"asdf")
             f.flush()
    +        os.fsync(f.fileno())
             self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    HFS+ doesn't seem to support sparse files, so the file is actually zero-filled asynchronously. Maybe the mapping gets done before the blocks have been allocated, which triggers a segfault when the first page is accessed. I'm not sure it'll make any difference, but I'm curious...

    Also, I'd be curious to see the result of

    """
    import os

    name = '/tmp/foo'
    f = open(name, 'wb')
    f.seek(1 << 32)
    f.write(b'asdf')
    f.flush()
    print(os.fstat(f.fileno()))
    f.close()
    print(os.stat(name))
    """

    Thanks !
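One way to check whether a filesystem actually kept such a file sparse is to compare its logical size with the space allocated for it. The sketch below assumes a POSIX system where `os.stat()` exposes `st_blocks` in 512-byte units; the `sparse_report` and `demo` helpers and the 16 MiB hole are hypothetical and only illustrate the comparison.

```python
import os
import tempfile


def sparse_report(path):
    """Compare the logical size of a file with the bytes allocated on disk."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte blocks (POSIX only)
    return {
        "size": st.st_size,
        "allocated": allocated,
        "looks_sparse": allocated < st.st_size,
    }


def demo(hole_size=1 << 24):
    """Create a seek-and-write file in a temp location and report on it."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.seek(hole_size)
            f.write(b"asdf")
        return sparse_report(path)
    finally:
        os.unlink(path)
```

On a filesystem that zero-fills the hole, `allocated` would be expected to be at least as large as `size`; on one that supports sparse files it is usually far smaller. The result depends on the filesystem, so no particular output is claimed here.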

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    11:12 ~/tmp $ python3 ~/usr/opt/py3k/lib/python3.3/test_zlib.py
    Bus error

    Your code snippet:

    11:21 ~/tmp $ /usr/bin/time -lp python3 test.py

    posix.stat_result(st_mode=33184, st_ino=10066605, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298715813, st_mtime=1298715813, st_ctime=1298715813)
    posix.stat_result(st_mode=33184, st_ino=10066605, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298715813, st_mtime=1298715813, st_ctime=1298715813)
    real        71.66
    user         0.06
    sys          3.71
              0  maximum resident set size
              0  average shared memory size
              0  average unshared data size
              0  average unshared stack size
              0  page reclaims
              0  page faults
              0  swaps
              0  block input operations
             57  block output operations
              0  messages sent
              0  messages received
              0  signals received
           2112  voluntary context switches
              0  involuntary context switches

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    I'll give you the same result again but with additional clock(), just for a heart's pleasure:

    clock(): 0.100958 , fstat(): posix.stat_result(st_mode=33184, st_ino=10075508, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298719201, st_mtime=1298719305, st_ctime=1298719305)

    f.close()
    print('clock():', time.clock(), ', stat():', os.stat(name))
    clock(): 3.75792 , stat(): posix.stat_result(st_mode=33184, st_ino=10075508, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298719201, st_mtime=1298719305, st_ctime=1298719305)

    Please don't assume i go for Mac OS X ... In the end you *always* need to implement an expensive state machine to get around long-known bugs, mis-implementations or other poops there.

    vstinner commented 13 years ago

    This issue is not dead: test_zlib failed twice on "AMD64 Snow Leopard 3.x" buildbot: build 30 (024967cdc2f0e850f0b338e7593a12d965017a6a, Mar 31 01:40:00 2011) and 44 (ebc03d7e711052c0b196aacdbec6778c0a6d5c0c, Apr 4 10:11:20 2011).

    Build 44 has a traceback thanks to faulthandler:

    --------------------
    ...
    [ 79/354] test_time
    [ 80/354] test_zlib
    Fatal Python error: Bus error

    Traceback (most recent call first):
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_zlib.py", line 85 in test_big_buffer
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 387 in _executeTestPart
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 442 in run
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 494 in __call__
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1078 in run
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1166 in _run_suite
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1192 in run_unittest
      File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_zlib.py", line 611 in test_main
      File "./Lib/test/regrtest.py", line 1032 in runtest_inner
      File "./Lib/test/regrtest.py", line 826 in runtest
      File "./Lib/test/regrtest.py", line 650 in main
      File "./Lib/test/regrtest.py", line 1607 in <module>
    make: *** [buildbottest] Bus error
    program finished with exit code 2
    elapsedTime=1400.363321

    http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%203.x/builds/44/steps/test/logs/stdio
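The traceback above comes from `faulthandler`, which dumps the Python stack when the process receives a fatal signal such as SIGBUS. A minimal way to see that behaviour without a 4 GB file is sketched below: a child interpreter enables `faulthandler` and raises SIGBUS on itself. The child script and the `run_child` helper are assumptions for illustration, and the snippet needs a POSIX system and Python 3.8+ for `signal.raise_signal`.

```python
import subprocess
import sys

# Script run in a child interpreter so the fatal signal does not kill the caller.
CHILD = """
import faulthandler
import signal

faulthandler.enable()                   # install handlers for SIGSEGV, SIGBUS, ...
signal.raise_signal(signal.SIGBUS)      # simulate the fault seen on the buildbot
"""


def run_child():
    """Run the child script and return the CompletedProcess with captured output."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
```

The child's stderr should contain a "Fatal Python error: Bus error" header followed by the Python-level traceback, which is the same shape as the buildbot log.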

    test_zlib.py:85 is the crc32(+4 GB) test:

    ----------------------
    # Issue python/cpython#54485 - check that inputs >=4GB are handled correctly.
    class ChecksumBigBufferTestCase(unittest.TestCase):

        def setUp(self):
            with open(support.TESTFN, "wb+") as f:
                f.seek(_4G)
                f.write(b"asdf")
            with open(support.TESTFN, "rb") as f:
                self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        def tearDown(self):
            self.mapping.close()
            support.unlink(support.TESTFN)

        @unittest.skipUnless(mmap, "mmap() is not available.")
        @unittest.skipUnless(sys.maxsize > _4G, "Can't run on a 32-bit system.")
        @unittest.skipUnless(support.is_resource_enabled("largefile"),
                             "May use lots of disk space.")
        def test_big_buffer(self):
            self.assertEqual(zlib.crc32(self.mapping), 3058686908)  # <~~~ HERE
            self.assertEqual(zlib.adler32(self.mapping), 82837919)
    ----------------------

    vstinner commented 13 years ago

    Issue bpo-11760 has been marked as a duplicate of this issue.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

    Is the SIGBUS generated on the first page access ? How much memory does this buildbot have ?

    pitrou commented 13 years ago

    The new FreeBSD buildbot had a sporadic SIGKILL in http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%208.2%203.x/builds/1/steps/test/logs/stdio

    (apparently, faulthandler didn't dump a traceback)

    By the way, we can be fairly certain now that the problem is on the OS side rather than on our (Python) side, so I'm lowering the priority.

    pitrou commented 13 years ago

    By the way, at this point I think we could simply skip the test on BSDs and OS X. The tested functionality is cross-platform, so testing under a limited set of systems should be ok.
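A platform skip of the kind proposed here could look roughly like the sketch below. This is an illustration under assumptions, not the change that was eventually committed: the `is_bsd_or_osx` helper, the class name, and the skip message are hypothetical.

```python
import sys
import unittest


def is_bsd_or_osx(platform=sys.platform):
    """Hypothetical helper: True for Mac OS X ("darwin") and *BSD platform strings."""
    return platform == "darwin" or "bsd" in platform


class ChecksumBigBufferSketch(unittest.TestCase):

    @unittest.skipIf(is_bsd_or_osx(),
                     "mmap of a large sparse file is unreliable on this OS")
    def test_big_buffer(self):
        # The real crc32/adler32 assertions over the mapping would go here.
        pass
```

Keeping the platform check in a small function makes it easy to reuse the same condition for other large-file tests.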

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 13 years ago

    For the new FreeBSD bot, the issue was simply insufficient swap space. With 1GB of memory and 2GB of swap test_zlib runs fine.

    5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

    I can't confirm this for my MacBook:

    20:39 ~ $ time python3 -E -Wd -m test -r -w -uall test_zlib
    Using random seed 1960084
    [1/1] test_zlib
    1 test OK.
    [91618 refs]

    real 4m1.051s
    user 0m15.031s
    sys 0m26.908s

    ...

    20:40 ~ $ ll tmp/test_python_6778/
    4194308 -rw-r----- 1 steffen staff 4294967300 6 Apr 20:40 @test_6778_tmp

    ...

    Processes: 63 total, 2 running, 3 stuck, 58 sleeping, 246 threads 20:40:30
    Load Avg: 0.59, 0.65, 0.56 CPU usage: 6.79% user, 13.10% sys, 80.9% idle
    SharedLibs: 8260K resident, 9972K data, 0B linkedit.
    MemRegions: 6043 total, 218M resident, 13M private, 185M shared.
    PhysMem: 446M wired, 328M active, 138M inactive, 912M used, 1135M free.
    VM: 143G vsize, 1042M framework vsize, 29610(0) pageins, 0(0) pageouts.
    Networks: packets: 807/440K in, 933/129K out.
    Disks: 13881/581M read, 26057/16G written.

    PID COMMAND %CPU TIME #TH #WQ #PORT #MRE RPRVT RSHRD RSIZE VPRVT VSIZE
    6778 python3 4.5 00:00.94 2 0 37 139 13M 320K 15M 38M 2403M

    ...

    Processes: 63 total, 3 running, 60 sleeping, 253 threads 20:41:30
    Load Avg: 0.54, 0.62, 0.55 CPU usage: 12.98% user, 14.90% sys, 72.11% idle
    SharedLibs: 8260K resident, 9972K data, 0B linkedit.
    MemRegions: 6062 total, 269M resident, 13M private, 274M shared.
    PhysMem: 443M wired, 329M active, 184M inactive, 955M used, 1091M free.
    VM: 147G vsize, 1042M framework vsize, 41530(11520) pageins, 0(0) pageouts.
    Networks: packets: 807/440K in, 933/129K out.
    Disks: 13950/627M read, 29598/19G written.

    PID COMMAND %CPU TIME #TH #WQ #PORT #MRE RPRVT RSHRD RSIZE VPRVT VSIZE
    6778 python3 11.6 00:03.74 2 0 37 140 60M+ 320K 62M+ 4134M 6499M

    ...

    20:43 ~ $ ll tmp/test_python_6778/
    4194308 -rw-r----- 1 steffen staff 4294967300 6 Apr 20:40 @test_6778_tmp

    As i've stated for bpo-11779, maybe these random errors of the bot are caused by some strange hardware based error?

    252699e1-f617-4a8b-9fb0-ae90945f6292 commented 13 years ago

    By the way, at this point I think we could simply skip the test on BSDs and OS X. The tested functionality is cross-platform, so testing under a limited set of systems should be ok.

    Another solution would be to rewrite the test to not use mmap() at all:

    @precisionbigmemtest(size=_4G + 4, memuse=1)
    def test_big_buffer(self, size):
        if size < _4G + 4:
            self.skipTest("not enough free memory, need at least 4 GB")
        data = bytearray(_4G + 4)
        data[-4:] = b"asdf"
        self.assertEqual(zlib.crc32(data), 3058686908)
        self.assertEqual(zlib.adler32(data), 82837919)

    This is more consistent with the other bigmem tests in test_zlib, but I'm guessing it will mean that the test gets run much less often (since a lot of machines won't have enough memory). If that's OK, then I'd prefer doing it this way (since it keeps things simpler). Otherwise, skipping the test on OS X sounds fine to me.
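As a side note on the memory trade-off, the expected checksums for this input can be computed without either a 4 GB `bytearray` or an mmap, by feeding zero-filled chunks to zlib with a running value. The sketch below is only a way to cross-check the constants, not a replacement for the test (it never passes a single buffer larger than 4 GB to zlib, which is the point of the test). The helper name and the 1 MiB chunk size are assumptions.

```python
import zlib

_4G = 4 * 1024 * 1024 * 1024


def checksums_of_zeros_plus_tail(n_zeros, tail=b"asdf", chunk_size=1 << 20):
    """Return (crc32, adler32) of n_zeros zero bytes followed by tail, fed in pieces."""
    zeros = bytes(chunk_size)
    crc = 0    # default starting value of zlib.crc32
    adler = 1  # default starting value of zlib.adler32
    remaining = n_zeros
    while remaining > 0:
        piece = zeros if remaining >= chunk_size else zeros[:remaining]
        crc = zlib.crc32(piece, crc)
        adler = zlib.adler32(piece, adler)
        remaining -= len(piece)
    return zlib.crc32(tail, crc), zlib.adler32(tail, adler)
```

Calling `checksums_of_zeros_plus_tail(_4G)` takes a while, and its result should match the two constants in the test if the mapped file really is `_4G` zero bytes followed by `b"asdf"`.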

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 13 years ago

    Just to give another data point: A couple of days ago I reduced the memory on the AMD64 FreeBSD bot to (375MB RAM, 2GB swap) and the zlib tests still pass.

    pitrou commented 13 years ago

    Another solution would be to rewrite the test to not use mmap() at all:

    @precisionbigmemtest(size=_4G + 4, memuse=1)
    def test_big_buffer(self, size):
        if size < _4G + 4:
            self.skipTest("not enough free memory, need at least 4 GB")
        data = bytearray(_4G + 4)
        data[-4:] = b"asdf"
        self.assertEqual(zlib.crc32(data), 3058686908)
        self.assertEqual(zlib.adler32(data), 82837919)

    This is more consistent with the other bigmem tests in test_zlib, but I'm guessing it will mean that the test gets run much less often (since a lot of machines won't have enough memory). If that's OK, then I'd prefer doing it this way (since it keeps things simpler).

    I think there's basically no one and nothing (even among the buildbots) that runs bigmem tests on a regular basis, so I'd much rather keep the mmap() solution, even if that means it must be skipped on OS X.