python / cpython

The Python programming language
https://www.python.org
Other
62.68k stars 30.06k forks source link

struct.unpack('?', '\x02') returns (False,) on Mac OSX #66211

Open bb62eaf3-091e-4e80-bbb0-535dd1329f20 opened 10 years ago

bb62eaf3-091e-4e80-bbb0-535dd1329f20 commented 10 years ago
BPO 22012
Nosy @gpshead, @ronaldoussoren, @mdickinson, @ned-deily
Files
  • issue-22012.txt
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'library', 'build'] title = "struct.unpack('?', '\\x02') returns (False,) on Mac OSX" updated_at = user = 'https://bugs.python.org/wayedt' ``` bugs.python.org fields: ```python activity = actor = 'gregory.p.smith' assignee = 'none' closed = False closed_date = None closer = None components = ['Build', 'Library (Lib)'] creation = creator = 'wayedt' dependencies = [] files = ['36010'] hgrepos = [] issue_num = 22012 keywords = ['patch', 'needs review'] message_count = 13.0 messages = ['223470', '223472', '223480', '223481', '223487', '223501', '223502', '223503', '223511', '223565', '223566', '224030', '224710'] nosy_count = 5.0 nosy_names = ['gregory.p.smith', 'ronaldoussoren', 'mark.dickinson', 'ned.deily', 'wayedt'] pr_nums = [] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue22012' versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6'] ```

    bb62eaf3-091e-4e80-bbb0-535dd1329f20 commented 10 years ago

    On Mac OSX, struct.unpack incorrectly handles bools.

    Python 3.4.1 (default, May 19 2014, 13:10:29)
    [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import struct
    >>> struct.unpack('????', b'\x00\x01\x02\x03')
    (False, True, False, True)
    ned-deily commented 10 years ago

    Doing a quick look, the results vary. Using current python.org 2.7.8 and 3.4.1 installers, the results are correct. These interpreters are built with Apple gcc-4.2 (non-LLVM) from Xcode 3.2.6. Other 2.7 and 3.4.x builds I have available at the moment, including the Apple-supplied Python 2.7, were built with clang LLVM and they all give the incorrect result.

    $ /usr/bin/python2.7 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)" ((False, True, False, True), '2.7.5 (default, Mar 9 2014, 22:15:05) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]')

    $ arch -i386 /usr/bin/python2.7 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
    ((False, True, False, True), '2.7.5 (default, Mar  9 2014, 22:15:05) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]')
    
    $ /usr/local/bin/python2.7 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
    ((False, True, True, True), '2.7.8 (v2.7.8:ee879c0ffa11, Jun 29 2014, 21:07:35) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]')
    
    $ /usr/local/bin/python2.7-32 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
    ((False, True, True, True), '2.7.8 (v2.7.8:ee879c0ffa11, Jun 29 2014, 21:07:35) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]')
    
    $ /usr/local/bin/python3.4 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
    (False, True, True, True) 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
    
    $ /usr/local/bin/python3.4-32 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
    (False, True, True, True) 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
    
    $ ./bin/python3.4 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
    (False, True, False, True) 3.4.1+ (3.4:d6b71971b228, Jul 16 2014, 16:30:56)
    [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
    mdickinson commented 10 years ago

    The relevant piece of code in the struct module looks like this:

    static PyObject *
    nu_bool(const char *p, const formatdef *f)
    {
        BOOL_TYPE x;
        memcpy((char *)&x, p, sizeof x);
        return PyBool_FromLong(x != 0);
    }

    Is it possible that BOOL_TYPE is a bitfield of length 1, and that clang is somehow making use of that fact?

    One thing I don't understand is that this shouldn't affect *standard* packing and unpacking (as opposed to native), but I still see the problem for a format of "\<????". However, it's fine for ">????". Some debugging shows that we're calling 'nu_bool' even for "\<????", which is a bit odd.

    mdickinson commented 10 years ago

    Some debugging shows that we're calling 'nu_bool' even for "\<????", which is a bit odd.

    Ah, I see. There's an optimisation that uses the native table functions instead of the big/little-endian ones if the size and byte order match.

    ned-deily commented 10 years ago

    FTR, the problem also shows up with the current clang-3.4 (3.4.2) and clang3.5 (svn213451-1) packages in Debian testing (on i386), building --with-pydebug and without, so the issue is not confined to OS X.

    ronaldoussoren commented 10 years ago

    On 19 jul. 2014, at 23:22, Mark Dickinson \report@bugs.python.org\ wrote:

    Mark Dickinson added the comment:

    The relevant piece of code in the struct module looks like this:

    static PyObject * nu_bool(const char *p, const formatdef *f) { BOOL_TYPE x; memcpy((char *)&x, p, sizeof x); return PyBool_FromLong(x != 0); }

    Is it possible that BOOL_TYPE is a bitfield of length 1, and that clang is somehow making use of that fact?

    I haven't found a definitive source yet, but it seems that the only valid values for _Bool, which BOOL_TYPE expands to, are 0 and 1. Clang might make use of that restriction.

    Ronald

    mdickinson commented 10 years ago

    For native struct packing / unpacking, it's not even clear to me that this should be considered a bug (though for standard unpacking it definitely is): the primary use-case for native unpacking is unpacking a collection of bytes that was written (from Python, or C, or ...) on the same machine, and in that case we're only going to be unpacking 0s and 1s.

    FWIW, the docs say: "Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.", so for native packing either the docs or the code need to be fixed.

    The simplest solution would be to replace nu_bool with something identical to bu_bool. In theory that would break on any platform where "sizeof(_Bool)" is greater than 1. In practice, I doubt that Python's going to meet such platforms in a hurry.

    mdickinson commented 10 years ago

    In practice, I doubt that Python's going to meet such platforms in a hurry.

    Hrm; looks like that's not a particularly safe assumption, especially with older compilers that might do a "typedef int _Bool".

    From http://msdn.microsoft.com/en-us/library/tf4dy80a.aspx:

    "That means that for Visual C++ 4.2, a call of sizeof(bool) yields 4, while in Visual C++ 5.0 and later, the same call yields 1."

    ronaldoussoren commented 10 years ago

    On 20 jul. 2014, at 10:32, Mark Dickinson \report@bugs.python.org\ wrote:

    Mark Dickinson added the comment:

    > In practice, I doubt that Python's going to meet such platforms in a hurry.

    Hrm; looks like that's not a particularly safe assumption, especially with older compilers that might do a "typedef int _Bool"

    I'm not sure about the actual definition, but at least for OSX on PPC sizeof(bool) is not 1. I ran into this with pyobjc when I tried to treat BOOL and bool the same.

    All of this is with Objective-C, but the same should be true for plain C.

    ronaldoussoren commented 10 years ago

    I just confirmed that clang only uses the LSB for _Bool values by looking at the assembly generated for the following code:

    <quote>
    #include <stdbool.h>
    #include <stdio.h>
    
    int main(void)
    {
        _Bool x;
        *(unsigned char*)&x = 42;
        printf("%d\n", (int)x);
        return 0;
    }
    </quote>

    The attached patch fixes the issue for me. The new testcase fails without the changes to _struct.c and passes with those changes.

    ronaldoussoren commented 10 years ago

    The last draf of ISO C '11: \http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf\.

    This says that _Bool is large enough to store 0 and 1, and that conversion of any integer data to _Bool results in 0 or 1. If I interpret the document correctly there is no conforming way to create a value of _Bool that has a value other than 0 or 1, and that should mean clang's behavior is not a bug.

    BTW. I haven't tested my patch on a PPC system yet (where sizeof(bool) != 1), and won't be able to do so until I'm back home (I'm currently at EuroPython)

    ronaldoussoren commented 10 years ago

    Does anyone have feedback for my proposed patch (other the bug in test code when sizeof(bool) != 1, the test values for big and little endian are in the wrong order)?

    ronaldoussoren commented 10 years ago

    BTW. There is also an argument to be made against my patch and for a documentation update: it is unclear to me if clang ever creates _Bool false values where the bit pattern isn't exactly the same as the zero value of an integer of the same size (for example due to generated code only updating the least significant bit and starting with a uninitialised value that has some of the other bits set). If it does so depends on what's more efficient to do in machine code, and my knowledge of assembly is too rusty to have anything useful to say about that (although I'd suspect that overwriting the complete _Bool value would be more efficient than loading the current value, updating the LSB and storing it again).

    If it can do so the current behavior of struct.unpack would be more correct than the version you get after applying my patch.