python / cpython

The Python programming language
https://www.python.org
Other
62.84k stars 30.1k forks source link

ctypes: memoryview gives incorrect PEP3118 format strings for both packed and unpacked structs #76961

Closed f72d0bc0-f319-4dd9-9db4-d64615e4a426 closed 1 year ago

f72d0bc0-f319-4dd9-9db4-d64615e4a426 commented 6 years ago
BPO 32780
Nosy @terryjreedy, @vsajip, @amauryfa, @abalkin, @aerojockey, @skrah, @meadori, @mattip, @eric-wieser
PRs
  • python/cpython#5561
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/abalkin' closed_at = None created_at = labels = ['3.8', 'ctypes', 'type-bug', '3.7'] title = 'ctypes: memoryview gives incorrect PEP3118 format strings for both packed and unpacked structs' updated_at = user = 'https://github.com/eric-wieser' ``` bugs.python.org fields: ```python activity = actor = 'ned.deily' assignee = 'belopolsky' closed = False closed_date = None closer = None components = ['ctypes'] creation = creator = 'Eric.Wieser' dependencies = [] files = [] hgrepos = [] issue_num = 32780 keywords = ['patch'] message_count = 8.0 messages = ['311705', '321279', '321280', '321730', '322225', '340234', '340603', '358806'] nosy_count = 10.0 nosy_names = ['terry.reedy', 'teoliphant', 'vinay.sajip', 'amaury.forgeotdarc', 'belopolsky', 'aerojockey', 'skrah', 'meador.inge', 'mattip', 'Eric.Wieser'] pr_nums = ['5561'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue32780' versions = ['Python 2.7', 'Python 3.7', 'Python 3.8'] ```

    Linked PRs

    f72d0bc0-f319-4dd9-9db4-d64615e4a426 commented 6 years ago

    Discovered here

    Consider the following structure, and a memoryview created from it:

        class foo(ctypes.Structure):
            _fields_ = [('one', ctypes.c_uint8),
                        ('two', ctypes.c_uint32)]
        f = foo()
        mf = memoryview(f)

    We'd expect this to insert padding, and it does:

    >>> mf.itemsize
    8

    But that padding doesn't show up in the format string:

    >>> mf.format
    'T{<B:one:<I:two:}'

    That format string describes the packed version of the struct, with the two field starting at offset 1, based on the struct documentation on how < should be interpreted:

    No padding is added when using non-native size and alignment, e.g. with ‘\<’, ‘>’

    But ctypes doesn't even get it right for packed structs:

        class foop(ctypes.Structure):
            _fields_ = [('one', ctypes.c_uint8),
                        ('two', ctypes.c_uint32)]
            _pack_ = 1
        f = foo()
        mf = memoryview(f)

    The size is what we'd expect:

    >>> mf.itemsize
    5

    But the format is garbage:

    >>> mf.format
    'B'  # sizeof(byte) == 5!?
    f72d0bc0-f319-4dd9-9db4-d64615e4a426 commented 6 years ago

    Pinging, as recommended by https://devguide.python.org/pullrequest/#reviewing. PEP-3118 as a protocol is far less useful if the canonical implementation is non-compliant.

    abalkin commented 6 years ago

    Unfortunately, the PEP authors did very little in terms of implementing the PEP and neither CPython nor numpy has a fully compliant implementation.

    10cd4193-8cd6-4877-ab6c-15a5902cb593 commented 6 years ago

    I guess I'll weigh in since I was pinged.

    I agree with the approach in the patch. All the memoryview does is to use the format field verbatim from the underlying buffer, so if the format field is inaccurate then the only thing to do is to fix the object providing the buffer.

    Since the format field is only there for interpretation of the data, and is not used to calculate of itemsizes or strides anywhere as far as I know, it's a fairly low-risk change.

    However, the patch still leaves ctypes inaccurate for the case of unions. It should be fairly simple to modify the code to use a format of "B\<size>" for unions, so that it at least matches the itemsize, even if the type information is lost.

    (As an aside, let me point out that I did not actually write or advocate for the PEP; for some reason my name was added to it even though I all I did was to provide feedback.)

    f72d0bc0-f319-4dd9-9db4-d64615e4a426 commented 6 years ago

    It should be fairly simple to modify the code to use a format of "B\<size>" for unions, so that it at least matches the itemsize

    Seems reasonable, although:

    terryjreedy commented 5 years ago

    Fixing one case is better than fixing no cases.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 5 years ago

    Since Terry added me: Yes, this is clearly a bug, but it is a ctypes issue and not a memoryview issue.

    ctypes issues unfortunately tend to take some time until someone reviews.

    mattip commented 4 years ago

    Is there a ctypes or PEP-3118 core-dev who could review the issue and the PR solution?

    eric-wieser commented 2 years ago

    @vsajip, perhaps you're qualified to review this (or rather the PR at #5561) given your work in #16589?

    vsajip commented 2 years ago

    I don't think I'm especially qualified to review this, unfortunately. I did work on some ctypes issues in the past, but they've required altogether too much time than I have available at the moment :disappointed: Perhaps one of the other committers who works closer to the C code level would be more appropriate (most of my work is at the Python stdlib level, though I can program in C).

    eric-wieser commented 2 years ago

    Thanks for the response.

    Do you have any specific committers in mind?

    I guess I could ping Travis, the other PEP author; but what #5561 needs is a review from someone with commit bit.

    vsajip commented 2 years ago

    Sadly, ctypes-related expertise is thin on the ground, even among committers. I've had to revert a change that I made which I thought would improve things, but didn't. One problem we all have is that any fix has to work across many architectures, and most of us don't have access to those architectures to check fixes, diagnose problems etc. You could maybe use git blame to throw up some possibilities.

    eric-wieser commented 2 years ago

    Sadly, ctypes-related expertise is thin on the ground, even among committers.

    Yes, I've noticed that too. PEP3118 expertise seems even more sparse, with my impression being that numpy developers are more familiar with it that CPython ones (#5561 is both submitted and approved by a numpy maintainer). Is there anything I could do to this issue or PR to make it more accessible to reviewers who are familiar with the C code but not with ctypes?

    One problem we all have is that any fix has to work across many architectures

    My claim is that this particular fix shouldn't change any architecture-specific behavior; the logic for laying out the ctypes fields in memory hasn't changed; all that's changed is the PEP3118 reporting of the layout that was used. The existing reporting behavior is wrong for any struct with any padding on every architecture, so no regression regarding PEP3118 should be possible.

    vsajip commented 2 years ago

    Sure. I was only giving my perceived reason as to why core developers don't have familiarity with ctypes internals - it seems to be outside most people's comfort zone :disappointed:

    abalkin commented 2 years ago

    @eric-wieser - it looks like I dropped the ball on this quite some time ago. I will see it through this time. Feel free to bug me here if I don't. :)

    eric-wieser commented 1 year ago

    @abalkin :bug: