python / cpython

The Python programming language
https://www.python.org
Other
63.32k stars 30.31k forks source link

PyObject_GetBuffer(PyBUF_FORMAT) breaks on a memoryview #123609

Open philthompson10 opened 2 months ago

philthompson10 commented 2 months ago

Bug report

Bug description:

Calling PyObject_GetBuffer() with the PyBUF_FORMAT flag on a memoryview object that wraps a bytes object fails with the exception:

BufferError: memoryview: cannot cast to unsigned bytes if the format flag is present

Calling it directly on the same bytes object does not fail.

The test is in memory_getbuf() in memoryobject.c. The test is failing because format is not NULL. format is actually "B" which is the equivalent of NULL. I think the test should be changed to allow a format value of "B".

An alternative might be to change PyBuffer_FillInfo() to always set format to NULL.

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

Linux, macOS

Linked PRs

ZeroIntensity commented 1 month ago

The docs state that PyBUF_SIMPLE | PyBUF_FORMAT (equivalent to PyBUF_FORMAT) is disallowed, because if you request a standalone PyBUF_FORMAT, you already know that the format is B ahead of time. Is there a specific problem this is causing somewhere that can't be refactored?

However, it looks like we do support standalone PyBUF_FORMAT on bytearray and bytes. For example, the following does not raise a BufferError:

import _testbuffer

_testbuffer.ndarray(b"no good!", getbuf=_testbuffer.PyBUF_FORMAT)

Maybe it's worth trying to explicitly disallow those?

(@encukou, this needs topic-C-API.)

philthompson10 commented 1 month ago

Why would I know the format ahead of time? What I am trying to do is to introspect the detail of the buffer protocol implemented by an object (I've already called PyObject_CheckBuffer()).

ZeroIntensity commented 1 month ago

Why would I know the format ahead of time?

If the requested buffer was PyBUF_SIMPLE | PyBUF_FORMAT, then by definition, the format is B. FWIW, I'm not totally sure why the check was explicitly added only for memoryview, that's what I want to fix. There are two ideas here:

philthompson10 commented 1 month ago

Two points...

  1. It seems I am misunderstanding what the purpose of PyObject_GetBuffer() is. How do I determine what format an object that implements the buffer protocol supports?

  2. You state that by definition, the format is B, but that's not true. By definition, the format is B or NULL. The test should check for both values.

ZeroIntensity commented 1 month ago
  1. How do I determine what format an object that implements the buffer protocol supports?

You do that with PyBUF_FORMAT, but it's redundant for PyBUF_SIMPLE, because that will always be a B format anyway.

2. By definition, the format is B or NULL

Oh, I think you misinterpreted what I meant by that. I meant, by requesting PyBUF_SIMPLE | PyBUF_FORMAT, you will always get a format of B. If you request PyBUF_SIMPLE, then you're right, then the format field will be NULL -- but that implies B anyway. Why explicitly set it at runtime?

I'm guessing you want to do something like the following:

static PyObject *
print_buffer_format(PyObject *Py_UNUSED(self), PyObject *obj)
{
    PyObject *obj;
    // technically, you should be using y* to get the buffer, but I'm just making an example
    if (!PyArg_ParseTuple("O", &obj))
        return NULL;

    Py_buffer buf;
    if (PyObject_GetBuffer(obj, &buf, PyBUF_FORMAT) < 0)
        return NULL;

    printf("Format string: %s\n", buf.format);
    PyBuffer_Release(&buf);

    Py_RETURN_NONE;
}

Again, using format in the above is redundant. You could refactor this to:

if (PyObject_GetBuffer(obj, &buf, PyBUF_SIMPLE) < 0)
    return NULL;

puts("Format string: B");
PyBuffer_Release(&buf);
philthompson10 commented 1 month ago

I've now read the PEP and have a much better understanding of what PyObject_GetBuffer() is for. It makes it very clear that I should be using PyBUF_SIMPLE for my use case.

It seems that there is no mechanism for making general queries about the buffer capabilities of an object. The nearest you can get is to make repeated calls to PyObject_GetBuffer() with different flags to see which succeed or fail. That's not a problem for me at the moment.

I guess my problem was that I misunderstood the docs. This statement from the PEP really helped : The exporter should always fill in all elements of the buffer structure, as did the explanation of PyBUF_SIMPLE. It would also help if the docs stated that PyBUF_FORMAT (and PyBUF_WRITABLE?) should not be used in isolation.

ZeroIntensity commented 1 month ago

It would also help if the docs stated that PyBUF_FORMAT (and PyBUF_WRITABLE?) should not be used in isolation.

It does, just not directly. They say not to use PyBUF_SIMPLE | PyBUF_FORMAT, but since PyBUF_SIMPLE is defined as zero, that's equivalent to a standalone PyBUF_FORMAT.

I think, for consistency, we should also disallow this in PyBuffer_FillInfo, not just memoryview.

ZeroIntensity commented 1 month ago

@encukou, I think this should get closed. Disallowing PyBUF_SIMPLE | PyBUF_FORMAT will probably break code somewhere, and I don't think it's worth the hassle to try to remove the check from memoryview.

encukou commented 1 month ago

I think a little note like this could make things clearer:

Since PyBUF_SIMPLE is defined as 0, PyBUF_FORMAT cannot be used as a stand-alone flag.

ZeroIntensity commented 1 month ago

Sure, I'll submit a PR. Would it be worth mentioning that bytes and bytearray support it anyway?

ZeroIntensity commented 1 month ago

I've created #123778 to fix the docs. @philthompson10, does this look clearer to you?

encukou commented 1 month ago

Would it be worth mentioning that bytes and bytearray support it anyway?

No. CPython doesn't check that you're not playing by the rules, but that doesn't mean you should do it. (Even though we don't want to add the check and break people who didn't study the docs carefully.)

Please keep the message short, that way more people will read it :)

ZeroIntensity commented 1 month ago

CPython doesn't check that you're not playing by the rules, but that doesn't mean you should do it.

Yeah, I mentioned this in the PR by using bytes and memoryview as an example. Is that OK?

encukou commented 1 month ago

User-facing docs shouldn't describe the implementation (especially if it's slightly buggy), but how users should use it. It's OK to avoid mentioning that a check is missing.

philthompson10 commented 1 month ago

@ZeroIntensity sorry, no, the docs change doesn't really make things clearer and (strictly speaking) isn't true. As the value of PyBUF_SIMPLE is 0 then it can be used with PyBUF_FORMAT as it makes no difference. I agree with @encukou that the docs should describe the API from the point of view of the user of the API rather than the developer of the API (ie. I don't care what the actual value of PyBUF_SIMPLE is). The PEP is more user focused.

encukou commented 1 month ago

PyBUF_FORMAT cannot be used alone. On some objects it'll work, but you shouldn't do it.