Open philthompson10 opened 2 months ago
The docs state that PyBUF_SIMPLE | PyBUF_FORMAT
(equivalent to PyBUF_FORMAT
) is disallowed, because if you request a standalone PyBUF_FORMAT
, you already know that the format is B
ahead of time. Is there a specific problem this is causing somewhere that can't be refactored?
However, it looks like we do support standalone PyBUF_FORMAT
on bytearray
and bytes
. For example, the following does not raise a BufferError
:
import _testbuffer
_testbuffer.ndarray(b"no good!", getbuf=_testbuffer.PyBUF_FORMAT)
Maybe it's worth trying to explicitly disallow those?
(@encukou, this needs topic-C-API
.)
Why would I know the format ahead of time? What I am trying to do is to introspect the detail of the buffer protocol implemented by an object (I've already called PyObject_CheckBuffer()
).
Why would I know the format ahead of time?
If the requested buffer was PyBUF_SIMPLE | PyBUF_FORMAT
, then by definition, the format is B
. FWIW, I'm not totally sure why the check was explicitly added only for memoryview
, that's what I want to fix. There are two ideas here:
bytes
and bytearray
, or possibly PyBuffer_FillInfo
(I guess that would technically be a breaking change, but this wasn't allowed anyway).PyBUF_SIMPLE | PyBUF_FORMAT
. Though, I'm not convinced that's particularly useful. (If you could provide an example where explicitly knowing that the format is B
instead of NULL
, that would be great!)Two points...
It seems I am misunderstanding what the purpose of PyObject_GetBuffer()
is. How do I determine what format an object that implements the buffer protocol supports?
You state that by definition, the format is B
, but that's not true. By definition, the format is B
or NULL
. The test should check for both values.
- How do I determine what format an object that implements the buffer protocol supports?
You do that with PyBUF_FORMAT
, but it's redundant for PyBUF_SIMPLE
, because that will always be a B
format anyway.
2. By definition, the format is
B
orNULL
Oh, I think you misinterpreted what I meant by that. I meant, by requesting PyBUF_SIMPLE | PyBUF_FORMAT
, you will always get a format of B
. If you request PyBUF_SIMPLE
, then you're right, then the format
field will be NULL
-- but that implies B
anyway. Why explicitly set it at runtime?
I'm guessing you want to do something like the following:
static PyObject *
print_buffer_format(PyObject *Py_UNUSED(self), PyObject *obj)
{
PyObject *obj;
// technically, you should be using y* to get the buffer, but I'm just making an example
if (!PyArg_ParseTuple("O", &obj))
return NULL;
Py_buffer buf;
if (PyObject_GetBuffer(obj, &buf, PyBUF_FORMAT) < 0)
return NULL;
printf("Format string: %s\n", buf.format);
PyBuffer_Release(&buf);
Py_RETURN_NONE;
}
Again, using format
in the above is redundant. You could refactor this to:
if (PyObject_GetBuffer(obj, &buf, PyBUF_SIMPLE) < 0)
return NULL;
puts("Format string: B");
PyBuffer_Release(&buf);
I've now read the PEP and have a much better understanding of what PyObject_GetBuffer()
is for. It makes it very clear that I should be using PyBUF_SIMPLE
for my use case.
It seems that there is no mechanism for making general queries about the buffer capabilities of an object. The nearest you can get is to make repeated calls to PyObject_GetBuffer()
with different flags to see which succeed or fail. That's not a problem for me at the moment.
I guess my problem was that I misunderstood the docs. This statement from the PEP really helped : The exporter should always fill in all elements of the buffer structure
, as did the explanation of PyBUF_SIMPLE
. It would also help if the docs stated that PyBUF_FORMAT
(and PyBUF_WRITABLE
?) should not be used in isolation.
It would also help if the docs stated that PyBUF_FORMAT (and PyBUF_WRITABLE?) should not be used in isolation.
It does, just not directly. They say not to use PyBUF_SIMPLE | PyBUF_FORMAT
, but since PyBUF_SIMPLE
is defined as zero, that's equivalent to a standalone PyBUF_FORMAT
.
I think, for consistency, we should also disallow this in PyBuffer_FillInfo
, not just memoryview
.
@encukou, I think this should get closed. Disallowing PyBUF_SIMPLE | PyBUF_FORMAT
will probably break code somewhere, and I don't think it's worth the hassle to try to remove the check from memoryview
.
I think a little note like this could make things clearer:
Since
PyBUF_SIMPLE
is defined as 0,PyBUF_FORMAT
cannot be used as a stand-alone flag.
Sure, I'll submit a PR. Would it be worth mentioning that bytes
and bytearray
support it anyway?
I've created #123778 to fix the docs. @philthompson10, does this look clearer to you?
Would it be worth mentioning that
bytes
andbytearray
support it anyway?
No. CPython doesn't check that you're not playing by the rules, but that doesn't mean you should do it. (Even though we don't want to add the check and break people who didn't study the docs carefully.)
Please keep the message short, that way more people will read it :)
CPython doesn't check that you're not playing by the rules, but that doesn't mean you should do it.
Yeah, I mentioned this in the PR by using bytes
and memoryview
as an example. Is that OK?
User-facing docs shouldn't describe the implementation (especially if it's slightly buggy), but how users should use it. It's OK to avoid mentioning that a check is missing.
@ZeroIntensity sorry, no, the docs change doesn't really make things clearer and (strictly speaking) isn't true. As the value of PyBUF_SIMPLE
is 0
then it can be used with PyBUF_FORMAT
as it makes no difference. I agree with @encukou that the docs should describe the API from the point of view of the user of the API rather than the developer of the API (ie. I don't care what the actual value of PyBUF_SIMPLE
is). The PEP is more user focused.
PyBUF_FORMAT
cannot be used alone. On some objects it'll work, but you shouldn't do it.
Bug report
Bug description:
Calling
PyObject_GetBuffer()
with thePyBUF_FORMAT
flag on a memoryview object that wraps a bytes object fails with the exception:BufferError: memoryview: cannot cast to unsigned bytes if the format flag is present
Calling it directly on the same bytes object does not fail.
The test is in
memory_getbuf()
in memoryobject.c. The test is failing becauseformat
is notNULL
.format
is actually"B"
which is the equivalent ofNULL
. I think the test should be changed to allow aformat
value of"B"
.An alternative might be to change
PyBuffer_FillInfo()
to always setformat
toNULL
.CPython versions tested on:
3.12, 3.13
Operating systems tested on:
Linux, macOS
Linked PRs