Limited API support for Py_buffer

tiran commented 2 years ago

BPO	45459
Nosy	@pitrou, @vstinner, @tiran, @benjaminp, @alex, @encukou, @skrah, @pmp-p, @serhiy-storchaka, @rdb, @miss-islington, @erlend-aasland
PRs	python/cpython#29035 python/cpython#29991 python/cpython#31201 python/cpython#31527 python/cpython#31528 python/cpython#31539 python/cpython#31668 python/cpython#31669

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields: ```python assignee = None closed_at = created_at = labels = ['expert-C-API', 'type-feature', '3.11'] title = 'Limited API support for Py_buffer' updated_at = user = 'https://github.com/tiran' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'vstinner' components = ['C API'] creation = creator = 'christian.heimes' dependencies = [] files = [] hgrepos = [] issue_num = 45459 keywords = ['patch'] message_count = 42.0 messages = ['403813', '403820', '403822', '403825', '403828', '404201', '404274', '404294', '404377', '404400', '404402', '404403', '404404', '404409', '404418', '405466', '405467', '406794', '408016', '408020', '408023', '408030', '408031', '408035', '408037', '408039', '408078', '408084', '408089', '412364', '412369', '412774', '412776', '413750', '413752', '413773', '413792', '413924', '413927', '414049', '414478', '414481'] nosy_count = 12.0 nosy_names = ['pitrou', 'vstinner', 'christian.heimes', 'benjamin.peterson', 'alex', 'petr.viktorin', 'skrah', 'pmpp', 'serhiy.storchaka', 'rdb', 'miss-islington', 'erlendaasland'] pr_nums = ['29035', '29991', '31201', '31527', '31528', '31539', '31668', '31669'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue45459' versions = ['Python 3.11'] ```

tiran commented 2 years ago

Currently all APIs related to Py_buffer are excluded from the limited API. It's neither possible to use Py_buffer from a C extension with limited API nor is it possible to define heap types with buffer support using the stable ABI.

The lack of Py_buffer support prevents prominent projects like NumPy or Pillow from using the limited API and producing abi3 binary wheels. To be fair, it's not the only reason why these projects have not adopted the stable abi3 yet. Still, Py_buffer support is a necessary but not sufficient condition. Stable abi3 support would enable the NumPy stack to build binary wheels that work with any Python version >= 3.11, < 4.0.

The limited API excludes any C API that references Py_buffer:

8 PyBuffer_*() functions
21 PyBUF_* constants
PyMemoryView_FromBuffer()
PyObject_GetBuffer()
Py_bf_getbuffer / Py_bf_releasebuffer type slots for PyBufferProcs

It should not be terribly complicated to add Py_buffer to the stable API. All it takes is an opaque struct definition of Py_buffer, an allocation function, a free function and a bunch of getters and setters. The hard part is figuring out which getters and setters are needed and how many struct members must be exposed through them. I recommend getting feedback from NumPy, Pillow, and Cython devs first.

Prototype
---------

typedef struct bufferinfo Py_buffer;

/* allocate a new Py_buffer object on the heap and initialize all members to NULL / 0 */
Py_buffer*
PyBuffer_New(void)
{
    Py_buffer *view = PyMem_Calloc(1, sizeof(Py_buffer));
    if (view == NULL) {
        PyErr_NoMemory();
    }
    return view;
}

/* convenience function */
Py_buffer*
PyBuffer_NewEx(PyObject *obj, void *buf,  Py_ssize_t len, Py_ssize_t itemsize,
               int readonly, int ndim, char *format, Py_ssize_t *shape, Py_ssize_t *strides,
               Py_ssize_t *suboffsets, void *internal)
{
    ...
}

/* release and free buffer */
void
PyBuffer_Free(Py_buffer *view)
{
    if (view != NULL) {
        PyBuffer_Release(view);
        PyMem_Free(view);
    }
}
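To make the "opaque struct plus getters and setters" idea concrete, here is a minimal self-contained sketch. All names (demo_opaque_buffer, demo_buffer_new, demo_buffer_get_len, ...) are hypothetical and only mimic the shape of such an API; they are not part of CPython, and ssize_type is a stand-in for Py_ssize_t. In a real header only the first part would be visible to extensions.

```c
#include <stdlib.h>
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

/* "Public header" part: consumers only ever see an incomplete type. */
typedef struct demo_opaque_buffer demo_opaque_buffer;
demo_opaque_buffer *demo_buffer_new(void);
void demo_buffer_free(demo_opaque_buffer *view);
void *demo_buffer_get_buf(const demo_opaque_buffer *view);
ssize_type demo_buffer_get_len(const demo_opaque_buffer *view);
int demo_buffer_set_data(demo_opaque_buffer *view, void *buf, ssize_type len);

/* "Implementation" part: only the library knows the layout, so it can
   grow the struct later without breaking compiled extensions. */
struct demo_opaque_buffer {
    void *buf;
    ssize_type len;
};

demo_opaque_buffer *demo_buffer_new(void)
{
    return calloc(1, sizeof(demo_opaque_buffer));   /* zero-initialized */
}

void demo_buffer_free(demo_opaque_buffer *view)
{
    free(view);
}

void *demo_buffer_get_buf(const demo_opaque_buffer *view) { return view->buf; }

ssize_type demo_buffer_get_len(const demo_opaque_buffer *view) { return view->len; }

/* A setter can validate its input, which direct field access cannot. */
int demo_buffer_set_data(demo_opaque_buffer *view, void *buf, ssize_type len)
{
    if (len < 0 || (buf == NULL && len != 0)) {
        return -1;
    }
    view->buf = buf;
    view->len = len;
    return 0;
}
```

Every access costs a function call, which is the overhead debated further down in this thread.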

vstinner commented 2 years ago

Py_buffer.shape requires a Py_ssize_t* pointer. It's not convenient. For example, the array module uses:

static int
array_buffer_getbuf(arrayobject *self, Py_buffer *view, int flags)
{
    ...
    if ((flags & PyBUF_ND)==PyBUF_ND) {
        view->shape = &((PyVarObject*)self)->ob_size;
    }
    ...
    return 0;
}

This code is not compatible with a fully opaque PyObject structure: https://bugs.python.org/issue39573#msg401395

serhiy-storchaka commented 2 years ago

shape is a pointer to an array of ndim Py_ssize_t values. array and memoryview use a trick to avoid a memory allocation, but _testbuffer.ndarray allocates it dynamically on the heap. We could add a small static buffer to Py_buffer to avoid an additional memory allocation in common cases.
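A minimal sketch of the "small static buffer" idea, assuming a hypothetical inline capacity of 4 dimensions (DEMO_SMALL_NDIM) and a stand-in struct rather than the real Py_buffer: shapes that fit are stored inline, larger ones fall back to the heap.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

#define DEMO_SMALL_NDIM 4       /* hypothetical inline capacity */

typedef struct {
    int ndim;
    ssize_type *shape;                      /* points to small_shape or to the heap */
    ssize_type small_shape[DEMO_SMALL_NDIM];
} demo_view;

/* The view must start zero-initialized. Returns 0 on success, -1 on
   allocation failure (the view is then left unchanged). */
int demo_view_set_shape(demo_view *v, int ndim, const ssize_type *shape)
{
    ssize_type *dst = v->small_shape;
    if (ndim > DEMO_SMALL_NDIM) {
        dst = malloc((size_t)ndim * sizeof(ssize_type));
        if (dst == NULL) {
            return -1;
        }
    }
    if (ndim > 0) {
        /* memmove: the source may be the view's own inline storage */
        memmove(dst, shape, (size_t)ndim * sizeof(ssize_type));
    }
    if (v->shape != NULL && v->shape != v->small_shape) {
        free(v->shape);         /* drop a previous heap array */
    }
    v->shape = dst;
    v->ndim = ndim;
    return 0;
}

void demo_view_clear(demo_view *v)
{
    if (v->shape != NULL && v->shape != v->small_shape) {
        free(v->shape);
    }
    v->shape = NULL;
    v->ndim = 0;
}
```

One caveat of this design: because shape may point into the struct itself, such a view must not be copied by value.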

tiran commented 2 years ago

IIRC shape, strides, and suboffsets are all arrays of length ndim.

We could optimize the allocation if we required users to specify the value of ndim up front and didn't allow them to change it afterwards. PyBuffer_New(int ndim) would then allocate a block of size sizeof(Py_buffer) + 3 * ndim * sizeof(Py_ssize_t). This would give us enough space to memcpy() the shape, strides, and suboffsets arguments into the memory right after the Py_buffer struct.
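The single-allocation layout described above can be sketched with a C99 flexible array member, which also guarantees correct alignment of the trailing arrays. demo_buffer and demo_buffer_new are hypothetical stand-ins, not the proposed CPython functions; the limit of 64 dimensions mirrors CPython's PyBUF_MAX_NDIM.

```c
#include <stdlib.h>
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

#define DEMO_MAX_NDIM 64        /* same limit as CPython's PyBUF_MAX_NDIM */

typedef struct {
    int ndim;
    ssize_type *shape;
    ssize_type *strides;
    ssize_type *suboffsets;
    ssize_type arrays[];        /* flexible array member: 3 * ndim slots */
} demo_buffer;

/* ndim is fixed at allocation time, as proposed above. */
demo_buffer *demo_buffer_new(int ndim)
{
    if (ndim < 0 || ndim > DEMO_MAX_NDIM) {
        return NULL;
    }
    size_t n = (size_t)ndim;
    demo_buffer *view = calloc(1, sizeof(demo_buffer) + 3 * n * sizeof(ssize_type));
    if (view == NULL) {
        return NULL;
    }
    view->ndim = ndim;
    view->shape = view->arrays;              /* slots [0, n) */
    view->strides = view->arrays + n;        /* slots [n, 2n) */
    view->suboffsets = view->arrays + 2 * n; /* slots [2n, 3n) */
    return view;
}

void demo_buffer_free(demo_buffer *view)
{
    free(view);   /* one allocation, so one free releases everything */
}
```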

serhiy-storchaka commented 2 years ago

ndim is not known before calling PyObject_GetBuffer(), so we will need a new API which combines PyObject_GetBuffer() and PyBuffer_New().

tiran commented 2 years ago

CC Antoine for his expertise on the buffer protocol

Opaque Py_buffer and PyObject structs will require a different approach and prevent some optimizations. The consumer will have to malloc() a Py_buffer struct on the heap. In non-trivial cases the producer (exporter) may have to malloc() another blob and store it in Py_buffer.internal [1]. I'm not particularly worried about the performance of malloc here.

[1] https://docs.python.org/3/c-api/buffer.html?highlight=pybuffer#c.Py_buffer.internal

serhiy-storchaka commented 2 years ago

Py_buffer is often used for handling arguments if the function supports bytes, bytearray and other bytes-like objects, for example bytes.partition(). Any additional memory allocation would add significant overhead here. bytes.join() creates a Py_buffer for every item; it would be a deoptimization if it had to allocate them all separately.

We should allow allocating Py_buffer on the stack. Currently its structure is too complex and we cannot guarantee its stability (although it has not changed for years). I propose to split Py_buffer into transparent and opaque parts and to standardize the transparent structure. It should include obj, buf, len, possibly flags (to distinguish read-only from writable buffers) and a pointer to opaque data. For bytes, bytearray, BytesIO, mmap and most other classes, the pointer to opaque data is NULL. For array and memoryview objects, the opaque data could be embedded into the object.
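A minimal sketch of the proposed transparent/opaque split, under the assumption that read-only status is carried in a flags bit. The struct, the DEMO_BUF_READONLY flag and the helper functions are hypothetical illustrations of the proposal, not an existing API. The point is that the common bytes-like case fits in a small stack-allocated struct with no heap allocation.

```c
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

#define DEMO_BUF_READONLY 0x1   /* hypothetical flag bit */

struct demo_layout;             /* opaque part: ndim, format, shape, strides, ... */

/* Transparent part with a frozen layout; safe to allocate on the stack. */
typedef struct {
    void *obj;                  /* the exporter, stand-in for PyObject * */
    void *buf;
    ssize_type len;
    int flags;
    struct demo_layout *layout; /* NULL for bytes, bytearray, mmap, ... */
} demo_view;

/* What a simple bytes-like exporter would do: no extra allocation. */
void demo_fill_simple(demo_view *view, void *obj, void *buf,
                      ssize_type len, int readonly)
{
    view->obj = obj;
    view->buf = buf;
    view->len = len;
    view->flags = readonly ? DEMO_BUF_READONLY : 0;
    view->layout = NULL;
}

int demo_is_writable(const demo_view *view)
{
    return (view->flags & DEMO_BUF_READONLY) == 0;
}

int demo_is_simple(const demo_view *view)
{
    return view->layout == NULL;
}
```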

encukou commented 2 years ago

I recommend to get feedback from NumPy, Pillow, and Cython devs first.

Could you split this into two PRs: one to add the new API, and another to add things to the limited set?

There's no rush to add it to the limited API, esp. since it can't be tested with the intended consumers yet. And it's a hassle to remove mistakes from the limited API, even in alphas/betas.

To try this out I suggest making the struct opaque when something like _PyBuffer_IS_OPAQUE is #defined. Then you can let the compiler tell you what needs to change in those projects :)

vstinner commented 2 years ago

Maybe a PEP is needed to collect usages of the Py_buffer API and check whether the ABI is future-proof. A PEP may help to discuss with other projects which currently consume this API.

I suggest starting with the smallest possible API and then slowly extending it. It's too easy to make mistakes :-( Once it's added to the stable ABI, it will be really hard to change it.

For example, Py_buffer.format is a "char*", but who owns the string? For a stable ABI, I would suggest duplicating the string.

For the shape, strides and suboffsets arrays, I would also suggest allocating them on the heap to ensure that they cannot be modified from the outside.

In your PR, PyBuffer_GetLayout() indirectly gives access to the internal Py_buffer structure members and allows modifying them. One way to avoid this issue is to return a *copy* of these arrays.

I would prefer to require calling "Set" functions to modify a Py_buffer, to ensure that a buffer always remains consistent.

PyBuffer_NewEx(PyObject *obj, void *buf, Py_ssize_t len, Py_ssize_t itemsize, int readonly, int ndim, char *format, Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t *suboffsets, void *internal)

This API looks like PyCode_New() which was broken *often* so it looks like a bad pattern for a stable ABI.

Maybe PyBuffer_New() + many Set() functions would be more future proof.

But I don't know which Py_buffer members are mandatory to have a "valid" buffer.

What if we add new members tomorrow? Will it be possible to initialize them to a reasonable default value?
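A minimal sketch of the "Set functions keep the buffer consistent" idea, combined with duplicating the format string so that the buffer owns it. The demo_buf struct, its fixed 8-dimension capacity and all function names are hypothetical; the only rule enforced here is len == itemsize * product(shape), which matches Py_buffer's documented meaning of len for contiguous data.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef ptrdiff_t ssize_type;      /* stand-in for Py_ssize_t */

#define DEMO_MAX_NDIM 8            /* small fixed capacity keeps the sketch short */

typedef struct {
    char *format;                  /* owned copy, freed by demo_buf_clear() */
    ssize_type itemsize;
    int ndim;
    ssize_type shape[DEMO_MAX_NDIM];
    ssize_type len;                /* derived field, never set directly */
} demo_buf;

static void demo_buf_update_len(demo_buf *b)
{
    ssize_type n = b->itemsize;    /* ndim == 0 (scalar): len == itemsize */
    for (int i = 0; i < b->ndim; i++) {
        n *= b->shape[i];
    }
    b->len = n;
}

int demo_buf_set_format(demo_buf *b, const char *format, ssize_type itemsize)
{
    size_t n = strlen(format) + 1;
    char *copy = malloc(n);
    if (copy == NULL || itemsize <= 0) {
        free(copy);
        return -1;
    }
    memcpy(copy, format, n);       /* later changes to the caller's string are harmless */
    free(b->format);
    b->format = copy;
    b->itemsize = itemsize;
    demo_buf_update_len(b);
    return 0;
}

int demo_buf_set_shape(demo_buf *b, int ndim, const ssize_type *shape)
{
    if (ndim < 0 || ndim > DEMO_MAX_NDIM) {
        return -1;
    }
    for (int i = 0; i < ndim; i++) {
        if (shape[i] < 0) {
            return -1;             /* reject before touching the buffer */
        }
    }
    for (int i = 0; i < ndim; i++) {
        b->shape[i] = shape[i];    /* copied, so the caller keeps its array */
    }
    b->ndim = ndim;
    demo_buf_update_len(b);
    return 0;
}

void demo_buf_clear(demo_buf *b)
{
    free(b->format);
    memset(b, 0, sizeof *b);
}
```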

tiran commented 2 years ago

All memory is owned by the exporter object. The exporter (aka producer) is the Python type that implements Py_bf_getbuffer and Py_bf_releasebuffer. In the majority of cases the exporter doesn't have to set shape, strides, and suboffsets. They are used in special cases, e.g. multidimensional arrays with a custom layout. For example, they can be used to convert TIFF images from a strided big-endian format to a NumPy array in little-endian format.

It's up to the exporter's Py_bf_getbuffer to decide how it fills shape, strides, and suboffsets, too. For example, an exporter could allocate format, shape, strides, and suboffsets on the heap, assign the pointers in its getbuffer function, and store a hint in the internal field of Py_buffer. Its releasebuffer function then checks the internal field and performs the deallocations. We must not copy fields. This would break the API.
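The exporter pattern described above (heap-allocated layout arrays, with a hint stored in internal and freed in releasebuffer) can be sketched without Python.h by using a field-for-field mirror of the Py_buffer struct. demo_view, demo_matrix and the two functions are hypothetical stand-ins for a real type's Py_bf_getbuffer and Py_bf_releasebuffer slots; error handling and reference counting of obj are omitted.

```c
#include <stdlib.h>
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

/* Mirror of the Py_buffer fields; obj is a plain pointer here. */
typedef struct {
    void *buf;
    void *obj;
    ssize_type len;
    ssize_type itemsize;
    int readonly;
    int ndim;
    char *format;
    ssize_type *shape;
    ssize_type *strides;
    ssize_type *suboffsets;
    void *internal;
} demo_view;

/* Hypothetical exporter: a row-major matrix of doubles. */
typedef struct {
    double *data;
    ssize_type rows, cols;
} demo_matrix;

int demo_matrix_getbuffer(demo_matrix *m, demo_view *view)
{
    /* One block holds shape[2] followed by strides[2]. */
    ssize_type *layout = malloc(4 * sizeof(ssize_type));
    if (layout == NULL) {
        return -1;
    }
    layout[0] = m->rows;
    layout[1] = m->cols;
    layout[2] = m->cols * (ssize_type)sizeof(double);   /* row stride */
    layout[3] = (ssize_type)sizeof(double);             /* column stride */

    view->buf = m->data;
    view->obj = m;
    view->len = m->rows * m->cols * (ssize_type)sizeof(double);
    view->itemsize = (ssize_type)sizeof(double);
    view->readonly = 0;
    view->ndim = 2;
    view->format = "d";
    view->shape = layout;
    view->strides = layout + 2;
    view->suboffsets = NULL;
    view->internal = layout;    /* hint: what releasebuffer must free */
    return 0;
}

void demo_matrix_releasebuffer(demo_matrix *m, demo_view *view)
{
    (void)m;
    free(view->internal);       /* the consumer never frees these arrays */
    view->internal = NULL;
    view->shape = NULL;
    view->strides = NULL;
}
```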

It would be a bad idea to return copies in PyBuffer_GetLayout(). Consumers have to get the layout every time they access a specific item in the buffer in order to calculate the offset. I'd rather define the arguments as "const". The documentation already states that e.g. "The shape array is read-only for the consumer.".
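For reference, this is how a consumer turns the layout into an item address. The function below follows the same algorithm as CPython's PyBuffer_GetPointer() (add strides[i] * indices[i] per dimension, then follow a pointer when suboffsets[i] >= 0), but it operates on a hypothetical stand-in struct so it compiles without Python.h.

```c
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

typedef struct {
    void *buf;
    int ndim;
    const ssize_type *strides;
    const ssize_type *suboffsets;   /* NULL: no indirection at all */
} demo_layout;

void *demo_get_pointer(const demo_layout *v, const ssize_type *indices)
{
    char *p = (char *)v->buf;
    for (int i = 0; i < v->ndim; i++) {
        p += v->strides[i] * indices[i];
        if (v->suboffsets != NULL && v->suboffsets[i] >= 0) {
            /* PIL-style indirection: p holds a pointer to the next level */
            p = *(char **)p + v->suboffsets[i];
        }
    }
    return p;
}
```

Because this runs once per item access, returning freshly allocated copies of shape and strides from the getter would add an allocation to every lookup.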

It is highly unlikely that we will ever have to extend the Py_buffer interface. It is already extremely versatile and can encode complex formats. You can even express the layout of a float32 CMYK TIFF image in planar configuration (one array of 32-bit floats for cyan, followed by an array for magenta, then an array for yellow, and finally an array for black (key)).

PS: I have removed the PyBuffer_NewEx() function. It did not make any sense.
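As a worked example of the planar CMYK case, assume an H x W image with format "f" and itemsize 4. Planar data is then a C-contiguous array of shape {4, H, W}, while interleaved ("chunky") data is shape {H, W, 4}. The helper below computes C-contiguous strides the same way as a 'C'-order fill; demo_c_strides is a hypothetical name, not a CPython function.

```c
#include <stddef.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

/* Row-major strides: the last index varies fastest. */
void demo_c_strides(int ndim, const ssize_type *shape,
                    ssize_type itemsize, ssize_type *strides)
{
    ssize_type step = itemsize;
    for (int i = ndim - 1; i >= 0; i--) {
        strides[i] = step;
        step *= shape[i];
    }
}
```

For H = 2 and W = 3, planar data gets strides {24, 12, 4} and interleaved data gets {48, 16, 4}. Strides can also describe interleaved memory with the planar shape {4, H, W}: use strides {4, 16 * W, 16}, so the consumer sees separate channels without any copy.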

tiran commented 2 years ago

A consumer will use the APIs:

Py_buffer *view;
int ndim;
const char *format;
const Py_ssize_t *shape, *strides, *suboffsets;
void *buf;

view = PyBuffer_New();
if (view == NULL || PyObject_GetBuffer(obj, view, flags) < 0) {
    PyMem_Free(view);  // allocation or export failed: nothing to release
    return NULL;
}
ndim = PyBuffer_GetLayout(view, &format, &shape, &strides, &suboffsets);
buf = PyBuffer_GetPointer(view, [...]);
PyBuffer_Free(view); // also calls PyBuffer_Release()

The API functions PyBuffer_FillInfo(), PyBuffer_FillInfoEx(), and PyBuffer_GetInternal() are for exporters (producers) only. The exporter uses PyBuffer_FillInfo*() in its Py_bf_getbuffer function to fill the view. It may use PyBuffer_GetInternal() in its Py_bf_releasebuffer function to access the internal field and to release additional resources.

serhiy-storchaka commented 2 years ago

I do not like the requirement to allocate Py_buffer on the heap. It adds overhead. The common case in CPython code is:

Py_buffer view;
void *buf;
Py_ssize_t len;

if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
    return NULL;
}
buf = view.buf;
len = view.len;
// no other fields are used
PyBuffer_Release(&view);

And I want to keep it as simple and efficient as it can be.

tiran commented 2 years ago

CPython internals can still use allocation on the stack. Only stable ABI extensions have to use allocation on the heap.

serhiy-storchaka commented 2 years ago

That would be an unfair advantage. If we want people to use the limited API we should not make it much slower than the non-limited API.

encukou commented 2 years ago

No. Limited API is generally not as performant as the non-limited one. It is even documented this way: https://docs.python.org/3/c-api/stable.html#limited-api-scope-and-performance

We should not make it *much* slower, but code that can take advantage of implementation details can always be faster. Max speed should not be a concern in the limited API.

pitrou commented 2 years ago

Py_buffer *is* an ABI, and it hasn't changed from the start. Of course you can still try to collect feedback from third-party projects, but there is a very high probability that it won't need to change in the near future.

pitrou commented 2 years ago

I would advocate:

expose the Py_buffer struct fully
expose the various PyBUF_* constants
expose at least PyObject_GetBuffer() and PyBuffer_Release()

The rest is optional.

alex commented 2 years ago

I am someone who is interested in having this, but FWIW my motivation is slightly narrower: I only really need abi3-friendly buffer support for contiguous 1-D buffers. Not sure if there'd be interest in doing a smaller version before figuring out the entire Py_buffer API.
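For the narrow "contiguous 1-D" use case, a consumer only needs buf and len, and can check the layout fields to be sure the memory is contiguous. The sketch below uses a hypothetical stand-in struct with the relevant Py_buffer field names; in real code, requesting PyBUF_SIMPLE already asks the exporter for contiguous memory, so a check like this would mainly serve as a defensive assertion.

```c
#include <stddef.h>

typedef ptrdiff_t ssize_type;       /* stand-in for Py_ssize_t */

typedef struct {
    ssize_type itemsize;
    int ndim;
    const ssize_type *shape;        /* may be NULL for simple requests */
    const ssize_type *strides;      /* NULL means C-contiguous */
    const ssize_type *suboffsets;   /* NULL means no indirection */
} demo_layout;

/* True if the buffer is one dimension of adjacent items.
   (Length 0 or 1 arrays with odd strides are ignored for brevity.) */
int demo_is_contiguous_1d(const demo_layout *v)
{
    if (v->ndim != 1 || v->suboffsets != NULL) {
        return 0;
    }
    return v->strides == NULL || v->strides[0] == v->itemsize;
}
```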

encukou commented 2 years ago

Antoine has a good point. We can freeze the Py_buffer struct. If it needs to be extended in the future, it'll need a new set of functions and names -- and perhaps a versioning scheme. We'll know more about the problem when/if it comes up.

tiran commented 2 years ago

After some consideration I also agree with Antoine. The Py_buffer API has been around for a long time without any changes to the Py_buffer struct. It is unlikely that the struct will ever change.

I have created a new PR that exposes the Py_buffer struct, the PyBuffer_*() API functions, the PyBUF_* constants, the Py_bf_* type slots, and PyMemoryView_FromBuffer(). We could consider exporting the PyPickleBuffer_*() API, too.

vstinner commented 2 years ago

Would it make sense to add a "version" member to the structure?

It would allow supporting an old stable structure for the stable ABI and a new structure with other changes. The problem is how to initialize the version member. On Windows, many structures have a member which holds the size of the structure.
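A minimal sketch of the Windows-style "size as first member" versioning mentioned above. Both struct versions and demo_info_readonly are hypothetical; the library reads the caller's struct_size first and only touches members that the caller's version of the struct actually contains, falling back to a default otherwise.

```c
#include <stddef.h>
#include <string.h>

typedef ptrdiff_t ssize_type;   /* stand-in for Py_ssize_t */

typedef struct {                /* what an old extension was compiled with */
    size_t struct_size;         /* caller sets this to sizeof(its struct) */
    void *buf;
    ssize_type len;
} demo_info_v1;

typedef struct {                /* current version: one appended member */
    size_t struct_size;
    void *buf;
    ssize_type len;
    int readonly;
} demo_info_v2;

int demo_info_readonly(const void *info)
{
    size_t size;
    memcpy(&size, info, sizeof size);   /* struct_size is always first */
    if (size < offsetof(demo_info_v2, readonly) + sizeof(int)) {
        return 0;                       /* old caller: use a default value */
    }
    return ((const demo_info_v2 *)info)->readonly;
}
```

The cost is the one raised in the replies: every caller must remember to set struct_size, which existing code initializing Py_buffer does not do.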

I tried to implement such ABI compatibility for PEP-587 PyConfig structure, but the idea was rejected (there was no need for a stable ABI when Python is embedded in an application): https://mail.python.org/archives/list/python-dev@python.org/thread/C7Z2NA2DTM3DLOZCFQAK5A2WFYO3PHHX/

tiran commented 2 years ago

I thought of a version field, too. In the end it is going to cause more work and trouble than it would benefit us.

Stack-allocated Py_buffers are typically initialized with

   Py_buffer data = {NULL, NULL};

The code initializes Py_buffer.buf and Py_buffer.obj as NULL. The remaining fields are whatever random values happen to be on the C stack. If we appended a version field to the struct, then every project would have to initialize the field properly.

vstinner commented 2 years ago

In Python 3.5, I decided to rename the public "PyMemAllocator" structure to PyMemAllocatorEx when I added a new "calloc" member. C extensions using "PyMemAllocator" then failed to build, which forced developers to set the calloc member.

IMO it's unfortunate to have to rename a structure to force developers to update their C code :-(

tiran commented 2 years ago

The Py_buffer struct has stayed the same for over a decade, since Python 2.6.0 and 3.0.0. It is unlikely that it has to be changed in the near future.

encukou commented 2 years ago

The current struct is also likely to continue covering most future uses. If we decide to add PyBufferEx functions but continue providing the current ones (with the current struct), most users won't be affected. (But it'll be a bit more work for us than throwing the old API out entirely.)

vstinner commented 2 years ago

Py_buffer data = {NULL, NULL}; The code initializes Py_buffer.buf and Py_buffer.obj as NULL. The remaining fields are whatever random values happen to be on the C stack.

The C language sets other members to 0/NULL with this syntax, no?

tiran commented 2 years ago

The C language sets other members to 0/NULL with this syntax, no?

No, they are not set to 0/NULL. https://en.wikipedia.org/wiki/Uninitialized_variable

vstinner commented 2 years ago

Example:

struct Point { int x; int y; int z; };

int main()
{
    struct Point p = {1};
    return p.y;
}

gcc -O0 produces this machine code, which sets p.y to 0 and p.z to 0:
---
Dump of assembler code for function main:
   0x0000000000401106 <+0>:   push   rbp
   0x0000000000401107 <+1>:   mov    rbp,rsp
   0x000000000040110a <+4>:   mov    QWORD PTR [rbp-0xc],0x0
   0x0000000000401112 <+12>:  mov    DWORD PTR [rbp-0x4],0x0
   0x0000000000401119 <+19>:  mov    DWORD PTR [rbp-0xc],0x1
   0x0000000000401120 <+26>:  mov    eax,DWORD PTR [rbp-0x8]
   0x0000000000401123 <+29>:  pop    rbp
   0x0000000000401124 <+30>:  ret
---

gcc -O3 heavily optimizes the code: it always returns 0, not a random value from the stack:
---
(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000401020 <+0>:   xor    eax,eax
   0x0000000000401022 <+2>:   ret
---

The "C99 Standard 6.7.8.21" says:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

The C99 standard says that p.y and p.z must be set to 0.

I'm talking about the specific C syntax of a structure static initialization: "struct MyStruct x = {...};".

If "Py_buffer data = {NULL, NULL};" is allocated on the stack, all "data" Py_buffer members are set to 0 or NULL:

typedef struct bufferinfo {
    void *buf;
    PyObject *obj;        /* owned reference */
    Py_ssize_t len;
    Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be
                             pointed to by strides in simple case.*/
    int readonly;
    int ndim;
    char *format;
    Py_ssize_t *shape;
    Py_ssize_t *strides;
    Py_ssize_t *suboffsets;
    void *internal;
} Py_buffer;
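The C99 rule quoted above can be checked directly with a field-for-field mirror of this struct (demo_buffer and demo_object are stand-ins so the example compiles without Python.h). With {NULL, NULL}, only buf and obj are given explicit values, and every other member is guaranteed to be zero or NULL, even for a stack variable.

```c
#include <stddef.h>

typedef ptrdiff_t ssize_type;            /* stand-in for Py_ssize_t */
typedef struct demo_object demo_object;  /* stand-in for PyObject */

typedef struct demo_bufferinfo {
    void *buf;
    demo_object *obj;
    ssize_type len;
    ssize_type itemsize;
    int readonly;
    int ndim;
    char *format;
    ssize_type *shape;
    ssize_type *strides;
    ssize_type *suboffsets;
    void *internal;
} demo_buffer;

/* Returns 1 if all members not named in the initializer are zero/NULL. */
int demo_rest_is_zero(void)
{
    demo_buffer data = {NULL, NULL};     /* automatic (stack) storage */
    return data.len == 0 && data.itemsize == 0
        && data.readonly == 0 && data.ndim == 0
        && data.format == NULL && data.shape == NULL
        && data.strides == NULL && data.suboffsets == NULL
        && data.internal == NULL;
}
```

Note that the guarantee covers members only; padding bytes between members are not required to be zeroed.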

If we want to add a version member to this structure, I would suggest enforcing the use of a static initialization macro or an initialization function, like "Py_buffer data; PyBuffer_Init(&data);" or "Py_buffer data = PyBuffer_STATIC_INIT;".
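A minimal sketch of both options, using hypothetical names (DEMO_BUFFER_VERSION, DEMO_BUFFER_STATIC_INIT, demo_buffer_init) in place of the PyBuffer_Init and PyBuffer_STATIC_INIT ideas above. The init function simply reuses the macro, so both paths always produce the same initial state, and a checker can detect structs that were never initialized.

```c
#include <stddef.h>

#define DEMO_BUFFER_VERSION 2    /* hypothetical current version */

typedef struct {
    int version;
    void *buf;
    void *obj;
    ptrdiff_t len;
} demo_buffer;

/* Option 1: static initializer macro (C and C++ only). */
#define DEMO_BUFFER_STATIC_INIT {DEMO_BUFFER_VERSION, NULL, NULL, 0}

/* Option 2: init function, callable from any language with a C FFI. */
void demo_buffer_init(demo_buffer *b)
{
    demo_buffer tmp = DEMO_BUFFER_STATIC_INIT;
    *b = tmp;
}

/* Rejects a struct that skipped both initialization paths. */
int demo_buffer_check(const demo_buffer *b)
{
    return b->version == DEMO_BUFFER_VERSION ? 0 : -1;
}
```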

The problem with the macro is that it is not usable by Python extensions which are not written in C or C++ (or, more generally, by extensions which cannot use macros).

--

A different approach is to use an API which allocates a Py_buffer on the heap, so that if the structure grows in the future, C extensions built against an older version continue to work:

Py_buffer *data = PyBuffer_New();
// ... use *data ...
PyBuffer_Free(data);

PyBuffer_New() can initialize the version member and allocate a memory block of the proper size.

The best is if the "... use *data ..." part is only done with function calls :-)

tiran commented 2 years ago

Thanks for the investigation. I didn't know about C99 Standard 6.7.8.21. That's a useful and sensible extension to the language.

In my opinion it is neither useful to extend the Py_buffer struct with a version tag nor to force users to allocate the struct on the heap. The current design has worked for over 13 years. Any deviation from the established design poses a risk to break 3rd party software.

I could be convinced to add PyBuffer_New() and PyBuffer_Free() as an additional feature, but their use should be optional. The functions were part of my first PR python/cpython#73221.

miss-islington commented 2 years ago

New changeset f66c857572a308822c70fd25e0197b6e0dec6e34 by Christian Heimes in branch 'main': bpo-45459: Add Py_buffer to limited API (GH-29991) https://github.com/python/cpython/commit/f66c857572a308822c70fd25e0197b6e0dec6e34

tiran commented 2 years ago

Thanks for the review, Petr!

1f4f0d85-e63f-4322-8b54-768e60e1d01b commented 2 years ago

There are some side effects of the "buffer.h" inclusion in Panda3D when building against 3.11a5; the project manager's concerns are here https://github.com/python/cpython/pull/29991#issuecomment-1031731100

vstinner commented 2 years ago

There are some side effects of the "buffer.h" inclusion in Panda3D when building against 3.11a5; the project manager's concerns are here https://github.com/python/cpython/pull/29991#issuecomment-1031731100

Copy of rdb's message: """ This change broke our project build because when cpython/object.h is including buffer.h it is forcing it to resolve along the search path, and the compiler is hitting the buffer.h in our project rather than the one in the Python include directory.

Should it not be using a relative include, ie. #include "../buffer.h" ? I think otherwise this change will cause breakage for many projects given how common the header name "buffer.h" may be. """

In Python.h, buffer.h is included before object.h. But object.h includes buffer.h. I suggest including buffer.h before object.h and removing #include "buffer.h" from Include/cpython/object.h.

Also, I agree that renaming buffer.h to pybuffer.h would reduce issues like that. Moreover, this header file exposes the "Py_buffer" API, so "pybuffer.h" sounds like a better name ;-)

vstinner commented 2 years ago

New changeset 66b3cd7063322a9f5c922a97bbd06fdb98309999 by Victor Stinner in branch 'main': bpo-45459: Rename buffer.h to pybuffer.h (GH-31201) https://github.com/python/cpython/commit/66b3cd7063322a9f5c922a97bbd06fdb98309999

vstinner commented 2 years ago

pmp-p:

There are some side effects of the "buffer.h" inclusion in Panda3D when building against 3.11a5; the project manager's concerns are here https://github.com/python/cpython/pull/29991#issuecomment-1031731100

Thanks for the report. It has been fixed. I'm closing the issue again.

benjaminp commented 2 years ago

clang doesn't like the typedef forward-decl:

In file included from ../cpython/Modules/_ctypes/_ctypes.c:108:
In file included from ../cpython/Include/Python.h:43:
../cpython/Include/object.h:109:3: warning: redefinition of typedef 'PyObject' is a C11 feature [-Wtypedef-redefinition]
} PyObject;
  ^
../cpython/Include/pybuffer.h:23:24: note: previous definition is here
typedef struct _object PyObject;
                       ^
1 warning generated.

vstinner commented 2 years ago

Include/object.h:109:3: warning: redefinition of typedef 'PyObject' is a C11 feature [-Wtypedef-redefinition]

Oh. I already ran into this error :-(

That's why I proposed in python/cpython#75384 to move all forward declarations to the top of Python.h to solve such issues.

I wrote python/cpython#75708 to do exactly that: add a new pytypedefs.h header file to move all forward declarations to the top of Python.h.

I didn't move *all* "typedef struct xxx yyy;" there: only the ones which cause interdependencies issues.

vstinner commented 2 years ago

New changeset ec091bd47e2f968b0d1631b9a8104283a7beeb1b by Victor Stinner in branch 'main': bpo-45459: Add pytypedefs.h header file (GH-31527) https://github.com/python/cpython/commit/ec091bd47e2f968b0d1631b9a8104283a7beeb1b

vstinner commented 2 years ago

New changeset 042f31da552c19054acd3ef7bb6cfd857bce172b by Victor Stinner in branch 'main': bpo-45459: C API uses type names rather than structure names (GH-31528) https://github.com/python/cpython/commit/042f31da552c19054acd3ef7bb6cfd857bce172b

vstinner commented 2 years ago

I'm closing the issue again; the C API should now be fine :-)

vstinner commented 2 years ago

New changeset 0b63215bb152c06404cecbd5303b1a50969a9f9f by Victor Stinner in branch 'main': bpo-45459: Fix PyModuleDef_Slot type in the limited C API (GH-31668) https://github.com/python/cpython/commit/0b63215bb152c06404cecbd5303b1a50969a9f9f

vstinner commented 2 years ago

New changeset 32f0c8271706550096c454eb512450b85fbfc320 by Victor Stinner in branch 'main': bpo-45459: Use type names in the internal C API (GH-31669) https://github.com/python/cpython/commit/32f0c8271706550096c454eb512450b85fbfc320


Limited API support for Py_buffer #89622