Open pablogsal opened 3 weeks ago
I think we can fix the performance issues by raising the level at which allocation/free goes through a function pointer.
Instead of a malloc-like interface, `void *malloc(size_t size)`, we should be returning partially initialized objects.
`PyObject *obj_malloc(PyTypeObject *tp, size_t size, size_t presize)` would allocate a chunk of memory of `size + presize` bytes, returning a `PyObject *` pointing to that memory + `presize`, with the `ob_type` field set to `tp` and the `ob_refcnt` field set to one.
This is low-enough level to be fully general, but with enough context to support tracemalloc.
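To make the proposed contract concrete, here is a minimal sketch of what such a function could look like. It is not CPython code: `demo_type`, `demo_object`, `demo_obj_malloc` and `demo_obj_free` are hypothetical stand-ins so the example compiles without `Python.h`, and plain `malloc` is assumed as the backing allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef struct demo_type { const char *name; } demo_type;
typedef struct demo_object {
    ptrdiff_t ob_refcnt;
    demo_type *ob_type;
} demo_object;

/* Allocate presize + size bytes and return a pointer to the object part,
 * which starts presize bytes into the block. The header is partially
 * initialized: refcount set to one, type set to tp.
 * Assumption: presize is a multiple of the object's alignment, so the
 * returned pointer is correctly aligned. */
static demo_object *demo_obj_malloc(demo_type *tp, size_t size, size_t presize)
{
    assert(tp != NULL);
    if (size < sizeof(demo_object)) return NULL;             /* no room for the header */
    if (presize % _Alignof(demo_object) != 0) return NULL;   /* would misalign the object */
    if (presize > SIZE_MAX - size) return NULL;              /* size overflow */

    char *block = malloc(presize + size);
    if (block == NULL) return NULL;
    memset(block, 0, presize);                               /* zero the pre-header */

    demo_object *obj = (demo_object *)(block + presize);
    obj->ob_refcnt = 1;
    obj->ob_type = tp;
    return obj;
}

/* Free must undo the presize offset to recover the start of the block. */
static void demo_obj_free(demo_object *obj, size_t presize)
{
    if (obj != NULL) free((char *)obj - presize);
}
```

Because the allocator sees the type and both sizes, a tracing implementation has enough context to attribute the allocation, which is the point made above about tracemalloc.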
I think we would need the following implementations, switchable at runtime:
We don't need (or want) to switch between the free-threading and default allocators, but it keeps the rest of the code simpler if they have the same interface.
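As a rough illustration of "switchable at runtime" with a shared interface, the sketch below routes allocation through a small table of function pointers. Everything here is an assumption made for illustration: the `demo_*` names, the two implementations (a plain one and a counting one standing in for a tracing allocator), and the use of `calloc` underneath. It does not reflect which implementations the proposal actually has in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef struct demo_type { const char *name; } demo_type;
typedef struct demo_object { ptrdiff_t ob_refcnt; demo_type *ob_type; } demo_object;

/* One interface shared by every allocator implementation. */
typedef struct demo_allocator {
    demo_object *(*alloc)(demo_type *tp, size_t size, size_t presize);
    void (*dealloc)(demo_object *obj, size_t presize);
} demo_allocator;

static demo_object *plain_alloc(demo_type *tp, size_t size, size_t presize)
{
    if (size < sizeof(demo_object)) return NULL;
    if (presize % _Alignof(demo_object) != 0) return NULL;
    if (presize > SIZE_MAX - size) return NULL;
    char *block = calloc(1, presize + size);   /* zeroed block */
    if (block == NULL) return NULL;
    demo_object *obj = (demo_object *)(block + presize);
    obj->ob_refcnt = 1;
    obj->ob_type = tp;
    return obj;
}

static void plain_free(demo_object *obj, size_t presize)
{
    if (obj != NULL) free((char *)obj - presize);
}

/* Counting wrapper: a toy stand-in for a tracing allocator. */
static size_t traced_live = 0;

static demo_object *traced_alloc(demo_type *tp, size_t size, size_t presize)
{
    demo_object *obj = plain_alloc(tp, size, presize);
    if (obj != NULL) traced_live++;
    return obj;
}

static void traced_free(demo_object *obj, size_t presize)
{
    if (obj != NULL) {
        traced_live--;
        plain_free(obj, presize);
    }
}

static const demo_allocator plain_allocator = { plain_alloc, plain_free };
static const demo_allocator traced_allocator = { traced_alloc, traced_free };

/* The single indirection point; callers always go through this pointer. */
static const demo_allocator *current_allocator = &plain_allocator;

static void demo_set_allocator(const demo_allocator *a)
{
    assert(a != NULL && a->alloc != NULL && a->dealloc != NULL);
    current_allocator = a;
}
```

In this toy version, objects allocated under one allocator and freed under another would skew `traced_live`, so a real design would need a rule for when switching is allowed.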
In https://github.com/python/cpython/issues/125703, @markshannon raised concerns about the performance implications of where these hooks are placed. In a call, we discussed his ideas for making them more performant by moving them elsewhere or adapting them.
I am opening this issue to track and coordinate these improvements for 3.14 and beyond.