feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

BPO	43313
Nosy	@nascheme, @vstinner, @methane, @JunyiXie
PRs	python/cpython#24857

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['expert-subinterpreters', '3.10']
title = 'feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state'
updated_at =
user = 'https://github.com/JunyiXie'
```

bugs.python.org fields:
```python
activity =
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Subinterpreters']
creation =
creator = 'JunyiXie'
dependencies = []
files = []
hgrepos = []
issue_num = 43313
keywords = ['patch']
message_count = 10.0
messages = ['387614', '388668', '388670', '388671', '388734', '388735', '388736', '388737', '388743', '388745']
nosy_count = 4.0
nosy_names = ['nascheme', 'vstinner', 'methane', 'JunyiXie']
pr_nums = ['24857']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue43313'
versions = ['Python 3.10']
```

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

https://github.com/ericsnowcurrently/multi-core-python/issues/73

https://github.com/JunyiXie/cpython/commit/820954879fd546fcb29b654d10c424bd47da70ce

Changes:

1. Move the pymalloc state into obmalloc.h.
2. Add pymalloc_state to struct _is.
3. The pymalloc_allocxx APIs use the subinterpreter's pymalloc_state.

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

Made two changes:

1. Support pymalloc for subinterpreters: each subinterpreter has its own pymalloc_state.
2. The _copy_raw_string API allocates and frees memory with PyMem_RawMalloc and PyMem_RawFree.

I extended _xxsubinterpretersmodule.c to support calling any function in a subinterpreter. The problem appears when I need to return the result of a subinterpreter call.

I need to create item->name in the shared item, which uses the PyMem_xxx API to manage memory. When the WITH_PYMALLOC macro is defined, that memory is allocated from, and bound to, the pymalloc state of the current interpreter (interp1).
After switching interpreter state to interp2, I get the return value from the shared item and then need to free the shared item. But the item->name memory is managed by interp1's pymalloc state, so to free it I would have to switch back to interp1. That is complicated: to implement it, we would need to save the interpreter ID in the shared item.

So I think _copy_raw_string, called from _sharednsitem_init, should allocate with the PyMem_Raw API. That is easier to manage.

static int
_sharednsitem_init(struct _sharednsitem *item, PyObject *key, PyObject *value)
{
    item->name = _copy_raw_string(key);
    // ...
}

_sharedns *result_shread = _sharedns_new(1);

#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
    // Switch to interpreter.
    PyThreadState *new_tstate = PyInterpreterState_ThreadHead(interp);
    PyThreadState *save1 = PyEval_SaveThread();

    (void)PyThreadState_Swap(new_tstate);
#else
    // Switch to interpreter.
    PyThreadState *save_tstate = NULL;
    if (interp != PyInterpreterState_Get()) {
        // XXX Using the "head" thread isn't strictly correct.
        PyThreadState *tstate = PyInterpreterState_ThreadHead(interp);
        // XXX Possible GILState issues?
        save_tstate = PyThreadState_Swap(tstate);
    }
#endif

    PyObject *module = PyImport_ImportModule(PyUnicode_AsUTF8(module_name));
    PyObject *function = PyObject_GetAttr(module, function_name);

    result = PyObject_Call(function, args, kwargs);

    if (result == NULL) {
        // exception handler
        ...
    }

    if (result && _sharednsitem_init(&result_shread->items[0], PyUnicode_FromString("result"), result) != 0) {
        PyErr_Format(RunFailedError, "interp_call_function result convert to shared failed");
        return NULL;
    }
#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
    // Switch back.
    PyEval_RestoreThread(save1);
#else
    // Switch back.
    if (save_tstate != NULL) {
        PyThreadState_Swap(save_tstate);
    }
#endif
    // ...

    if (result) {
        result = _PyCrossInterpreterData_NewObject(&result_shread->items[0].data);
        _sharedns_free(result_shread);
    }
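
The bookkeeping cost described above can be sketched in plain C without the CPython headers. This is only a toy model under stated assumptions: `toy_item`, `toy_item_init`, `toy_item_clear` and `TOY_RAW_OWNER` are hypothetical names, `malloc()` stands in for both PyMem_Malloc and PyMem_RawMalloc, and the real `struct _sharednsitem` has a different layout. It shows that an item allocated under one interpreter's allocator must remember its owner and be freed there, while a raw allocation can be released from any interpreter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_RAW_OWNER (-1)  /* marks memory from the process-wide allocator */

/* Hypothetical shared item; the real struct _sharednsitem differs. */
typedef struct {
    char *name;
    int64_t owner_id;  /* interpreter whose allocator produced `name` */
} toy_item;

/* Copy `key` into the item. malloc() stands in for both allocators here;
   only the recorded owner differs between the two variants. */
static int toy_item_init(toy_item *item, const char *key, int64_t owner_id)
{
    size_t n = strlen(key) + 1;
    item->name = malloc(n);
    if (item->name == NULL) {
        return -1;
    }
    memcpy(item->name, key, n);
    item->owner_id = owner_id;
    return 0;
}

/* Free the name. Returns -1 without freeing when the memory belongs to a
   different interpreter, i.e. the caller would have to switch back first. */
static int toy_item_clear(toy_item *item, int64_t current_id)
{
    if (item->owner_id != TOY_RAW_OWNER && item->owner_id != current_id) {
        return -1;
    }
    free(item->name);
    item->name = NULL;
    return 0;
}
```

With `owner_id` set to an interpreter ID, `toy_item_clear` refuses to free from any other interpreter; with `TOY_RAW_OWNER` it succeeds from anywhere, which is the simplification the raw allocator buys.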

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

github pr

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

https://github.com/python/cpython/pull/24857

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

There is a problem if we bind the pymalloc state to an interpreter: it is usual to malloc a pointer in interpreterA and free the pointer.

This causes a problem. When we call PyObject_Free, we look up the address in the current interpreter's pymalloc pools.
If it is not found, the current code calls PyMem_RawFree(p) to free it, which crashes because the address was allocated by pymalloc_alloc in another interpreter.

I think there are two ways to solve this problem:

1. Free and allocate memory in the same interpreter. Frequently switching interpreters hurts performance.
2. When freeing a memory address, look for it in the pymalloc pools of all interpreters and free it there (but this requires adding a lock to pymalloc).
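
The failure mode can be illustrated with a small self-contained model. Everything here is hypothetical (`toy_pool_state`, `toy_pool_alloc`, `toy_address_in_range`, `toy_classify_free`), and a fixed array stands in for pymalloc's arenas; it only mirrors the decision described above, where an address that is not found in the current state's pools is handed to the raw allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interpreter allocator state: one small bump arena. */
typedef struct {
    unsigned char arena[128];
    size_t used;
} toy_pool_state;

typedef enum { TOY_FREE_TO_POOL, TOY_FREE_TO_RAW } toy_free_path;

/* Hand out `n` bytes from the arena owned by `st`, or NULL when full. */
static void *toy_pool_alloc(toy_pool_state *st, size_t n)
{
    if (n > sizeof(st->arena) - st->used) {
        return NULL;
    }
    void *p = st->arena + st->used;
    st->used += n;
    return p;
}

/* True only if `p` lies inside the arena owned by `st`. */
static bool toy_address_in_range(const toy_pool_state *st, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)st->arena;
    return a >= lo && a < lo + sizeof(st->arena);
}

/* Decide where a free would be routed when only the current
   interpreter's state is consulted. */
static toy_free_path toy_classify_free(const toy_pool_state *current,
                                       const void *p)
{
    return toy_address_in_range(current, p) ? TOY_FREE_TO_POOL
                                            : TOY_FREE_TO_RAW;
}
```

A pointer allocated from interpreter A's state is classified as `TOY_FREE_TO_POOL` when checked against A, but as `TOY_FREE_TO_RAW` when checked against B, which in the real allocator would pass a pool address to the system `free()`.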

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

malloc pointer in interpreterA and free pointer is usual.

malloc pointer in interpreterA and free pointer in interpreterB is usual.

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

By the way, there is no operation in the CPython code to destroy the memory pool. Repeatedly creating pymalloc pools will leak memory.

7aa09af7-4256-46da-93fc-f6eef717c259 commented 3 years ago

when free memory address, find this address in all interpreter pymalloc pool. and free it.(but it need add lock to pymalloc)

In finalize_interp_delete, we need to keep the interpreter's pymalloc pool in a linked list. It will be used when searching for a memory address in the pymalloc pools.
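
A minimal sketch of that idea, assuming a global linked list of allocator states guarded by one lock. The names (`toy_pool_state`, `toy_register_state`, `toy_retire_state`, `toy_pool_free`) are invented for illustration, a C11 `atomic_flag` spinlock stands in for whatever lock pymalloc would actually use, and the arena is a fixed array rather than real pymalloc pools.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator state, linked into a global list. */
typedef struct toy_pool_state {
    unsigned char arena[128];
    size_t used;
    size_t live_blocks;           /* blocks handed out and not yet freed */
    bool retired;                 /* owning interpreter was finalized */
    struct toy_pool_state *next;
} toy_pool_state;

static toy_pool_state *toy_all_states = NULL;
static atomic_flag toy_states_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&toy_states_lock,
                                             memory_order_acquire)) {
        /* spin until the flag is released */
    }
}

static void toy_unlock(void)
{
    atomic_flag_clear_explicit(&toy_states_lock, memory_order_release);
}

/* Add a state to the global list when its interpreter is created. */
static void toy_register_state(toy_pool_state *st)
{
    toy_lock();
    st->next = toy_all_states;
    toy_all_states = st;
    toy_unlock();
}

/* Called at interpreter finalization: the state stays in the list so
   that blocks it still owns can be found and freed later. */
static void toy_retire_state(toy_pool_state *st)
{
    toy_lock();
    st->retired = true;
    toy_unlock();
}

static void *toy_pool_alloc(toy_pool_state *st, size_t n)
{
    void *p = NULL;
    toy_lock();
    if (n <= sizeof(st->arena) - st->used) {
        p = st->arena + st->used;
        st->used += n;
        st->live_blocks++;
    }
    toy_unlock();
    return p;
}

/* Search every state, live or retired, for the owner of `p`.
   Returns the owner, or NULL if `p` is not pool memory at all. */
static toy_pool_state *toy_pool_free(const void *p)
{
    toy_pool_state *owner = NULL;
    toy_lock();
    for (toy_pool_state *st = toy_all_states; st != NULL; st = st->next) {
        uintptr_t a = (uintptr_t)p;
        uintptr_t lo = (uintptr_t)st->arena;
        if (a >= lo && a < lo + sizeof(st->arena)) {
            assert(st->live_blocks > 0);
            st->live_blocks--;
            owner = st;
            break;
        }
    }
    toy_unlock();
    return owner;
}
```

The sketch makes the cost visible: every allocation and free takes the shared lock, and the search is linear in the number of interpreters ever created.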

vstinner commented 3 years ago

I'm not sure that it's needed to have a "per interpreter" allocator. The needed feature is to be able to call PyMem_Malloc() in parallel in different threads. If I understood correctly, the glibc malloc has a per-thread fast allocator (no locking) and then falls back to a slow allocator (locking) if the fast allocator failed. Maybe pymalloc could have per-thread memory arenas.
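
A rough sketch of the "lock-free per-thread fast path, locked slow path" shape described above. It is not glibc's or pymalloc's implementation: `toy_alloc`, `toy_free`, the fixed block size, and the cache limit are all assumptions made for illustration, and a C11 `atomic_flag` spinlock stands in for the slow-path lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_SIZE 64   /* every block has this size (assumption) */
#define TOY_CACHE_MAX  8    /* max free blocks kept per thread */

typedef struct toy_block {
    struct toy_block *next;
} toy_block;

/* Fast path: per-thread free list, touched only by its own thread,
   so no locking is needed. */
static _Thread_local toy_block *toy_cache = NULL;
static _Thread_local size_t toy_cache_len = 0;

/* Slow path: shared free list guarded by a spinlock. */
static toy_block *toy_shared = NULL;
static atomic_flag toy_shared_lock = ATOMIC_FLAG_INIT;

static void *toy_alloc(void)
{
    /* 1. Try the thread-local cache without locking. */
    if (toy_cache != NULL) {
        toy_block *b = toy_cache;
        toy_cache = b->next;
        toy_cache_len--;
        return b;
    }
    /* 2. Fall back to the shared list under the lock. */
    while (atomic_flag_test_and_set_explicit(&toy_shared_lock,
                                             memory_order_acquire)) {
        /* spin */
    }
    toy_block *b = toy_shared;
    if (b != NULL) {
        toy_shared = b->next;
    }
    atomic_flag_clear_explicit(&toy_shared_lock, memory_order_release);
    /* 3. Nothing cached anywhere: ask the system allocator. */
    return b != NULL ? (void *)b : malloc(TOY_BLOCK_SIZE);
}

static void toy_free(void *p)
{
    toy_block *b = p;
    /* Keep the block in the freeing thread's cache while there is room. */
    if (toy_cache_len < TOY_CACHE_MAX) {
        b->next = toy_cache;
        toy_cache = b;
        toy_cache_len++;
        return;
    }
    /* Cache full: hand the block to the shared list under the lock. */
    while (atomic_flag_test_and_set_explicit(&toy_shared_lock,
                                             memory_order_acquire)) {
        /* spin */
    }
    b->next = toy_shared;
    toy_shared = b;
    atomic_flag_clear_explicit(&toy_shared_lock, memory_order_release);
}
```

Because a freed block lands in the cache of whichever thread frees it, a block allocated in one thread can be released in another without knowing where it came from, which is the property the per-interpreter design lacks.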

When I implemented PEP 587, I spent a significant amount of time avoiding any use of pymalloc before Py_Initialize() is called: only PyMem_RawMalloc() is used before Py_Initialize().

But I'm not 100% sure that pymalloc is not used before Py_Initialize() nor *after* Py_Finalize(). For example, we should check whether a daemon thread can call PyMem_Malloc() after Py_Finalize(), even though daemon threads are supposed to exit as soon as they try to acquire the GIL, and even though the GIL must be held to use pymalloc (PyMem_Malloc and PyObject_Malloc): https://docs.python.org/dev/c-api/memory.html#memory-interface

See also bpo-37448: "Add radix tree implementation for obmalloc address_in_range()" https://github.com/python/cpython/pull/14474

vstinner commented 3 years ago

The current workaround is to disable pymalloc when Python is built with EXPERIMENTAL_ISOLATED_SUBINTERPRETERS:

_PyPreConfig_InitCompatConfig(PyPreConfig *config):

#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
    /* bpo-40512: pymalloc is not compatible with subinterpreters,
       force usage of libc malloc() which is thread-safe. */
#ifdef Py_DEBUG
    config->allocator = PYMEM_ALLOCATOR_MALLOC_DEBUG;
#else
    config->allocator = PYMEM_ALLOCATOR_MALLOC;
#endif
#else
    ...
#endif

python / cpython

feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state #87479