hpyproject / hpy

HPy: a better API for Python
https://hpyproject.org
MIT License

Provide API for module state #286

Open steve-s opened 2 years ago

steve-s commented 2 years ago

The current state in CPython and its issues:

The issue with the last API is that the same module can be imported more than once in a single interpreter:

import sys
import mymod
backup = mymod
del sys.modules['mymod']
import mymod
mymod.MyType is backup.MyType # False

In such a case, PyModuleDef does not uniquely identify the module instance. An example showing how this can lead to a bug is here: https://gist.github.com/steve-s/9dd4d7e4810bf4cb61302f049d03ecf5
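The re-import scenario can be reproduced without a C extension. The following is a minimal sketch using a throwaway pure-Python module; the module name `mymod_demo` and the temporary directory are assumptions made purely for illustration. A pure-Python module has no PyModuleDef, so the sketch only shows that two live module instances with distinct types can coexist in one interpreter.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "mymod_demo"  # hypothetical module name, used only for this demo


def import_twice():
    """Import the same module twice in one interpreter and return both instances."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write a tiny module that defines a single type.
        Path(tmp, f"{MODULE_NAME}.py").write_text("class MyType:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            first = importlib.import_module(MODULE_NAME)
            # Dropping the sys.modules entry forces the next import to
            # execute the module again and create a fresh module object.
            del sys.modules[MODULE_NAME]
            second = importlib.import_module(MODULE_NAME)
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    return first, second


first, second = import_twice()
print(first is second)                # False
print(first.MyType is second.MyType)  # False
```

Both instances stay alive as long as something references them, which is why any lookup keyed only on the module definition cannot tell them apart.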

CPython documentation: https://docs.python.org/3/howto/isolating-extensions.html

There is also a Discord conversation about this with CPython core.


Ideal API:

What an implementation of such an API could look like:

steve-s commented 1 year ago

Idea for new/better approach:

Example usage:

HPyDef_STATE_METH(myabs, "myabs", module_state_t, HPyFunc_NOARGS)
HPy myabs_impl(HPyContext *ctx, module_state_t *state, HPy self) { ... }

// Slots:
HPyDef_STATE_SLOT(myadd, module_state_t, HPy_nb_add)
HPy myadd_impl(HPyContext *ctx, module_state_t *state, HPy self, HPy other) { ... }

Sketch of the generated code for a method:

// prototype to be implemented by the user:
HPy myabs_impl(HPyContext *ctx, module_state_t *state, HPy self);

// simple trampoline that the Python interpreter should call
HPy myabs_state_trampoline(HPyContext *ctx, void *state, HPy self) {
    return myabs_impl(ctx, (module_state_t*) state, self);
}

// CPython trampoline if this is attached to a module
PyObject* myabs_trampoline(PyObject* self) {
    return myabs_state_trampoline(ctx, PyModule_GetState(self), _py2h(self));
}

// CPython trampoline if this is attached to a type
PyObject* myabs_type_trampoline(PyObject* self, PyTypeObject *defining_class) {
    return myabs_state_trampoline(ctx, PyType_GetModuleState(defining_class), _py2h(self));
}

Sketch of the generated code for a slot:

// prototype to be implemented by the user:
HPy myadd_impl(HPyContext *ctx, module_state_t *state, HPy self, HPy other);

// simple trampoline that the Python interpreter should call
HPy myadd_state_impl(HPyContext *ctx, void *state, HPy self, HPy other) {
    return myadd_impl(ctx, (module_state_t*) state, self, other);
}

HPyDef myadd = { ...
   void* _data; // reserved for the runtime, will be filled in HPyType_FromSpec
}

// CPython trampoline for the slot
PyObject* myadd_trampoline(PyObject* self, PyObject* other) {
    PyObject *mod = PyType_GetModuleByDef(Py_TYPE(self), (PyModuleDef*) myadd._data);
    return myadd_state_impl(ctx, PyModule_GetState(mod), _py2h(self), _py2h(other));
}
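The effect of the generated trampolines can be modelled in plain Python. This is only an illustration of the idea, not HPy or CPython code: `ModuleInstance`, `state_method`, `bump`, and the `counter` state field are all hypothetical names. The user-written function receives the state explicitly, while the runtime-side wrapper is responsible for fetching it from the module instance the call goes through.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleInstance:
    """Stand-in for one module object carrying its own per-module state."""
    state: dict = field(default_factory=lambda: {"counter": 0})


def state_method(impl):
    """Model of a generated trampoline: fetch the state, then call impl."""
    def trampoline(module, *args):
        # Plays the role of PyModule_GetState(self) in the C sketch above.
        return impl(module.state, *args)
    return trampoline


@state_method
def bump(state, amount):
    # User code never looks the state up itself; it is handed in.
    state["counter"] += amount
    return state["counter"]


first, second = ModuleInstance(), ModuleInstance()
bump(first, 2)
bump(first, 3)
bump(second, 10)
print(first.state["counter"], second.state["counter"])  # 5 10
```

Because the state always comes from the instance the call was dispatched on, two instances of the same module never share or confuse their state.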

Relevant CPython docs:

Any suggestions for better names for the macros? We cannot overload HPyDef_METH because it takes varargs. We can overload HPyDef_SLOT, but ideally both macros should be consistent.

Additionally, we need to either add HPyType_FromSpecAndModule or, as I would be inclined to do, make HPyType_FromSpec always take a module (we need to stash its def in _data in the generated code above).

Open question: I would suggest not exposing any further API to access the state. If you need the state, use the appropriate convention. I think the more that is declarative and carried out by the Python interpreter, the better, as with the removal of HPyModule_Create in favor of multi-phase init.