Closed rwgk closed 1 year ago
Thanks for for reaching out @Yhg1s and @rwgk. Indeed, it would be nice to do this in a better way!
Just for context, what is this good for? nanobind and pybind11 maintain some internal data structures about the bindings. Some complex binding projects are split up across multiple extension modules, and in that case those internal data structures must be shared.
So this ugly capsule in PyEval_GetBuiltins()
is so that the separate pybind11/nanobind extensions can find each other and exchange information.
It's actually kind of undesirable that this is even visible at all. The best solution would be if there is some "secret" dictionary only accessible by CPython (but not pure Python) code, where such things can be stashed.
The problem I see with sys.modules
is that the user could simply delete the capsule from there in pure python code, which could trigger serious problems. (Perhaps builtins() has the same issue, I am actually not sure).
@Yhg1s, do you have any suggestions?
For what it's worth, users can also delete things from builtins (with predictably catastrophic effects):
In [17]: del sys.modules["builtins"].print
Error in sys.excepthook:
Traceback (most recent call last):
File "/main_instance_shell/jelle/venv3.9/lib/python3.9/site-packages/IPython/core/application.py", line 288, in excepthook
return crashhandler.crash_handler_lite(etype, evalue, tb)
File "/main_instance_shell/jelle/venv3.9/lib/python3.9/site-packages/IPython/core/crashhandler.py", line 227, in crash_handler_lite
traceback.print_exception(etype, evalue, tb)
File "/main_instance_shell/jelle/venv3.9/lib/python3.9/traceback.py", line 105, in print_exception
print(line, file=file, end="")
NameError: name 'print' is not defined
@Yhg1s, @JelleZijlstra: What about the Py_LIMITED_API function PyInterpreterState_GetDict()
? (docs link: https://docs.python.org/3/c-api/init.html#c.PyInterpreterState_GetDict). Would that be an acceptable data structure for this purpose?
(FWIW the documentation of this function also mentions PyModule_GetState
, but that one doesn't seem to be an option for us. The capsule is for inter-module communication and must be stored globally somewhere.)
(By the way: @rwgk -- this may be an interesting coordinated switch to also do in pybind11 along with the internals ABI bump)
There's nothing that is usable by an extension module, provided by Python, that isn't also usable by other modules. That includes builtins. (The builtins you're currently setting are even easier to access than the separate module I suggested, since you don't even need to import anything :)
If you want some way to control who can change things, you will need to add access control yourself. A separate module with accessor functions may be what you're looking for there, but how you're going to prevent people from using it incorrectly, I don't know.
Is it possible to access the dictionary PyInterpreterState_GetDict()
from pure Python code? (I don't see how barring some use of ctypes.)
I was looking for where this API came from originally, and it seems to serve a related purpose in the cffi
module: https://github.com/python/cpython/issues/80305
PyInterpreterState_GetDict
does look like a good fit for your use case, but I'm not a specialist in this area.
PyInterpreterState_GetDict()
gets you a regular dict, which you can always get access to from Python code. (If nothing else, through the gc
module.) What I'm trying to say is that you cannot protect against users intentionally messing with these internals, regardless of whether it's in builtins
, a regular module or a special object in sys.modules
, PyInterpreterState_GetDict
or some other storage. Not even from Python code.
What I'm trying to say is that you cannot protect against users intentionally messing with these internals, regardless of whether it's in builtins, a regular module or a special object in sys.modules, PyInterpreterState_GetDict or some other storage.
Okay, that makes sense. I am not trying to protect against such behavior. I moved storage of the internal capsule from builtins
to the interpreter state dictionary (PyInterpreterState_GetDict()
) in commit https://github.com/wjakob/nanobind/commit/ca23da72ce71a45318f1e59474c9c2906fce5154.
The previous location (PyEval_GetBuiltins()
) problematic for the reasons pointed out above. Automatically exporting this implementation detail to every module is also kind of horrible. The reason for not sticking the capsule into sys.modules
is because this inter-module scratch space does not correspond to the notion of a module. It would seem like a misuse of that data structure.
PyInterpreterState_GetDict()
is nice because it is not directly exposed to the user. Somebody malicious user could still try to get hold of this dictionary and break things, but we are not trying to protect against such an adversary.
For posterity, some answers to the question "what could possibly go wrong?":
gc
to find and mess up the dictionary in question, the consequence is that calls to bound functions involving types that span across multiple extension libraries fail with TypeError
. __nb_internals_v5_gcc_libstdcpp__
) to the capsule name attribute. nanobind won't decode a capsule if it has a different name. Note that this is mostly protection against unforeseen bugs rather than a malicious user. Somebody could still, e.g., use ctypes
to create a capsule themselves which has that name and references some unrelated process memory. But they can already break things in arbitrary ways at that point without using nanobind.From a performance perspective (ignoring code quality issues), it is fine to add values to builtins
, provided that:
Python 3.11 is fairly forgiving of changing builtins
later, but future versions of CPython won't be, and neither is PyPy.
Problem description
The title is by @yhg1s, I'm just the messenger here.
We happened to stumble over this in connection with pybind11, then I looked in nanobind and see that's also adding capsules to
builtins
:https://github.com/wjakob/nanobind/blob/61393ad3ce3bc68d195a1496422df43d5fb45ec0/src/nb_internals.cpp#L379
Thomas wrote:
Reproducible example code
No response