wjakob / nanobind

nanobind: tiny and efficient C++/Python bindings
BSD 3-Clause "New" or "Revised" License
2.31k stars 194 forks source link

[Recommendation]: Don't mess around with builtins, that's what modules are for :) #96

Closed rwgk closed 1 year ago

rwgk commented 1 year ago

Problem description

The title is by @yhg1s, I'm just the messenger here.

We happened to stumble over this in connection with pybind11, then I looked in nanobind and see that's also adding capsules to builtins:

https://github.com/wjakob/nanobind/blob/61393ad3ce3bc68d195a1496422df43d5fb45ec0/src/nb_internals.cpp#L379

Thomas wrote:

Reproducible example code

No response

wjakob commented 1 year ago

Thanks for for reaching out @Yhg1s and @rwgk. Indeed, it would be nice to do this in a better way!

Just for context, what is this good for? nanobind and pybind11 maintain some internal data structures about the bindings. Some complex binding projects are split up across multiple extension modules, and in that case those internal data structures must be shared.

So this ugly capsule in PyEval_GetBuiltins() is so that the separate pybind11/nanobind extensions can find each other and exchange information.

It's actually kind of undesirable that this is even visible at all. The best solution would be if there is some "secret" dictionary only accessible by CPython (but not pure Python) code, where such things can be stashed.

The problem I see with sys.modules is that the user could simply delete the capsule from there in pure python code, which could trigger serious problems. (Perhaps builtins() has the same issue, I am actually not sure).

@Yhg1s, do you have any suggestions?

JelleZijlstra commented 1 year ago

For what it's worth, users can also delete things from builtins (with predictably catastrophic effects):

In [17]: del sys.modules["builtins"].print
Error in sys.excepthook:
Traceback (most recent call last):
  File "/main_instance_shell/jelle/venv3.9/lib/python3.9/site-packages/IPython/core/application.py", line 288, in excepthook
    return crashhandler.crash_handler_lite(etype, evalue, tb)
  File "/main_instance_shell/jelle/venv3.9/lib/python3.9/site-packages/IPython/core/crashhandler.py", line 227, in crash_handler_lite
    traceback.print_exception(etype, evalue, tb)
  File "/main_instance_shell/jelle/venv3.9/lib/python3.9/traceback.py", line 105, in print_exception
    print(line, file=file, end="")
NameError: name 'print' is not defined
wjakob commented 1 year ago

@Yhg1s, @JelleZijlstra: What about the Py_LIMITED_API function PyInterpreterState_GetDict()? (docs link: https://docs.python.org/3/c-api/init.html#c.PyInterpreterState_GetDict). Would that be an acceptable data structure for this purpose?

(FWIW the documentation of this function also mentions PyModule_GetState, but that one doesn't seem to be an option for us. The capsule is for inter-module communication and must be stored globally somewhere.)

wjakob commented 1 year ago

(By the way: @rwgk -- this may be an interesting coordinated switch to also do in pybind11 along with the internals ABI bump)

Yhg1s commented 1 year ago

There's nothing that is usable by an extension module, provided by Python, that isn't also usable by other modules. That includes builtins. (The builtins you're currently setting are even easier to access than the separate module I suggested, since you don't even need to import anything :)

If you want some way to control who can change things, you will need to add access control yourself. A separate module with accessor functions may be what you're looking for there, but how you're going to prevent people from using it incorrectly, I don't know.

wjakob commented 1 year ago

Is it possible to access the dictionary PyInterpreterState_GetDict() from pure Python code? (I don't see how barring some use of ctypes.)

wjakob commented 1 year ago

I was looking for where this API came from originally, and it seems to serve a related purpose in the cffi module: https://github.com/python/cpython/issues/80305

JelleZijlstra commented 1 year ago

PyInterpreterState_GetDict does look like a good fit for your use case, but I'm not a specialist in this area.

Yhg1s commented 1 year ago

PyInterpreterState_GetDict() gets you a regular dict, which you can always get access to from Python code. (If nothing else, through the gc module.) What I'm trying to say is that you cannot protect against users intentionally messing with these internals, regardless of whether it's in builtins, a regular module or a special object in sys.modules, PyInterpreterState_GetDict or some other storage. Not even from Python code.

wjakob commented 1 year ago

What I'm trying to say is that you cannot protect against users intentionally messing with these internals, regardless of whether it's in builtins, a regular module or a special object in sys.modules, PyInterpreterState_GetDict or some other storage.

Okay, that makes sense. I am not trying to protect against such behavior. I moved storage of the internal capsule from builtins to the interpreter state dictionary (PyInterpreterState_GetDict()) in commit https://github.com/wjakob/nanobind/commit/ca23da72ce71a45318f1e59474c9c2906fce5154.

The previous location (PyEval_GetBuiltins()) problematic for the reasons pointed out above. Automatically exporting this implementation detail to every module is also kind of horrible. The reason for not sticking the capsule into sys.modules is because this inter-module scratch space does not correspond to the notion of a module. It would seem like a misuse of that data structure.

PyInterpreterState_GetDict() is nice because it is not directly exposed to the user. Somebody malicious user could still try to get hold of this dictionary and break things, but we are not trying to protect against such an adversary.

For posterity, some answers to the question "what could possibly go wrong?":

markshannon commented 1 year ago

From a performance perspective (ignoring code quality issues), it is fine to add values to builtins, provided that:

Python 3.11 is fairly forgiving of changing builtins later, but future versions of CPython won't be, and neither is PyPy.