pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.11k stars 2.05k forks source link

fix: Replace bare `static exception<T>` with `gil_safe_call_once_and_store`. #4897

Closed rwgk closed 8 months ago

rwgk commented 8 months ago

Description

This is to ensure that Py_DECREF() is not called after the Python interpreter was finalized already:

https://github.com/pybind/pybind11/blob/3414c56b6c7c521d868c9a137ca2ace2e26b5b2e/include/pybind11/gil_safe_call_once.h#L19

(Bug noticed by chance while working on #4888.)

Suggested changelog entry:

Removes potential for Undefined Behavior during process teardown.
rwgk commented 8 months ago

Google global testing ID (passed): OCL:575710443:BASE:575809221:1698070476323:98dd25a0 (using this PR @ ca7bdac243d04201ce922bee3db22620c10534ff)

rwgk commented 8 months ago

@tkoeppe FYI

I never got to systematically looking if there are more bugs like this BTW.

rwgk commented 8 months ago

Thanks @EthanSteinberg!

I bet we could set up some global variables to track this / throw assertions.

I don't know how we could do that (going completely blank tbh). Just disclosing & an indication of curiosity, but ...

Not sure if it would be worth the effort though.

... yeah, I'm assuming there is only very little left to be had, simply based on how widely used pybind11 is.

I'm slightly surprised that nobody needed this fix before. I'm guessing this bug bites very rarely, and only during process teardown, so it's just some occasional flakiness maybe that's easy to ignore.

EthanSteinberg commented 8 months ago

I don't know how we could do that (going completely blank tbh). Just disclosing & an indication of curiosity, but ...

I don't know exactly how to set this up (would need to do more research), but my initial guess would be to have a boolean flag that is flipped true by https://docs.python.org/3/library/atexit.html#atexit.register.

We could then flag on any calls to Py_DECREF or something of that nature ...

The timing here is really tricky. I'm not sure atexit would be at the point we need.

rwgk commented 8 months ago

The timing here is really tricky. I'm not sure atexit would be at the point we need.

I'm thinking, let's not go there (too uncertain, not a lot to gain for it).

But something else crossed my mind (~randomly): How does this static play with embed.h, specifically multiple calls to py::initialize_interpreter() - py::finalize_interpreter() or py::scoped_interpreter guard{}?

Is the stored Exception type still valid after py::finalize_interpreter()?

tkoeppe commented 8 months ago

I don' t know how the CPython API works for multiple interpreters, but just thinking about this on a high level, none of our static approaches can work with multiple interpreters. Instead, the "once" guard would have to be part of the interpreter state, and the once-initializers would need to work on that per-interpreter state, not globally.

rwgk commented 8 months ago

Thanks Thomas for confirming what I was suspecting! — Sounds like we (someone...) should make this warning much stronger:

https://github.com/pybind/pybind11/blob/fa27d2fd439cd84a749e2759c0dd21529fac50a4/include/pybind11/embed.h#L235-L242