Closed jamadden closed 7 months ago
Thank you for reporting this. I will have to read up on how to correctly use the GIL. It seems that pybind11 is doing some related things. I'm not using pybind11. Maybe I could use it or if not, it's still nice to have a working example. All of this needs to be done inside src/core/filereader/Python.hpp
, which should be the only piece of code that calls into Python.
In a nutshell, you call PyGILState_Ensure()
before calling into Python from an arbitrary thread, and you call PyGILState_Release()
when you're done calling into Python.
My gevent library (written in C) uses some macros. A function that could be called from anywhere, anytime, looks like this:
#define GGIL_DECLARE PyGILState_STATE ___save
#define GGIL_ENSURE ___save = PyGILState_Ensure();
#define GGIL_RELEASE PyGILState_Release(___save);
static void some_function()
{
GGIL_DECLARE;
int i; /* other variable declarations; we were trying to be C89 compliant */
/* Now the body of the function might use Python APIs, so ensure we have the GIL. */
GGIL_ENSURE;
/* now do stuff */
GGIL_RELEASE;
}
But since this project is in C++, it has the option of using the RAII pattern and declaring an instance of a custom class that "ensures" in the constructor and "releases" in the destructor.
One small note of caution: Obtaining the GIL can be expensive, so, whenever possible, it's best to do it at a relatively high level, instead of in every function that calls the Python API.
That is, instead of this:
// assuming this is the RAII class
class EnsureGIL;
static void func1()
{
EnsureGIL g;
// do stuff
}
static void func2()
{
EnsureGIL g;
// do stuff
}
static void driver()
{
func1()
func2()
}
it would be better to have the driver function handle the GIL.
static void func1()
{
// do stuff
}
static void func2()
{
// do stuff
}
static void driver()
{
EnsureGIL g;
func1()
func2()
}
That can be easier said than done, though, unfortunately.
That can be easier said than done, though, unfortunately.
Like always...
I tried around a bit and can reproduce it thanks to your detailed bug report. I also have a simple RAII helper that looks like this:
class ScopedGILLock
{
public:
ScopedGILLock()
{
if ( m_referenceCounter++ == 0 ) {
m_state = PyGILState_Ensure();
}
}
~ScopedGILLock()
{
if ( m_referenceCounter == 0 ) {
return;
}
if ( --m_referenceCounter == 0 ) {
PyGILState_Release( m_state );
m_state = {};
}
}
private:
inline static thread_local PyGILState_STATE m_state{};
inline static thread_local size_t m_referenceCounter{ 0 };
};
The problem now is that it locks when called from a child thread because the main thread has the GIL from the beginning because it was called from Python. At this point, I honestly question how useful that PYTHONDEVMODE
error is. Why do I have to lock the GIL on the other thread when I already own it on the main thread and can ensure, thanks to SharedFileReader
, which wraps PythonFileReader
, that all calls to Python are serialized? Well, I see the usefulness of PYTHONDEVMODE=1
and would like to enable it on my CI tests.
The question is now how to best release the GIL on the main thread. I don't want to do that inside each method that is exposed to Python because it feels like a deadlock bug waiting to happen, e.g., when adding another method in the future.
Edit: Well, I guess only methods that wait for worker threads to do something would potentially be required to release the GIL. And thanks to the relatively slim interface, this should only affect the read
method.
It appears that the background threads do not ensure they hold the GIL when they call Python APIs. When the interpreter is running in dev mode (
PYTHONDEVMODE=1
or-X dev
) the interpreter detects this and kills the process; in non-dev mode, the results are undefined.Tested with Python 3.10, 3.11 and rapidgzip 0.10.3 on macOS 13.6.1/Apple Silicon (I was unable to produce this same result on Python 3.12 because it segfaults; that may be a symptom of the same problem, however.)
Given this example script:
You should be able to crash (or not) the process like so:
This produces a core with the following stack traces:
Using a more complicated underlying stream (from the
smart-open
package) I've also seen a crash like this:Finally, here's what the Python 3.12 segfault looks like for that same test program: