pdjstone closed 6 months ago
Thanks for reporting this and for the reproducer.
One thread holds the SharedFileReader lock and then calls the underlying Python file, which tries to acquire the GIL. Meanwhile, another thread that holds the GIL calls SharedFileReader::seek, which tries to acquire the SharedFileReader lock, which the first thread already has.
This is a common situation, and C++ does support locking multiple mutexes at once (std::lock, std::scoped_lock), but the GIL doesn't fit this scheme. Furthermore, because of the abstraction layers, no single place in the code knows that two locks have to be acquired together.
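For ordinary mutexes, C++17's std::scoped_lock covers this case: given several mutexes, it acquires all of them with a deadlock-avoidance algorithm, regardless of the order in which they are passed. The sketch below is only an illustration with a hypothetical Account type; it does not carry over to the GIL, which the Python interpreter can release and re-acquire internally, outside the control of any C++ lock object.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical resource protected by its own mutex (not part of indexed_bzip2).
struct Account
{
    std::mutex mutex;
    int balance{ 0 };
};

// std::scoped_lock acquires both mutexes with a deadlock-avoidance
// algorithm, so callers may pass them in any order.
void transfer( Account& from, Account& to, int amount )
{
    std::scoped_lock lock( from.mutex, to.mutex );
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, i.e., they pass the mutexes in
// opposite orders. Locking them one by one in argument order could deadlock here.
int totalAfterOppositeTransfers( int iterations )
{
    Account a;
    Account b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread first( [&] { for ( int i = 0; i < iterations; ++i ) { transfer( a, b, 1 ); } } );
    std::thread second( [&] { for ( int i = 0; i < iterations; ++i ) { transfer( b, a, 1 ); } } );
    first.join();
    second.join();
    return a.balance + b.balance;
}
```

The total stays constant because each transfer updates both balances while holding both mutexes.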
I don't see why this should have changed between 0.10.4 and 0.11.0, but it doesn't matter. I'm so close to simply reverting all the pain that the "fix" for #24 introduced...
I tried to fix that deadlock by requiring the GIL to be held before locking SharedFileReader, but now it deadlocks in a different way:
Thread 3 ends up in _Py_read, which deadlocks on the GIL that thread 2 holds at this point. The problem seems to be that thread 3 releases the GIL somewhere inside PyObject_Call and thread 2 snags it right away. I stumbled upon a passage in the Python documentation that might explain why this happens:
In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see sys.setswitchinterval()). The lock is also released around potentially blocking I/O operations like reading or writing a file, so that other Python threads can run in the meantime.
This is a nightmare.
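To see why the rule "GIL first, then file lock" still deadlocks once the GIL is released around blocking I/O, here is a toy lock-order checker, similar in spirit to the lock-order-inversion reports of ThreadSanitizer. Each thread is modeled as a hypothetical list of lock/unlock events; the checker records every "held X while acquiring Y" pair and reports an inversion when both (X, Y) and (Y, X) occur. The lock names and the two thread models are assumptions for illustration, not code from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one lock operation performed by a thread.
struct LockEvent
{
    bool acquire;      // true = lock, false = unlock
    std::string name;  // e.g. "GIL" or "SharedFileReader"
};

using Edges = std::set<std::pair<std::string, std::string>>;

// Records an edge (held, acquired) for every lock acquired while others are held.
void collectEdges( const std::vector<LockEvent>& events, Edges& edges )
{
    std::vector<std::string> held;  // locks currently held, in acquisition order
    for ( const auto& event : events ) {
        if ( event.acquire ) {
            for ( const auto& name : held ) {
                edges.emplace( name, event.name );
            }
            held.push_back( event.name );
        } else {
            // Drop the most recent acquisition of this lock.
            const auto it = std::find( held.rbegin(), held.rend(), event.name );
            if ( it != held.rend() ) {
                held.erase( std::next( it ).base() );
            }
        }
    }
}

// True if some pair of locks is acquired in both orders (potential deadlock).
bool hasLockOrderInversion( const std::vector<std::vector<LockEvent>>& threads )
{
    Edges edges;
    for ( const auto& events : threads ) {
        collectEdges( events, edges );
    }
    for ( const auto& [first, second] : edges ) {
        if ( edges.count( std::make_pair( second, first ) ) > 0 ) {
            return true;
        }
    }
    return false;
}

// Reader: GIL, then file lock, then a read on the Python file object. If the
// interpreter releases the GIL around the blocking read, the GIL is re-acquired
// while the file lock is still held.
std::vector<LockEvent> readerThread( bool gilReleasedAroundIo )
{
    std::vector<LockEvent> events{ { true, "GIL" }, { true, "SharedFileReader" } };
    if ( gilReleasedAroundIo ) {
        events.push_back( { false, "GIL" } );
        events.push_back( { true, "GIL" } );  // re-acquired after the I/O finishes
    }
    events.push_back( { false, "SharedFileReader" } );
    events.push_back( { false, "GIL" } );
    return events;
}

// Seeker: follows the "GIL first, then file lock" rule.
std::vector<LockEvent> seekerThread()
{
    return { { true, "GIL" }, { true, "SharedFileReader" }, { false, "SharedFileReader" }, { false, "GIL" } };
}
```

With the GIL released around the read, the reader adds the edge (SharedFileReader, GIL), which inverts the order used by the seeker; without that release, both threads agree on one order.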
The reversed order (file lock first, then the GIL) also does not work if the GIL has already been acquired by the caller, which happens when the main Python thread calls one of the relevant functions. To avoid that, we have to release the GIL first so that the correct lock order can be enforced.
With all that, and with most of the comments removed, the atrocity that avoids the deadlock looks like this:
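```cpp
// Enforces the lock order "SharedFileReader mutex, then GIL".
// Members are constructed in declaration order and destroyed in reverse order.
struct FileLock
{
    explicit
    FileLock( std::mutex& mutex ) :
        m_fileLock( mutex )
    {}

private:
#ifdef WITH_PYTHON_SUPPORT
    // 1. Unlock the GIL first, so that we never wait for the file mutex while holding it.
    const ScopedGILUnlock m_globalInterpreterUnlock;
#endif
    // 2. Lock the file mutex.
    const std::unique_lock<std::mutex> m_fileLock;
#ifdef WITH_PYTHON_SUPPORT
    // 3. Re-acquire the GIL while already holding the file mutex.
    const ScopedGILLock m_globalInterpreterLock;
#endif
};
```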
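The ordering relies on a C++ rule: non-static data members are initialized in declaration order and destroyed in reverse order. The sketch below mimics FileLock with hypothetical stand-ins, a plain std::mutex as a fake GIL and FakeScopedGILUnlock/FakeScopedGILLock instead of the project's ScopedGILUnlock/ScopedGILLock (whose exact behavior is assumed here), and logs every transition, assuming the caller already holds the GIL.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Records the order in which locks are taken and released.
std::vector<std::string> lockLog;

// std::mutex wrapper that logs lock/unlock; stands in for the file mutex.
struct LoggingMutex
{
    void lock() { m_mutex.lock(); lockLog.emplace_back( "lock file" ); }
    void unlock() { lockLog.emplace_back( "unlock file" ); m_mutex.unlock(); }
    std::mutex m_mutex;
};

// Hypothetical stand-in for the GIL, assumed held by the calling thread.
std::mutex fakeGil;

// Hypothetical stand-ins for ScopedGILUnlock / ScopedGILLock.
struct FakeScopedGILUnlock
{
    FakeScopedGILUnlock() { fakeGil.unlock(); lockLog.emplace_back( "release GIL" ); }
    ~FakeScopedGILUnlock() { fakeGil.lock(); lockLog.emplace_back( "re-acquire GIL" ); }
};

struct FakeScopedGILLock
{
    FakeScopedGILLock() { fakeGil.lock(); lockLog.emplace_back( "acquire GIL" ); }
    ~FakeScopedGILLock() { lockLog.emplace_back( "release GIL" ); fakeGil.unlock(); }
};

// Same member layout as FileLock.
struct FakeFileLock
{
    explicit FakeFileLock( LoggingMutex& mutex ) : m_fileLock( mutex ) {}

private:
    const FakeScopedGILUnlock m_globalInterpreterUnlock;
    const std::unique_lock<LoggingMutex> m_fileLock;
    const FakeScopedGILLock m_globalInterpreterLock;
};

std::vector<std::string> runFakeFileLock()
{
    lockLog.clear();
    LoggingMutex fileMutex;
    fakeGil.lock();  // simulate a caller that already holds the GIL
    {
        const FakeFileLock lock( fileMutex );
        lockLog.emplace_back( "work" );
    }
    fakeGil.unlock();
    return lockLog;
}
```

The resulting log is: release GIL, lock file, acquire GIL, work, release GIL, unlock file, re-acquire GIL. The file mutex is therefore never awaited while the GIL is held, and the caller gets the GIL back when the lock goes out of scope.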
I have pushed the fix to https://github.com/mxmlnkn/indexed_bzip2/tree/zlib-support
Can I copy-paste your reproducer into the CI Python tests (src/tests/testPythonWrappers.py), or do you want to open a PR for proper author attribution?
Happy for you to just copy-paste the code. I don't envy you tracking down and fixing these hairy Python threading/GIL edge cases. Thanks for the project though, it's been fantastically useful for me.
Thanks, the fix works for me with both my minimised test case and my more complex original code.
Thank you for testing. It will be released as 0.11.1 shortly.
After the fix for issue #26, I get a reproducible hang when rapidly seeking forwards and backwards in the gzip file. This doesn't happen with the previous release.
The following code reliably reproduces the issue for me: